且构网


How to replace certain tags with line breaks and remove all other tags in J2EE

Updated: 2023-12-04 08:06:46

You should hook into the OutputSettings of the target Document. Try the following:

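import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // deprecated since jsoup 1.14.2 in favor of Safelist

public class HtmlWithLineBreaks 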
{

  public String getCleanHtml(Document document)
  {
    // Disable pretty-printing so that html() preserves the original line breaks and spacing.
    document.outputSettings(new Document.OutputSettings().prettyPrint(false));
    return Jsoup.clean(document.html(),
        "",
        Whitelist.none(),
        new Document.OutputSettings().prettyPrint(false));
  }

  public static void main(String... args)
  {
    File input = new File("/path/to/some/input.html"); // Replace this path with your own HTML source file.
    Document document;
    try
    {
      document = Jsoup.parse(input, "UTF-8");
      String printOut = new HtmlWithLineBreaks().getCleanHtml(document);
      System.out.println(printOut);
    } catch (IOException e)
    {
      e.printStackTrace();
    } 
  }

}

Optionally, if you are not satisfied with the resulting output, you can insert custom line breaks after the <div> wrapper of your <h1>:

public String getCleanHtml(Document document)
{
  document.outputSettings(new Document.OutputSettings().prettyPrint(false));
  document.select("h1").parents().select("div").append("\n"); // Append a line break at the end of each <div> matched via the h1's ancestors.
  return Jsoup.clean(document.html(),
      "",
      Whitelist.none(),
      new Document.OutputSettings().prettyPrint(false));
}
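
To see the intended transformation in isolation, without jsoup on the classpath, here is a rough sketch that uses only java.util.regex. It is not the jsoup approach shown above but a simplified stand-in: the class name TagToNewline, the method toPlainText, and the choice of <br> and closing <p>, <div>, and <h1>..<h6> tags as the ones that become line breaks are all assumptions made for illustration. Regular expressions do not really parse HTML (comments, scripts, and attributes containing ">" will trip this up), so jsoup remains the better choice for real input.

```java
import java.util.regex.Pattern;

public class TagToNewline {

    // Assumption: <br> and the closing tags of p, div and h1..h6 become line breaks.
    private static final Pattern BREAKING_TAGS =
            Pattern.compile("(?i)<br\\s*/?>|</(?:p|div|h[1-6])\\s*>");

    // Every tag still present after the first pass is removed.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    public static String toPlainText(String html) {
        // Pass 1: turn the selected tags into newline characters.
        String withBreaks = BREAKING_TAGS.matcher(html).replaceAll("\n");
        // Pass 2: strip all remaining tags, keeping only their text content.
        return ANY_TAG.matcher(withBreaks).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<div><h1>Title</h1><p>First <b>bold</b> line<br>second line</p></div>";
        System.out.println(toPlainText(html));
    }
}
```

The order of the two passes matters: the line-breaking tags have to be replaced first, because the second pass would otherwise delete them along with everything else.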