Updated: 2023-12-04 08:06:46
You need to configure the OutputSettings of the target Document, so try the following:
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // renamed to Safelist in newer jsoup releases

public class HtmlWithLineBreaks
{
    public String getCleanHtml(Document document)
    {
        // With pretty-printing off, html() preserves the source's line breaks and spacing.
        document.outputSettings(new Document.OutputSettings().prettyPrint(false));
        // Strip every tag, and keep pretty-printing off for the cleaned output as well.
        return Jsoup.clean(document.html(),
                "",
                Whitelist.none(),
                new Document.OutputSettings().prettyPrint(false));
    }

    public static void main(String... args)
    {
        // Replace this path with your own HTML file.
        File input = new File("/path/to/some/input.html");
        try
        {
            Document document = Jsoup.parse(input, "UTF-8");
            String printOut = new HtmlWithLineBreaks().getCleanHtml(document);
            System.out.println(printOut);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
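To see the effect without an input file, the same call can be tried on an inline string. This is only a minimal sketch: the class name `LineBreakDemo` and the helper `stripTags` are hypothetical, and it assumes jsoup 1.14.2 or later, where `Safelist` takes the place of the `Whitelist` class used above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist; // assumption: jsoup 1.14.2+; older versions use Whitelist

class LineBreakDemo
{
    // Hypothetical helper: strips all tags, with pretty-printing switched on or off.
    static String stripTags(String html, boolean prettyPrint)
    {
        Document.OutputSettings settings = new Document.OutputSettings().prettyPrint(prettyPrint);
        return Jsoup.clean(html, "", Safelist.none(), settings);
    }

    public static void main(String[] args)
    {
        String html = "<p>First line\nSecond line</p>";
        // Pretty-printing on: whitespace inside the text is normalised.
        System.out.println(stripTags(html, true));
        System.out.println("----");
        // Pretty-printing off: the newline from the source text is kept.
        System.out.println(stripTags(html, false));
    }
}
```

Comparing the two printed results shows why the OutputSettings matter here: the line break inside the paragraph only survives when pretty-printing is off.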
Optionally, you can insert custom line breaks after the <div> wrapper of your <h1> if you are not satisfied with the provided output:
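public String getCleanHtml(Document document)
{
    document.outputSettings(new Document.OutputSettings().prettyPrint(false));
    // append() adds the newline as the last child of each matched <div>, so once the tags
    // are stripped it lands right after that div's text. Note that parents().select("div")
    // can match more divs than the direct wrapper of the <h1>; narrow the selector if needed.
    document.select("h1").parents().select("div").append("\n");
    return Jsoup.clean(document.html(),
            "",
            Whitelist.none(),
            new Document.OutputSettings().prettyPrint(false));
}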
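The difference the appended newline makes is easiest to see when the heading and the following text sit on one source line. The sketch below is an illustration rather than the answer's exact code: the class `HeadingBreakDemo` and the helper `stripTagsWithBreak` are hypothetical, the selector `div:has(h1)` is a narrower choice of mine than `parents().select("div")`, and jsoup 1.14.2 or later is assumed for `Safelist`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Safelist; // assumption: jsoup 1.14.2+; older versions use Whitelist

class HeadingBreakDemo
{
    // Hypothetical helper: adds a newline inside every <div> that contains an <h1>,
    // then strips all tags without pretty-printing.
    static String stripTagsWithBreak(String html)
    {
        Document document = Jsoup.parse(html);
        document.outputSettings(new Document.OutputSettings().prettyPrint(false));
        // The newline becomes the last child of the wrapper div.
        document.select("div:has(h1)").append("\n");
        return Jsoup.clean(document.body().html(),
                "",
                Safelist.none(),
                new Document.OutputSettings().prettyPrint(false));
    }

    public static void main(String[] args)
    {
        // No line break in the source between the heading wrapper and the paragraph.
        String html = "<div><h1>Title</h1></div><p>Body text</p>";
        System.out.println(stripTagsWithBreak(html));
    }
}
```

Without the `append("\n")` call the heading and the paragraph text would run together once the tags are gone; with it, the heading ends up on its own line.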