Updated: 2023-02-18 09:53:36
Do you want a plain-text version of an HTML file? If so, something like this is all you need:
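// HtmlParser, BodyContentHandler, Metadata and ParseContext are Apache Tika classes;
// ContentHandler is org.xml.sax.ContentHandler.
ContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
// try-with-resources closes the file even if parsing fails
try (InputStream input = new FileInputStream("myfile.html")) {
    new HtmlParser().parse(input, handler, metadata, new ParseContext());
}
String plainText = handler.toString();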
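When created with no constructor arguments or with a character limit, BodyContentHandler captures only the text of the HTML body and returns it from toString(). The no-argument constructor caps the captured text at 100,000 characters and parsing fails with a SAXException once that cap is reached; pass -1 as the limit to disable the cap.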
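
To make the "body text only, up to a character limit" behaviour more concrete, here is a rough JDK-only sketch of a SAX handler that works along similar lines. It is not Tika's implementation: the class name BodyTextSketch and the helper bodyText are made up for illustration, it uses the JDK's XML parser (so the input must be well-formed XHTML, unlike Tika's HtmlParser, which tolerates messy HTML), and it stops before appending anything past the limit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for illustration; not Tika's BodyContentHandler. */
public class BodyTextSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int limit;      // max characters to keep, or -1 for no limit
    private int bodyDepth = 0;    // > 0 while the parser is inside <body>

    public BodyTextSketch(int limit) {
        this.limit = limit;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Start counting depth at <body>, then track every nested element.
        if (bodyDepth > 0 || "body".equalsIgnoreCase(qName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (bodyDepth > 0) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (bodyDepth == 0) {
            return; // text outside <body> (e.g. <title>) is ignored
        }
        if (limit >= 0 && text.length() + length > limit) {
            throw new SAXException("character limit of " + limit + " reached");
        }
        text.append(ch, start, length);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    /** Parses well-formed XHTML and returns the text found inside <body>. */
    public static String bodyText(String xhtml, int limit) throws Exception {
        BodyTextSketch handler = new BodyTextSketch(limit);
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title>T</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(bodyText(page, -1)); // prints "Hello world"
    }
}
```

The handler only sees SAX events, which is why the same handler.toString() pattern works regardless of which parser feeds it. For real-world HTML, stick with the Tika classes shown above.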