Updated: 2023-12-03 21:14:16
If you only need to extract the text from a document, plus any <b> or <i> tags (as per your example), consider using jsoup's Whitelist class (see the docs; recent jsoup releases name this class Safelist):
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

String html = "<body><p class='default'> <span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> <b>Hello World</b> </span> <span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> , Testing </span> <span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> <i><b>Font </b></i> </span> <span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> Style </span> <span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> <i>Check</i> </span> <span style='color: #000000; font-size: 10pt; font-family: MyriadPro-Bold;'> </span> </p></body>";
// simpleText() already permits b, em, i, strong and u; every other tag is stripped
Whitelist wl = Whitelist.simpleText();
wl.addTags("b", "i"); // redundant for b and i, but this is where you add further tags as necessary
// Removes all tags and attributes not on the whitelist, keeping their text content
String clean = Jsoup.clean(html, wl);
System.out.println(clean);
Which will output (as per your example):
11-07 19:04:45.738: I/System.out(318): <b>Hello World</b> , Testing
11-07 19:04:45.738: I/System.out(318): <i><b>Font </b></i> Style
11-07 19:04:45.738: I/System.out(318): <i>Check</i>
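To make it clearer what whitelist-based cleaning does to the markup, here is a rough, standard-library-only sketch of the idea. It is not how jsoup implements it (jsoup parses the HTML properly; this uses a regular expression and will fail on attribute values containing ">", comments, scripts, and so on). The class name AllowlistSketch and the method keepOnly are made up for this illustration. For real input, stick with Jsoup.clean.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: keeps allowed tags (without attributes),
// drops every other tag, and leaves the text content in place.
class AllowlistSketch {
    // group 1 = optional "/" of a closing tag, group 2 = tag name
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    public static String keepOnly(String html, Set<String> allowed) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2).toLowerCase();
            // Rebuild allowed tags without attributes; replace the rest with nothing
            String replacement = allowed.contains(name)
                    ? "<" + m.group(1) + name + ">"
                    : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        // Collapse the whitespace left behind by removed tags
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<p class='default'><span style='color: #000000;'> <b>Hello World</b> </span>"
                + "<span> , Testing </span><span> <i>Check</i> </span></p>";
        Set<String> allowed = new HashSet<String>(Arrays.asList("b", "i"));
        System.out.println(keepOnly(html, allowed));
        // prints: <b>Hello World</b> , Testing <i>Check</i>
    }
}
```

The point of the sketch is only that the <span> and <p> wrappers and all style attributes disappear while the <b>/<i> formatting and the text survive, which is the behaviour the jsoup call above relies on.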
Update:
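// Needs org.jsoup.nodes.Document, org.jsoup.nodes.Element and java.util.ArrayList.
// Assumes doc is the parsed form of the html string above.
Document doc = Jsoup.parse(html);
ArrayList<String> elements = new ArrayList<String>();
// Collect the inner HTML of every <span>, one list entry per span
for (Element span : doc.select("span")) {
    elements.add(span.html());
}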