且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

Jsoup仅过滤掉一些从html到文本的标签

更新时间:2023-12-03 20:18:28

我现在无法测试,但是我想您想编写一个递归函数,该函数逐步遍历树并根据条件打印每个节点.以下是其外观的示例,但我希望您必须对其进行修改以更精确地满足您的需求.

I can't test this right now but I think you want to write a recursive function that steps through the tree and prints each node based on a condition. The following is an example of what it might look like but I expect that you will have to modify it to suit your needs more precisely.

Document doc = JSoup.parse(page_text);
recursive_print(doc.head());
recursive_print(doc.body());

...

private static Set<String> ignore = new HashSet<String>(){{
  add("table");
  ...
}};
public static void recursive_print(Element el){
   if(!ignore.contains(el.className()))
     System.out.println(el.html());
   for(Element child : el.children())
     recursive_print(child);
}