且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

Jsoup - 提取文本

更新时间:2023-12-03 20:39:46

Element.children()返回元素对象 - 元素对象。查看父类节点,您将看到可以访问任意节点的方法,而不仅仅是元素,例如 Node.childNodes()

Element.children() returns an Elements object - a list of Element objects. Looking at the parent class, Node, you'll see methods to give you access to arbitrary nodes, not just Elements, such as Node.childNodes().

public static void main(String[] args) throws IOException {
    String str = "<div>" +
            "    Some text <b>with tags</b> might go here." +
            "    <p>Also there are paragraphs</p>" +
            "    More text can go without paragraphs<br/>" +
            "</div>";

    Document doc = Jsoup.parse(str);
    Element div = doc.select("div").first();
    int i = 0;

    for (Node node : div.childNodes()) {
        i++;
        System.out.println(String.format("%d %s %s",
                i,
                node.getClass().getSimpleName(),
                node.toString()));
    }
}

结果:


1 TextNode 
 Some text 
2 Element <b>with tags</b>
3 TextNode  might go here. 
4 Element <p>Also there are paragraphs</p>
5 TextNode  More text can go without paragraphs
6 Element <br/>