且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

在不更改 XML 的情况下用 Java 解析包含 HTML 实体的 XML 文件

更新时间:2022-10-22 21:43:42

I would use a library like Jsoup for this purpose. I tested the following below and it works. I don't know if this helps. It can be located here: http://jsoup.org/download

public static void main(String args[]){


    String html = "<?xml version="1.0" encoding="UTF-8"?><foo>" + 
                  "<bar>Some&nbsp;text &mdash; invalid!</bar></foo>";
    Document doc = Jsoup.parse(html, "", Parser.xmlParser());

    for (Element e : doc.select("bar")) {
        System.out.println(e);
    }   


}

Result:

<bar>
 Some&nbsp;text — invalid!
</bar>

Loading from a file can be found here:

http://jsoup.org/cookbook/input/load-document-from-file