

Parsing XML without end tags in Java

Updated: 2022-11-26 16:36:55

You could "fix" the XML by adding all the missing end-tags.


Any start-tag that is followed by text on the same line can be fixed by adding the matching end-tag at the end of that line.

The "contains text" rule ensures that a tag such as <Manager> does not get an end-tag added, since its real end-tag appears 3 lines further down.


// Load file into memory
String xml = new String(Files.readAllBytes(Paths.get("test.xml")), StandardCharsets.UTF_8);

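// Apply magic to add missing end-tags: each line of the form <tag>text gets </tag> appended
// ($1 = indentation, $2 = tag name, $3 = text after the start-tag)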
xml = xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");

// Parse then print the XML, to ensure there are no errors
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                                          .parse(new InputSource(new StringReader(xml)));
TransformerFactory.newInstance().newTransformer()
                  .transform(new DOMSource(document), new StreamResult(System.out));
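
To try the rule without a file on disk, here is a minimal, self-contained sketch that applies the same regex to a small in-memory sample and then parses the result. The sample XML, the class name EndTagFixDemo, and the helper addMissingEndTags are assumptions made for illustration; the real test.xml may look different.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EndTagFixDemo {

    // Hypothetical helper wrapping the regex from the answer:
    // a start-tag followed by text on the same line gets a matching end-tag.
    static String addMissingEndTags(String xml) {
        return xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: leaf elements lack end-tags, containers have them.
        String broken = String.join("\n",
                "<Employees>",
                "  <Manager>",
                "    <Name>Alice",
                "    <Age>42",
                "  </Manager>",
                "</Employees>");

        String fixed = addMissingEndTags(broken);
        System.out.println(fixed);

        // Parsing throws a SAXException if the repaired text is still not well-formed.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(fixed)));

        // prints "Alice"
        System.out.println(doc.getElementsByTagName("Name").item(0).getTextContent());
    }
}
```

Two limits of the pattern are worth keeping in mind. `<(\w+)>` only matches start-tags without attributes. And because `[^<]+` can also match line breaks, a container tag such as <Manager> that is followed directly by an empty line would get an end-tag too. The approach therefore fits only input laid out like the sample.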