且构网

Parsing XML without end tags in Java

Updated: 2022-11-26 16:36:55

You can "fix" the XML by adding all the missing end tags before handing it to the parser.

Any start tag that is followed by text on the same line can be fixed by adding the matching end tag at the end of that line.

The "followed by text" condition ensures that a tag such as <Manager> does not get an end tag added, since it is actually closed 3 lines further down.
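To make the rule concrete, here is a minimal sketch that applies it to a small in-memory string. The sample document, the class name EndTagFixDemo, and the method name addMissingEndTags are made up for illustration; the regular expression is the one used in this answer's working code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EndTagFixDemo {
    // A start tag followed by text on the same line gets a matching end tag.
    // $1 = indentation, $2 = tag name, $3 = text after the tag.
    static String addMissingEndTags(String xml) {
        return xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: <Name> is never closed, <Manager> is closed later.
        String broken = String.join("\n",
                "<Employee>",
                "  <Name>John",
                "  <Manager>",
                "    <Name>Jane",
                "  </Manager>",
                "</Employee>");

        String fixed = addMissingEndTags(broken);
        System.out.println(fixed);

        // Parsing throws a SAXException if the result is still not well-formed.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(fixed)));
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    }
}
```

For this input only the two <Name> lines gain an end tag. The <Employee> and <Manager> lines are left alone because nothing but a line break follows the tag, and the </Manager> and </Employee> lines never match because "/" is not a word character.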

Working example code:

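import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Load file into memory (these statements belong in a method that declares "throws Exception")
String xml = new String(Files.readAllBytes(Paths.get("test.xml")), StandardCharsets.UTF_8);

// Add missing end-tags: $1 = indentation, $2 = tag name, $3 = text following the start-tag
xml = xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");

// Parse then print the XML, to ensure there are no errors
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                                          .parse(new InputSource(new StringReader(xml)));
TransformerFactory.newInstance().newTransformer()
                  .transform(new DOMSource(document), new StreamResult(System.out));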
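One caveat with the pattern above: [^<]+ also matches whitespace and line breaks, so a container tag followed only by trailing spaces (or by a blank line) would count as "followed by text" and get an end tag it should not have. If the input may contain such lines, a stricter pattern is one option. The following is a sketch under that assumption, not part of the original answer; the class and method names are hypothetical.

```java
class StrictEndTagFixDemo {
    // Stricter variant: indentation is limited to spaces and tabs, the text must
    // stay on one line, and it must contain at least one non-whitespace character.
    static String addMissingEndTagsStrict(String xml) {
        return xml.replaceAll(
                "(?m)^([ \\t]*)<(\\w+)>([^<\\r\\n]*[^<\\s][^<\\r\\n]*)$",
                "$1<$2>$3</$2>");
    }

    public static void main(String[] args) {
        // Hypothetical input: <Manager> has trailing spaces and a blank line after it.
        String broken = "<Manager>   \n\n  <Name>Jane\n</Manager>";
        System.out.println(addMissingEndTagsStrict(broken));
    }
}
```

With this input the <Manager> line is left untouched and only <Name>Jane is closed. Like the original pattern, this variant only recognises plain start tags, because \w+ does not match attributes, namespace prefixes, or hyphenated names.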