Python lxml: ignoring the XML declaration (errors)

Updated: 2022-04-10 15:30:23

I know very little about Thunar, but if it produces the XML declaration shown in the question, that is a bug. An incorrect XML declaration means the document is not well-formed.

The XML grammar allows only one order for the items in the XML declaration: version must come first and encoding second. See http://w3.org/TR/xml/#NT-XMLDecl.
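One way to see the ordering rule in action is to feed both orders to lxml's default (strict) parser. This is only a minimal sketch: the two sample documents and the `is_well_formed` helper are made up for illustration and are not taken from the question.

```python
from lxml import etree

# Hypothetical sample documents, not taken from the question.
GOOD = b'<?xml version="1.0" encoding="UTF-8"?><root/>'  # version first
BAD = b'<?xml encoding="UTF-8" version="1.0"?><root/>'   # encoding first


def is_well_formed(data: bytes) -> bool:
    """Return True if lxml's default (strict) parser accepts the document."""
    try:
        etree.fromstring(data)
    except etree.XMLSyntaxError:
        return False
    return True


print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

The strict parser should accept the first document and reject the second, which differs only in the order of the declaration items.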

However, lxml can still parse the file if you use a parser instance with the recover option set to True. That works in this case: the bad XML declaration is ignored.

from lxml import etree

# recover=True tells the parser to keep going past errors such as the bad declaration
parser = etree.XMLParser(recover=True)
tree = etree.parse('uca.xml', parser)

See http://lxml.de/api/lxml.etree.XMLParser-class.html
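The same approach can be tried without a file on disk, which also makes it easy to check what the parser skipped. The sketch below assumes a made-up document with the declaration items in the wrong order (the `actions`/`action` element names are placeholders, not the real contents of uca.xml) and prints whatever the parser recorded in its `error_log`.

```python
from lxml import etree

# Hypothetical document with the declaration items in the wrong order.
BAD = (
    b'<?xml encoding="UTF-8" version="1.0"?>'
    b'<actions><action>demo</action></actions>'
)

# A recovering parser keeps going after the malformed declaration.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(BAD, parser)

print(root.tag)

# Problems the parser worked around are recorded on the parser itself.
for entry in parser.error_log:
    print(entry.line, entry.message)
```

Because recover mode tolerates other kinds of damage as well, it is worth looking at the error log before trusting the resulting tree.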