How can I capture all element names of an XML file using lxml in Python?

Updated: 2023-11-25 09:30:22

I believe you are looking for element.xpath().

XPath is not a concept introduced by lxml but a general query language for selecting nodes from an XML document, supported by many tools that deal with XML. Think of it as something similar to CSS selectors, but more powerful (and also a bit more complicated). See XPath Syntax.

Your document uses namespaces - I'll ignore that for now and explain at the end of the post how to deal with them, because it keeps the examples more readable that way. (But they won't work as-is for your document).

For example,

tree.xpath('/net/endAddress')

would select the <endAddress>79.255.255.255</endAddress> element directly below the <net /> node, but not the <endAddress /> inside the <netBlock>.

The XPath expression

tree.xpath('//endAddress')

however would select all <endAddress /> nodes anywhere in the document.
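To see the difference between the two expressions side by side, here is a small self-contained sketch. The sample document is a simplified, namespace-free stand-in invented for illustration (it is not your actual file); only the element names and addresses are taken from the examples in this answer.

```python
from lxml import etree

# Hypothetical, namespace-free sample document (an assumption for illustration).
SAMPLE = b"""<net>
  <endAddress>79.255.255.255</endAddress>
  <startAddress>79.0.0.0</startAddress>
  <netBlocks>
    <netBlock>
      <endAddress>79.255.255.255</endAddress>
      <startAddress>79.0.0.0</startAddress>
    </netBlock>
  </netBlocks>
</net>"""

tree = etree.fromstring(SAMPLE)

# Absolute path: only <endAddress> elements that are direct children of <net>.
direct = tree.xpath('/net/endAddress')

# Descendant search: every <endAddress> anywhere in the document.
anywhere = tree.xpath('//endAddress')

print(len(direct), len(anywhere))
```

With this sample, the first query should find only the top-level element, while the second also picks up the one nested inside <netBlock>.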

You can of course further query the nodes you get back with XPath expressions:

netblocks = tree.xpath('/net/netBlocks/netBlock')
for netblock in netblocks:
    # Paths starting with ./ are evaluated relative to the current netblock
    start = netblock.xpath('./startAddress/text()')[0]
    end = netblock.xpath('./endAddress/text()')[0]
    print("%s - %s" % (start, end))

would give you

79.0.0.0 - 79.255.255.255

Notice that .xpath() always returns a list of selected nodes - so if you want just one, account for that.
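One way to account for that is a small helper that returns the first match or a fallback. This is only a sketch: the name `first` and the sample fragment are made up for illustration, and it assumes a node-selecting expression (expressions such as count() or string() return a number or string rather than a list).

```python
from lxml import etree

def first(node, expression, default=None):
    """Return the first XPath result for `expression`, or `default` if none."""
    results = node.xpath(expression)
    return results[0] if results else default

# Hypothetical sample fragment, invented for illustration.
block = etree.fromstring(
    b"<netBlock><startAddress>79.0.0.0</startAddress></netBlock>"
)

start = first(block, './startAddress/text()')
# <endAddress> is missing here, so indexing with [0] would raise IndexError.
end = first(block, './endAddress/text()', default='unknown')
print(start, end)
```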

You can also select elements by their attributes:

comment = tree.xpath('/net/comment')[0]
line_2 = comment.xpath("./line[@number='2']")[0]

This would select the <line /> element with number="2" from the first comment.

You can also select attributes themselves:

numbers = tree.xpath('//line/attribute::number')

['0', '1', '2']
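As a runnable sketch of both attribute techniques, the snippet below uses a made-up <comment> block (the line texts are placeholders, not taken from your document). It also shows `@number`, which is the standard XPath abbreviation for `attribute::number`.

```python
from lxml import etree

# Hypothetical sample modeled on the <comment>/<line> structure described above.
SAMPLE = b"""<net>
  <comment>
    <line number="0">first</line>
    <line number="1">second</line>
    <line number="2">third</line>
  </comment>
</net>"""

tree = etree.fromstring(SAMPLE)

# Selecting the attribute values themselves, long and abbreviated form.
long_form = tree.xpath('//line/attribute::number')
short_form = tree.xpath('//line/@number')

# Selecting an element by the value of one of its attributes.
line_2 = tree.xpath("/net/comment/line[@number='2']")[0]

print(short_form, line_2.text, line_2.get('number'))
```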

To get the list of element names you asked about last, you could do something like this:

names = [node.tag for node in tree.xpath('/net/*')]

['registrationDate', 'ref', 'endAddress', 'handle', 'name', 'netBlocks', 'orgRef', 'comment', 'startAddress', 'updateDate', 'version']

But given the power of XPath, it's probably better to just query the document for what you want to know from it, as specific or loose as you see fit.
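If you do want every element name in the whole document rather than only the children of <net />, one possible approach is to query `//*` and drop duplicates. The sample document, including the name EXAMPLE-NET, is invented for illustration.

```python
from lxml import etree

# Hypothetical, namespace-free sample document (an assumption for illustration).
SAMPLE = b"""<net>
  <name>EXAMPLE-NET</name>
  <netBlocks>
    <netBlock>
      <startAddress>79.0.0.0</startAddress>
      <endAddress>79.255.255.255</endAddress>
    </netBlock>
  </netBlocks>
  <endAddress>79.255.255.255</endAddress>
</net>"""

tree = etree.fromstring(SAMPLE)

# '//*' selects every element at any depth, in document order.
all_names = [node.tag for node in tree.xpath('//*')]

# dict.fromkeys() drops duplicates while keeping first-seen order.
unique_names = list(dict.fromkeys(all_names))
print(unique_names)
```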

Now, namespaces. As you noticed, if your document uses XML namespaces, you need to take that into consideration in many places, and XPath is no different. When querying a namespaced document, you pass the xpath() method the namespace map like this:

NSMAP = {'ns':  'http://www.arin.net/whoisrws/core/v1',
         'ns2': 'http://www.arin.net/whoisrws/rdns/v1',
         'ns3': 'http://www.arin.net/whoisrws/netref/v2'}

names = [node.tag for node in tree.xpath('/ns:net/*', namespaces=NSMAP)]

In many other places in lxml you can specify the default namespace by using None as the dictionary key in the namespace map. Unfortunately this does not work with xpath(), which raises an exception:

TypeError: empty namespace prefix is not supported in XPath

So you unfortunately have to prefix every node name in your XPath expression with ns: (or whatever you choose to map that namespace to).
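Here is a minimal sketch of the effect, using a cut-down document that declares only the core namespace as its default (an assumption for illustration; your real file declares more). It also shows that node.tag carries the namespace in {uri}name form, and that etree.QName can strip it if you only want the local names.

```python
from lxml import etree

NS = 'http://www.arin.net/whoisrws/core/v1'

# Hypothetical cut-down document using the core namespace as default.
SAMPLE = b"""<net xmlns="http://www.arin.net/whoisrws/core/v1">
  <startAddress>79.0.0.0</startAddress>
  <endAddress>79.255.255.255</endAddress>
</net>"""

tree = etree.fromstring(SAMPLE)
NSMAP = {'ns': NS}

# Without a prefix the query looks for <net> in no namespace and finds nothing.
unprefixed = tree.xpath('/net/*')

# With the prefix mapped, the children are found.
children = tree.xpath('/ns:net/*', namespaces=NSMAP)

# Tags of namespaced elements are reported as '{namespace-uri}localname'.
qualified = [node.tag for node in children]

# QName splits off the namespace if only the plain names are wanted.
local = [etree.QName(node).localname for node in children]
print(local)
```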

For more information on the XPath syntax, see for example the XPath Syntax page in the W3Schools XPath Tutorial.

To get going with XPath it can also be very helpful to fiddle around with your document in one of the many XPath testers. Also, the Firebug plugin for Firefox, or Google Chrome inspector allow you to show the (or rather, one of many) XPath for the selected element.