且构网

Python lxml XPath problem

Updated: 2023-02-26 08:01:06

Your XPath is obviously a bit too long; why don't you try shorter ones and see whether they match? One likely culprit is "tbody": browsers insert it into the DOM automatically, so an XPath copied from a browser may contain a tbody step, but the HTML markup that lxml actually parses usually does not contain it.
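
A minimal sketch of the "try shorter XPaths" approach, assuming a simple absolute path with single "/" separators and no "/" inside predicates. The helper first_failing_prefix is hypothetical (not part of lxml): it evaluates ever-longer leading parts of the path and reports the first one that matches nothing. The sample table markup is made up for illustration and has no tbody, so a browser-style path fails exactly at the tbody step.

```python
from lxml import etree

# Made-up sample markup: a table written without an explicit <tbody>.
HTML = "<html><body><table><tr><td>cell</td></tr></table></body></html>"


def first_failing_prefix(doc, xpath):
    """Return the shortest leading part of `xpath` that matches nothing,
    or None if the whole path matches.

    Hypothetical helper; assumes a simple absolute path whose steps are
    separated by single '/' characters (no '//' and no '/' in predicates).
    """
    steps = xpath.strip("/").split("/")
    for i in range(1, len(steps) + 1):
        prefix = "/" + "/".join(steps[:i])
        if not doc.xpath(prefix):  # empty list means this step broke the match
            return prefix
    return None


doc = etree.HTML(HTML)  # parse with lxml's HTML parser; returns the root element

# A path copied from a browser's DOM typically includes tbody.
print(first_failing_prefix(doc, "/html/body/table/tbody/tr/td"))  # /html/body/table/tbody
print(first_failing_prefix(doc, "/html/body/table/tr/td"))        # None

# A descendant step (//) matches whether or not tbody is present.
print(doc.xpath("/html/body/table//td/text()"))                   # ['cell']
```

Using "//" instead of an explicit tbody step trades a little precision for a path that works against both the browser DOM and the raw markup.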

Here's an example of how to use XPath results:

>>> from lxml import etree
>>> from io import StringIO  # Python 3; Python 2 used "from StringIO import StringIO"
>>> doc = etree.parse(StringIO("<html><body>a<something/>b</body></html>"), etree.HTMLParser())
>>> doc.xpath("/html/body/text()")
['a', 'b']

So, if needed, you can simply concatenate all the text parts with "".join(...).
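
A small sketch of joining the results, using made-up markup where the text is split around a child element. Note that text() selects only the text nodes that are direct children of the context element, so text inside the child element is skipped; XPath's standard string() function, by contrast, returns the concatenated text of the element and all its descendants.

```python
from io import StringIO

from lxml import etree

# Made-up sample markup: body text split around a <b> child element.
doc = etree.parse(StringIO("<html><body>a<b>x</b>b</body></html>"), etree.HTMLParser())

# text() returns a list of the separate text nodes directly under <body>.
parts = doc.xpath("/html/body/text()")
direct_text = "".join(parts)

# string(...) concatenates the text of <body> and all of its descendants.
all_text = doc.xpath("string(/html/body)")

print(parts)        # ['a', 'b']
print(direct_text)  # ab
print(all_text)     # axb
```

Which one you want depends on whether text nested in child elements should be included.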