How to get all the text between two specified tags with BeautifulSoup?

Updated: 2023-12-05 21:32:10

I would avoid nextSibling because, judging from your question, you want to include everything up to the next <a>, regardless of whether it sits in a sibling, parent or child element.
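A minimal sketch of why sibling traversal falls short, using hypothetical markup (not the question's HTML) in which part of the wanted text lives in a paragraph that is not a sibling of `<big>`. It assumes Python 3 and BeautifulSoup 4 with the built-in `html.parser`. `next_siblings` is the bs4 generator behind `nextSibling`; `find_all_next(string=True)` instead walks every string after `<big>` in document order.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: "third" is in a <p> that is NOT a sibling of <big>.
HTML = """
<div>
  <p><big>Start</big> first <span>second</span></p>
  <p>third</p>
  <a href="#">stop</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
big = soup.find("big")


def normalize(text):
    # Collapse runs of whitespace so the results are easy to compare.
    return " ".join(text.split())


# Sibling traversal never leaves the first <p>, so "third" is missed.
sibling_text = normalize(
    "".join(n.get_text() if isinstance(n, Tag) else str(n) for n in big.next_siblings)
)

# Document-order traversal starts with <big>'s own text, crosses into other
# branches of the tree, and does not stop at the <a>.
document_text = normalize("".join(big.find_all_next(string=True)))

print(sibling_text)   # first second
print(document_text)  # Start first second third stop
```

Only the document-order walk reaches "third", but it also runs past the `<a>`, which is why the approach below needs an explicit stopping point.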

Therefore I think the best approach is to find the next <a> element and then walk the document recursively until reaching it, adding each string encountered along the way. You may need to tidy up the code below if your HTML is very different from the sample, but something like this should work:

from bs4 import BeautifulSoup

# `html` is the markup string taken from the question.
soup = BeautifulSoup(html, 'html.parser')
firstBigTag = soup.find_all('big')[0]
nextATag = firstBigTag.find_next('a')

def loopUntilA(text, firstElement):
    # .string is None for a tag with several children, so add '' instead of crashing
    text += firstElement.string or ''
    # Compare by identity: == on tags compares their markup, not the node itself
    if firstElement.next_element.next_element is nextATag:
        return text
    # Using double next_element to skip the string nodes themselves
    return loopUntilA(text, firstElement.next_element.next_element)

targetString = loopUntilA('', firstBigTag)
print(targetString)
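The recursion above assumes that tags and strings strictly alternate (hence the double `next`), so two tags in a row, such as a nested tag with no text before it, can make it step past the `<a>`; it also uses one stack frame per step. A minimal iterative sketch of the same idea, assuming Python 3 and BeautifulSoup 4: walk `next_elements` from the `<big>` tag, stop when the node *is* the target `<a>`, and keep every string seen on the way. The helper name `text_between` and the sample markup are hypothetical, not from the question.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString


def text_between(start, stop):
    """Hypothetical helper: join every string after `start` (including its own
    text) up to, but not including, the `stop` node, in document order."""
    parts = []
    for node in start.next_elements:
        if node is stop:  # identity check, so an identical <a> elsewhere is ignored
            break
        # Keep text nodes, but skip HTML comments (a NavigableString subclass).
        if isinstance(node, NavigableString) and not isinstance(node, Comment):
            parts.append(str(node))
    return "".join(parts)


# Hypothetical markup standing in for the question's HTML.
HTML = """
<div>
  <p><big>Start</big> first <span>second</span></p>
  <p>third</p>
  <a href="#">stop</a>
  <p>after</p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
first_big = soup.find("big")
next_a = first_big.find_next("a")

# Collapse whitespace for a readable result.
result = " ".join(text_between(first_big, next_a).split())
print(result)  # Start first second third
```

If there is no following `<a>`, `find_next` returns `None` and the loop simply collects text to the end of the document instead of raising an error.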
