且构网

How to get all the text between two specified tags with BeautifulSoup?

Updated: 2023-12-05 21:32:10

I would avoid nextSibling. From your question, you want to include everything up to the next <a>, regardless of whether that text sits in a sibling, a parent, or a child element, and nextSibling only ever looks at nodes on the same level.
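To make the problem concrete, here is a small sketch on made-up markup (the `sample` string is hypothetical, not the question's HTML) where the next <a> is nested inside a <p>. A loop over `next_siblings` (the generator form of nextSibling) never meets that <a>, so its "stop at <a>" check never fires and the link text leaks in. `find_next`, which searches in document order, does find it.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: the next <a> is nested inside a <p>,
# so it is NOT a sibling of <big>.
sample = "<div><big>Title</big> intro <span>nested</span><p>tail <a href='#'>link</a> after</p></div>"
soup = BeautifulSoup(sample, "html.parser")
big = soup.big

# Sibling walk: only nodes on the same level as <big> are visited.
sibling_text = ""
for sib in big.next_siblings:
    if isinstance(sib, NavigableString):
        sibling_text += sib
    elif sib.name == "a":
        break  # never reached here: the <a> is one level down
    else:
        sibling_text += sib.get_text()

# find_next searches in document order, descending into children,
# so it does locate the nested <a>.
next_a = big.find_next("a")

print(repr(sibling_text))  # ' intro nestedtail link after'
print(next_a.get_text())   # link
```

The sibling loop overshoots past the <a> and picks up "link after", which is exactly why a document-order walk is the safer choice here.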

So I think the best approach is to locate the next <a> element, then walk forward recursively until you reach it, collecting each string you encounter along the way. You may need to adjust the code below if your HTML differs much from the sample, but something like this should work:

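```python
from bs4 import BeautifulSoup, NavigableString

# `html` is the HTML string taken from the question.
soup = BeautifulSoup(html, 'html.parser')
firstBigTag = soup.find_all('big')[0]
nextATag = firstBigTag.find_next('a')

def loopUntilA(text, element):
    # Stop at the next <a>, or at the end of the document if there is none.
    if element is None or element is nextATag:
        return text
    # Only string nodes carry text; tags are simply walked through.
    if isinstance(element, NavigableString):
        text += element
    # next_element steps through the document in order, into children and
    # back out to parents, so nesting does not matter. One recursive call
    # per node, so very long documents can hit Python's recursion limit.
    return loopUntilA(text, element.next_element)

targetString = loopUntilA('', firstBigTag)
print(targetString)
```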
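If the recursion limit is a concern, the same walk can be written as a plain loop over `next_elements`. The sketch below is one way to do it; the helper name `text_between` and the sample markup are hypothetical, and it also assumes you want HTML comments skipped (remove the `Comment` check if not).

```python
from bs4 import BeautifulSoup, NavigableString, Comment

def text_between(start, stop):
    """Hypothetical helper: join every string node after `start` (in document
    order) up to, but not including, `stop`. If `stop` is None, read to the end."""
    parts = []
    for el in start.next_elements:
        if el is stop:
            break
        # Keep ordinary text nodes; skip HTML comments.
        if isinstance(el, NavigableString) and not isinstance(el, Comment):
            parts.append(str(el))
    return "".join(parts)

# Hypothetical sample markup standing in for the question's HTML.
html = "<p><big>Heading</big> first part <b>bold</b></p><p>more text</p><a href='#'>next</a>"
soup = BeautifulSoup(html, "html.parser")
big = soup.find("big")
print(repr(text_between(big, big.find_next("a"))))  # 'Heading first part boldmore text'
```

Because the loop uses `is` rather than `==`, it stops at that exact <a> node, not at an earlier tag that merely looks the same.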