且构网

Removing the newline '\n' from Python BeautifulSoup output

Updated: 2023-12-04 10:43:40

You can do the following:

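breadcrum = [item.strip() for item in breadcrum if item.strip()]

The if item.strip() condition gets rid of the list items that are empty once the newline characters have been stripped. The test has to be made on the stripped value: an item that is only '\n' is still a non-empty string before stripping, so it would otherwise stay in the list as ''.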

If you want to join the strings, then do:

','.join(breadcrum)

This will give you abc,def,ghi
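As a minimal sketch of the strip, filter, and join steps, assuming breadcrum is a list of strings in which some entries are bare newlines (the sample values below are made up for illustration, and the names cleaned and result are not from the original answer):

```python
# Hypothetical sample data: text fragments mixed with newline-only entries.
breadcrum = ["abc\n", "\n", " def", "\n", "ghi\n"]

# Strip surrounding whitespace and keep only entries that are
# non-empty after stripping.
cleaned = [item.strip() for item in breadcrum if item.strip()]

# Join the remaining strings with commas.
result = ",".join(cleaned)
print(result)  # abc,def,ghi
```

Filtering on the stripped value is what removes the newline-only entries; filtering on the raw string would keep them as empty strings.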

Edit

Although the above gives you what you want, as others in the thread have pointed out, the way you are using BS to extract the anchor texts is not correct. Once you have the div you are interested in, you should use it to get its children and then get the anchor text, like this:

path = soup.find('div',attrs={'class':'path'})
anchors = path.find_all('a')
data = []
for ele in anchors:
    data.append(ele.text)

Then do ','.join(data)
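For reference, here is a self-contained sketch of that approach. The HTML snippet is a hypothetical stand-in for the page in the question (its real structure is an assumption), and the built-in "html.parser" backend is used only so that no extra parser needs to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="path">
  <a href="/a">abc</a>
  <a href="/b">def</a>
  <a href="/c">ghi</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the div of interest, then collect the text of each anchor inside it.
path = soup.find("div", attrs={"class": "path"})
data = [a.get_text(strip=True) for a in path.find_all("a")]

joined = ",".join(data)
print(joined)  # abc,def,ghi
```

Because only the anchor tags are visited, the newline text nodes between them never enter the list, so no separate cleanup step should be needed.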