How do I get href links from HTML using Python?

Updated: 2022-06-13 00:03:18

# Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup
import urllib2
import re

html_page = urllib2.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print link.get('href')

If you only want links starting with http://, use:

soup.findAll('a', attrs={'href': re.compile("^http://")})
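Note that the pattern above is anchored to http:// and so does not match https:// links. The following is a minimal sketch, using only the standard `re` module, of how the two patterns behave on a few made-up href values. The `^https?://` variant and the sample URLs are assumptions for illustration, not part of the original answer.

```python
import re

# Pattern from the answer: matches only hrefs that begin with "http://".
http_only = re.compile(r"^http://")
# Hypothetical variant: the optional "s" also accepts "https://".
http_or_https = re.compile(r"^https?://")

# Made-up href values for illustration.
hrefs = ["http://example.com", "https://example.com", "/relative/path"]

http_matches = [h for h in hrefs if http_only.search(h)]
both_matches = [h for h in hrefs if http_or_https.search(h)]

print(http_matches)   # ['http://example.com']
print(both_matches)   # ['http://example.com', 'https://example.com']
```

Either compiled pattern can be passed as the 'href' value in the attrs dictionary in the same way.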

In Python 3 with BS4, it should be:

from bs4 import BeautifulSoup
import urllib.request

html_page = urllib.request.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))
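
The values returned by link.get('href') can be None (an a tag with no href attribute), relative paths, or non-web schemes such as mailto:. As a rough sketch of one way to clean them up, the helper below resolves each value against the page URL and keeps only http and https links. The function name `absolute_http_links` and the sample values are hypothetical, and the helper uses only the standard library's urllib.parse.

```python
from urllib.parse import urljoin, urlparse


def absolute_http_links(hrefs, base_url):
    """Resolve href values against base_url and keep only http(s) links."""
    links = []
    for href in hrefs:
        if not href:  # skip None (no href attribute) and empty strings
            continue
        absolute = urljoin(base_url, href)  # relative paths become absolute
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


# Made-up href values of the kind link.get('href') might return.
sample_hrefs = ["/about", "http://example.com/a", None,
                "mailto:someone@example.com", "#top"]
result = absolute_http_links(sample_hrefs,
                             "http://www.yourwebsite.com/index.html")
print(result)
```

With the loop above, this could be called as absolute_http_links((link.get('href') for link in soup.find_all('a')), "http://www.yourwebsite.com").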
