Updated: 2023-02-19 11:05:01
Use .find() if you want to find just one match:
nextlink = soup.find("div", {"class" : "alignright single"})
or loop over all matches:
for nextlink in soup.findAll("div", {"class" : "alignright single"}):
    a = nextlink.find('a')
    print a.get('href')
The latter part can also be expressed as:
a = nextlink.find('a', href=True)
print a['href']
where the href=True part only matches elements that have an href attribute, which means that you won't have to use a.get() because the attribute will be there (alternatively, no <a href="..."> link is found and a will be None).
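The difference between the two lookups, and the effect of href=True, can be sketched on a small inline document. This is only an illustration under stated assumptions: the HTML snippet and the helper names first_href and all_hrefs are made up for the example, it is written for Python 3 (print as a function), and it uses find_all, the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch; not the page from the question.
HTML = """
<div class="alignright single"><a rel="next" href="/next-post/">Next</a></div>
<div class="alignright single"><a name="anchor-only">No href here</a></div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def first_href(soup):
    """Return the href of the link in the first matching div, or None."""
    div = soup.find("div", {"class": "alignright single"})
    if div is None:
        return None
    # href=True skips <a> tags without an href, so a["href"] cannot raise KeyError.
    a = div.find("a", href=True)
    return a["href"] if a is not None else None


def all_hrefs(soup):
    """Collect the href of every matching div that actually contains a link."""
    hrefs = []
    for div in soup.find_all("div", {"class": "alignright single"}):
        a = div.find("a", href=True)
        if a is not None:  # the second div has no <a href="...">, so it is skipped
            hrefs.append(a["href"])
    return hrefs


print(first_href(soup))
print(all_hrefs(soup))
```

The None checks matter because .find() returns None rather than raising when nothing matches, so indexing the result directly would fail on a page without the expected link.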
For the given URL in your question, there is only one such link, so .find() is probably most convenient. It may even be possible to just use:
nextlink = soup.find('a', rel='next', href=True)
if nextlink is not None:
    print nextlink['href']
with no need to find the surrounding div. The rel="next" attribute looks sufficient for your specific needs.
As an extra tip: make use of the response headers to tell BeautifulSoup what encoding to use for a page; the urllib2 response object can tell you what, if any, character set the server thinks the HTML page is encoded in:
response = urllib2.urlopen(url1)
soup = BeautifulSoup(response.read(), from_encoding=response.info().getparam('charset'))
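On Python 3, where urllib2 became urllib.request, the response headers are an email.message.Message subclass, and get_content_charset() plays the role of getparam('charset'). A minimal sketch, assuming Python 3; the helper name charset_from_headers is hypothetical, and the headers object is built by hand so the example runs without network access:

```python
from email.message import Message


def charset_from_headers(headers, default=None):
    """Return the charset declared in the Content-Type header, or `default`."""
    # get_content_charset() lowercases the value and returns failobj when
    # the header carries no charset parameter.
    return headers.get_content_charset(failobj=default)


# Stand-in for response.headers from urllib.request.urlopen(); assumed values.
headers = Message()
headers["Content-Type"] = "text/html; charset=UTF-8"

print(charset_from_headers(headers))  # utf-8
```

With a real response, the result could be passed along as from_encoding=charset_from_headers(response.headers); a None value simply leaves BeautifulSoup to detect the encoding itself.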
Quick demo of all the parts together:
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> response = urllib2.urlopen('http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-the-lower-garment-should-be-hallway-between-the-shins/')
>>> soup = BeautifulSoup(response.read(), from_encoding=response.info().getparam('charset'))
>>> soup.find('a', rel='next', href=True)['href']
u'http://www.dailyhadithonline.com/2013/07/21/hadith-on-clothing-women-should-lower-their-garments-to-cover-their-feet/'