Python requests: requests.exceptions.TooManyRedirects: Exceeded 30 redirects

Updated: 2022-10-14 18:02:02


I was trying to crawl this page using the python-requests library:

import requests
from lxml import etree,html

url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'
r = requests.get(url)
tree = etree.HTML(r.text)
print tree

but I got the above error (TooManyRedirects). I tried the allow_redirects parameter, but got the same error:

r = requests.get(url, allow_redirects=True)

I even tried to send headers and data along with the URL, but I'm not sure if this is the correct way to do it.

headers = {'content-type': 'text/html'}
payload = {'ie':'UTF8','node':'976419031'}
r = requests.post(url,data=payload,headers=headers,allow_redirects=True)
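
As an aside, ie and node are query-string parameters in the original URL, so sending them as a POST body is unlikely to be what the server expects; with requests they would normally go into params= on a GET (a sketch only, and not a fix for the redirect loop itself):

params = {'ie': 'UTF8', 'node': '976419031'}
# requests appends these to the URL as the query string
r = requests.get('http://www.amazon.in/b/ref=sa_menu_mobile_elec_all', params=params)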

How do I resolve this error? I even tried BeautifulSoup 4 out of curiosity and got a different but similar kind of error:

page = BeautifulSoup(urllib2.urlopen(url))

urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Moved Permanently

Amazon is redirecting your request to http://www.amazon.in/b?ie=UTF8&node=976419031, which in turn redirects to http://www.amazon.in/electronics/b?ie=UTF8&node=976419031, after which you have entered a loop:

>>> loc = url
>>> seen = set()
>>> while True:
...     r = requests.get(loc, allow_redirects=False)
...     loc = r.headers['location']
...     if loc in seen: break
...     seen.add(loc)
...     print loc
... 
http://www.amazon.in/b?ie=UTF8&node=976419031
http://www.amazon.in/electronics/b?ie=UTF8&node=976419031
>>> loc
http://www.amazon.in/b?ie=UTF8&node=976419031

So your original URL A redirects to a new URL B, which redirects to C, which redirects back to B, and so on.
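
Because this is a genuine loop, simply catching the exception or raising the redirect cap will not get you the page; for reference, this is roughly what those options look like with the standard requests API (a sketch using the URL from the question):

import requests

url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'

s = requests.Session()
s.max_redirects = 60          # default cap is 30; a real B -> C -> B loop still exhausts it

try:
    r = s.get(url)            # allow_redirects=True is already the default for GET
except requests.exceptions.TooManyRedirects:
    print('server keeps bouncing between the same URLs')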

Apparently Amazon does this based on the User-Agent header. The following works:

>>> s = requests.Session()
>>> s.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36'
>>> r = s.get(url)
>>> r
<Response [200]>

This creates a session (for ease of re-use and for cookie persistence) and sets a copy of the Chrome user-agent string on it. The request succeeds (returns a 200 response).
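
Putting that together with the lxml code from the question, something along these lines should work (a sketch; the user-agent string is just the Chrome one shown above, and any reasonably browser-like value should do):

import requests
from lxml import etree

url = 'http://www.amazon.in/b/ref=sa_menu_mobile_elec_all?ie=UTF8&node=976419031'

s = requests.Session()
s.headers['User-Agent'] = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) '
                           'AppleWebKit/537.36 (KHTML, like Gecko) '
                           'Chrome/34.0.1847.131 Safari/537.36')

r = s.get(url)               # no redirect loop once the request looks like a browser
r.raise_for_status()         # fail loudly on anything other than a 2xx response
tree = etree.HTML(r.text)    # parse the HTML the same way as in the question
print(tree)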