I'm looking to write a script that can automatically download .zip files from the Bureau of Transportation Statistics Carrier Website, but I'm having trouble getting the same response headers as I can see in Chrome when I download the zip file. I'm looking to get a response header that looks like this:

HTTP/1.1 302 Object moved
Cache-Control: private
Content-Length: 183
Content-Type: text/html
Location: http://tsdata.bts.gov/103627300_T_T100_SEGMENT_ALL_CARRIER.zip
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Date: Thu, 21 Apr 2016 15:56:31 GMT
However, when calling

requests.post(url, data=params, headers=headers)

with the same information that I can see in the Chrome network inspector, I am getting the following response:

>>> res.headers
{'Cache-Control': 'private', 'Content-Length': '262', 'Content-Type': 'text/html', 'X-Powered-By': 'ASP.NET', 'Date': 'Thu, 21 Apr 2016 20:16:26 GMT', 'Server': 'Microsoft-IIS/8.5'}
It's got pretty much everything, except it's missing the Location key that I need in order to download the .zip file with all of the data I want. Also, the Content-Length value is different, but I'm not sure if that's an issue.

I think my issue has something to do with the fact that when you click "Download" on the page, it actually sends two requests that I can see in the Chrome network console. The first request is a POST request that yields an HTTP response of 302 and has the Location in its response header. The second request is a GET request to the url specified in the Location value of that response header.

Should I really be sending two requests here? Why am I not getting the same response headers using requests as I do in the browser? FWIW, I used

curl -X POST -d /*my data*/

and got back this in my terminal:

<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="http://tsdata.bts.gov/103714760_T_T100_SEGMENT_ALL_CARRIER.zip">here</a>.</body>
Really appreciate any help!
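On the two-requests question: requests follows a 302 automatically, so a single requests.post(url, data=params, headers=headers) is normally enough, and passing allow_redirects=False lets you stop at the first response and read its Location header yourself. Here is a minimal, stdlib-only sketch of that first hop; a throwaway local server stands in for the BTS endpoint (so it runs offline), and the tsdata.example URL is made up:

```python
import http.client
import http.server
import threading

# Throwaway local server standing in for the BTS endpoint: it answers
# a form POST with a 302 whose Location header points at a made-up zip URL.
class RedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        self.send_response(302)
        self.send_header("Location", "http://tsdata.example/file.zip")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# http.client never follows redirects, so the intermediate 302 and its
# Location header stay visible (with requests, pass allow_redirects=False).
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("POST", "/", body=b"Table_ID=293",
             headers={"Content-Type": "application/x-www-form-urlencoded"})
resp = conn.getresponse()
print(resp.status, resp.getheader("Location"))
server.shutdown()
```

With requests the equivalent is res = requests.post(url, data=params, headers=headers, allow_redirects=False), after which res.status_code is 302 and res.headers['Location'] holds the zip URL; if you let requests follow the redirect instead, the hops are recorded in res.history.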
I was able to download the zip file that I was looking for by using almost all of the headers that I could see in the Google Chrome web console. My headers looked like this:
{'Connection': 'keep-alive',
 'Cache-Control': 'max-age=0',
 'Referer': 'http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=293',
 'Origin': 'http://www.transtats.bts.gov',
 'Upgrade-Insecure-Requests': '1',
 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36',
 'Cookie': 'ASPSESSIONIDQADBBRTA=CMKGLHMDDJIECMNGLMDPOKHC',
 'Accept-Language': 'en-US,en;q=0.8',
 'Accept-Encoding': 'gzip, deflate',
 'Content-Type': 'application/x-www-form-urlencoded'}

(Note that every header value is a string; requests rejects non-string values such as a bare 1 for Upgrade-Insecure-Requests.)
And then I just wrote:

res = requests.post(url, data=form_data, headers=headers)

where form_data was copied from the "Form Data" section of the Chrome console. Once I got that response, I used the zipfile and io modules to unpack the content stored in res, like this:

import zipfile, io
zipfile.ZipFile(io.BytesIO(res.content)).extractall()

and then the extracted file was in the directory where I ran the Python code.
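One detail worth spelling out: zipfile.ZipFile(io.BytesIO(...)) only opens the archive in memory; it is extractall() that actually writes the member files to disk. A self-contained sketch of that round trip, where a small archive built in memory stands in for res.content and the member name is made up:

```python
import io
import os
import tempfile
import zipfile

# Build a small zip in memory as a stand-in for res.content
# (the raw bytes of the downloaded archive).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("T100_SEGMENT.csv", "ORIGIN,DEST\nJFK,LAX\n")
content = buf.getvalue()

# Opening the archive does not touch the filesystem; extractall()
# writes the member files into the chosen directory.
target = tempfile.mkdtemp()
with zipfile.ZipFile(io.BytesIO(content)) as archive:
    archive.extractall(target)

print(sorted(os.listdir(target)))  # → ['T100_SEGMENT.csv']
```

Calling extractall() with no argument, as in the answer above, extracts into the current working directory instead.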
Thanks to the users who answered on this thread.