且构网


Unable to scrape all the data

Updated: 2023-02-09 20:36:22

You can loop over the years and, within each year, request pages one at a time until the API returns an empty list:

import requests
import pandas as pd

url = 'https://www.vault.com/vault/api/Rankings/LoadMoreCompanyRanksJSON'

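def page_loop(year, url):
    # Collect one DataFrame per page and concatenate once at the end.
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)
    frames = []
    for page in range(1, 101):
        payload = {
                'rank': '2',
                'year': year,
                'category': 'LBACCompany',
                'pg': page}

        response = requests.get(url, params=payload, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        jsonData = response.json()

        # An empty list means there are no more pages for this year.
        if not jsonData:
            break

        print('page: %s' % page)
        frames.append(pd.DataFrame(jsonData))

    # pd.concat raises on an empty list, so handle the "no data" case explicitly.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, sort=True).reset_index(drop=True)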




yearly = []
for year in range(2007, 2021):
    print("\n>>Getting page source for:", year)

    # page_loop already returns a DataFrame, so no extra wrapping is needed.
    yearly.append(page_loop(year, url))

# Combine all years into one table.
results = pd.concat(yearly, sort=True).reset_index(drop=True)
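
The stop-on-empty-page logic can also be tried without touching the network. The following is only a minimal sketch: `collect_pages`, `fake_fetch`, and the rows in `FAKE_PAGES` are hypothetical stand-ins invented for illustration and are not part of the site's API or the original answer.

```python
import pandas as pd


def collect_pages(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until it returns an empty list.

    fetch_page is a hypothetical callable that returns a list of dicts
    (one dict per row), mimicking what the JSON endpoint hands back.
    """
    frames = []
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:  # empty page: nothing left to read
            break
        frames.append(pd.DataFrame(rows))

    # pd.concat raises on an empty list, so return an empty frame instead.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, sort=True).reset_index(drop=True)


# Stand-in for the real API: two pages of made-up rows, then nothing.
FAKE_PAGES = {
    1: [{"name": "A", "rank": 1}, {"name": "B", "rank": 2}],
    2: [{"name": "C", "rank": 3}],
}


def fake_fetch(page):
    return FAKE_PAGES.get(page, [])


table = collect_pages(fake_fetch)
print(table)
```

Passing the fetcher in as a function keeps the paging loop separate from the HTTP call, so the real request can be swapped in later, for example with a small wrapper around `requests.get(url, params=payload).json()`.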