Scraping data from links in a web page with Beautiful Soup (Python)

Updated: 2022-12-23 17:49:47

You can do this. I haven't tested the complete code end to end because the full run takes a long time (up to about 10 minutes), but I have tested it part by part and it works fine for me. If it doesn't work, ask me in a comment. Here's the code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

ids = []    # Instagram handles collected from the ranking pages
avgc = []   # average comments per influencer
avgl = []   # average likes per influencer

# Step 1: walk the first 100 ranking pages and collect every Instagram handle
for i in range(1, 101):
    url = f'https://starngage.com/app/global/influencer/ranking/india?page={i}'
    print(url)
    resp = requests.get(url)

    soup = BeautifulSoup(resp.text, 'lxml')

    table = soup.find('table', class_='table-responsive-sm')
    trs = table.find_all('tr')

    # Skip the header row; the handle sits in the third column as "Name @handle"
    for tr in trs[1:]:
        temp = tr.select_one("td:nth-of-type(3)").text
        _, insta_id = temp.split('@')
        ids.append(insta_id.strip())

# Step 2: visit each influencer profile and pull the average likes/comments
for insta_id in ids:
    page = requests.get("https://starngage.com/app/global/influencers/" + insta_id)
    soup = BeautifulSoup(page.content, 'lxml')

    x = soup.find("blockquote").find("p").text.strip()
    # You can change this re code. I am not very familiar with re, so if you
    # find a better approach, please comment (see the sketch after this listing).
    x = re.findall(r"is \d+", x)
    avl, avc = list(map(lambda y: y.replace("is ", ""), x))
    avgl.append(avl)
    avgc.append(avc)

df = pd.DataFrame({"Insta Id": ids, "Average Like": avgl, "Average Comment": avgc})

print(df)

df.to_csv("test.csv")
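
On the re step flagged in the comment above: one slightly cleaner option is to capture the digits directly with a group, so the "is " prefix never has to be stripped afterwards. This is only a sketch, not tested against the live page; the helper name is made up, and it keeps the same assumption the original code makes, namely that the blockquote paragraph contains exactly two "is <number>" phrases, average likes first and average comments second.

import re

def extract_avg_like_comment(text):
    # Capture just the digits after each "is "; likes come first, then comments,
    # matching the order the original code relies on.
    nums = re.findall(r"is (\d+)", text)
    if len(nums) == 2:
        return nums[0], nums[1]
    # The profile text did not match the expected "is <number> ... is <number>" shape.
    return None, None

Inside the second loop this would replace the findall/map pair with a single call, e.g. avl, avc = extract_avg_like_comment(x).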