且构网


How to parse a website using BeautifulSoup

Updated: 2023-11-26 23:14:40

The problem is not BeautifulSoup but the server, which needs more information in the request before it gives you access to this page. As it stands, it sends back JavaScript code that redirects you to the login page.

You need to send a User-Agent header to get this page.

You can open http://httpbin.org/get in your browser to see the User-Agent it sends.
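The same httpbin check can be run from a script to compare what your code sends with what the browser sends. The following is a minimal sketch, not part of the original answer: the helper names `user_agent_from_httpbin` and `fetch_user_agent` are made up for illustration, and it assumes httpbin echoes the request headers back under a `headers` key in its JSON response.

```python
def user_agent_from_httpbin(payload: dict) -> str:
    """Pull the User-Agent out of the JSON that http://httpbin.org/get echoes back."""
    return payload["headers"]["User-Agent"]


def fetch_user_agent(headers=None, timeout=10):
    """Ask httpbin which User-Agent it received from this script (needs network access)."""
    import requests  # imported lazily so the parsing helper above works without it

    response = requests.get("http://httpbin.org/get", headers=headers, timeout=timeout)
    response.raise_for_status()
    return user_agent_from_httpbin(response.json())
```

Calling `fetch_user_agent()` with no arguments shows the default value that requests sends, which starts with `python-requests/`. Calling `fetch_user_agent({'User-Agent': 'Mozilla/5.0'})` should echo back the browser-like value instead.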

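import requests
from bs4 import BeautifulSoup

# Without a browser-like User-Agent the server answers with a JavaScript redirect to the login page.
headers = {'User-Agent': 'Mozilla/5.0'}

url = "https://linkedin.com/company/1005"

r = requests.get(url, headers=headers)
print(r.text)  # raw HTML as returned by the server

# Parse the HTML and print it indented for easier reading.
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())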
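To tell whether the header worked, it can help to check if the returned HTML is still the JavaScript redirect described above. This is only a rough heuristic sketch: the regular expression, the `login` keyword, and the function names `find_js_redirect` and `looks_like_login_redirect` are assumptions for illustration and are not taken from what any particular site actually sends.

```python
import re

# Heuristic only: matches script text such as
#   window.location.href = "https://example.com/login"
# Both the pattern and the "login" keyword are assumptions.
_REDIRECT_RE = re.compile(
    r"""(?:window\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']""",
    re.IGNORECASE,
)


def find_js_redirect(html: str):
    """Return the first URL that a script assigns to location, or None if there is none."""
    match = _REDIRECT_RE.search(html)
    return match.group(1) if match else None


def looks_like_login_redirect(html: str) -> bool:
    """Guess whether the page is just a JavaScript redirect to a login URL."""
    target = find_js_redirect(html)
    return target is not None and "login" in target.lower()
```

With the script above, `looks_like_login_redirect(r.text)` returning `True` would suggest the server still wants more information in the request.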