Logging in to a website with Python

Updated: 2023-12-04 18:03:22

If you inspect the raw request sent to the login URL (with the help of a tool such as Charles Proxy), you will see that it actually sends 4 parameters: wpName, wpPassword, wpLoginAttempt and wpLoginToken. The first 3 are static and you can fill them in at any time. The 4th, however, has to be parsed from the HTML of the login page. To log in, you need to POST this parsed value, together with the other 3, to the login URL.

Here is working code using Requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup as bs


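def get_login_token(raw_resp):
    """Return the value of the wpLoginToken input on the login page."""
    soup = bs(raw_resp.text, 'lxml')
    # Look up the <input name="wpLoginToken"> element in the login form.
    node = soup.find('input', attrs={'name': 'wpLoginToken'})
    if node is None:
        raise ValueError('wpLoginToken not found in the login page')
    return node.get('value', '')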

payload = {
    'wpName': 'my_username',
    'wpPassword': 'my_password',
    'wpLoginAttempt': 'Log in',
    # 'wpLoginToken' is added below, after it has been parsed from the login page
    }

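# Use one session so the cookies from the login page are sent with the POST.
with requests.Session() as s:
    # Step 1: load the login page and parse the token out of its HTML.
    resp = s.get('http://en.wikipedia.org/w/index.php?title=Special:UserLogin')
    payload['wpLoginToken'] = get_login_token(resp)

    # Step 2: POST all 4 parameters to the login URL.
    response_post = s.post('http://en.wikipedia.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login',
                           data=payload)
    # Step 3: request a page that requires being logged in.
    response = s.get('http://en.wikipedia.org/wiki/Special:Watchlist')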
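If you want to try the token-parsing step without making any network requests, here is a minimal sketch that uses the standard library's html.parser instead of BeautifulSoup. The names LoginTokenParser and extract_login_token, as well as the sample markup and its token value, are made up for illustration; the real login page is much larger and may be structured differently.

```python
from html.parser import HTMLParser


class LoginTokenParser(HTMLParser):
    """Remembers the value of <input name="wpLoginToken">, if one is seen."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs for the tag.
        attr_map = dict(attrs)
        if tag == 'input' and attr_map.get('name') == 'wpLoginToken':
            self.token = attr_map.get('value', '')


def extract_login_token(html):
    """Return the token string, or None if the page has no such input."""
    parser = LoginTokenParser()
    parser.feed(html)
    return parser.token


# Hypothetical sample markup, not copied from the real login page.
SAMPLE_HTML = '''
<form method="post">
  <input name="wpName" type="text">
  <input name="wpPassword" type="password">
  <input name="wpLoginToken" type="hidden" value="abc123">
</form>
'''

print(extract_login_token(SAMPLE_HTML))
```

Returning None when the input is missing lets the caller decide how to react, for example when the site changes its form and the field is no longer there.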