Scrolling a web page with Selenium Python WebDriver

Updated: 2023-01-14 08:50:04

Since nothing special appears after the last batch of followers is loaded, I would rely on two known quantities: how many followers the user has, and how many are loaded on each scroll down (I've inspected the page: it is 18 per scroll). From these, you can calculate how many times you need to scroll the page down.
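The scroll-count arithmetic can be sketched on its own as follows. This is a minimal illustration, not part of the original answer: the helper name `scrolls_needed` is hypothetical, and it assumes every scroll loads exactly `followers_per_page` entries. It rounds up, whereas the implementation in this answer uses floor division plus one. That form scrolls one extra time when the count is an exact multiple of 18, which is harmless.

```python
def scrolls_needed(followers_count, followers_per_page=18):
    """Return how many batches of followers_per_page cover followers_count.

    Hypothetical helper: assumes each scroll loads exactly one full batch.
    """
    if followers_per_page <= 0:
        raise ValueError("followers_per_page must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return (followers_count + followers_per_page - 1) // followers_per_page


# 53 followers at 18 per scroll: two full batches plus a partial third.
print(scrolls_needed(53))
```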

Here's the implementation (I've used a different user with only 53 followers to demonstrate the solution):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

followers_per_page = 18

driver = webdriver.Chrome()  # webdriver.Firefox() in your case
driver.get("http://www.quora.com/Andrew-Delikat/followers")

# get the followers count
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
followers_count = int(element.text.replace(',', ''))
print(followers_count)

# scroll down the page iteratively with a delay
for _ in range(followers_count // followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 10000);")
    time.sleep(2)

Also, if the user has a large number of followers, you may need to increase the 10000 Y-coordinate value on each iteration, based on the loop variable.
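One way to grow the Y coordinate with the loop variable is sketched below. The helper name `scroll_script` and the `step_px` parameter are assumptions made for illustration and are not from the original answer. The helper only builds the JavaScript string, so it can be checked without a browser.

```python
def scroll_script(step, step_px=10000):
    """Build JavaScript that scrolls to a Y offset growing with the step.

    Hypothetical helper: step is the zero-based loop index, step_px is the
    assumed number of pixels to add on each iteration.
    """
    return "window.scrollTo(0, {});".format((step + 1) * step_px)


# Scripts for the first three iterations: offsets 10000, 20000, 30000.
for i in range(3):
    print(scroll_script(i))
```

Inside the scrolling loop, the string would be passed to `driver.execute_script(scroll_script(i))`, with the loop variable named `i` instead of `_`. Scrolling to `document.body.scrollHeight` is a common alternative that avoids choosing a pixel step at all.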