Updated: 2023-11-22 23:50:58
As you said, scrapy opens your initial url, not the page modified by Selenium.
If you want to get the page from Selenium, you should use driver.page_source. You can also use it with a scrapy Selector (note that Selector expects a str, so pass the page source without encoding it):
response = Selector(text=driver.page_source)
After that, work with the response as you are used to.
I would try something like this (note that I haven't tested the code):
import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

count = 0

class ContractSpider(scrapy.Spider):
    name = "contracts"

    def __init__(self):
        super().__init__()
        # Keep the driver on the instance so other methods can reach it.
        self.driver = webdriver.Firefox()
        # An implicit wait tells WebDriver to poll the DOM for a certain
        # amount of time when trying to find any element (or elements)
        # not immediately available.
        self.driver.implicitly_wait(5)

    def start_requests(self):
        urls = [
            'https://www.contractsfinder.service.gov.uk/Search/Results',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def get_selenium_response(self, url):
        # Load the real url, not the literal string "url"
        self.driver.get(url)
        elem2 = self.driver.find_element_by_name("open")
        elem2.click()
        elem = self.driver.find_element_by_name("awarded")
        elem.click()
        elem3 = self.driver.find_element_by_id("awarded_date")
        elem3.click()
        elem4 = self.driver.find_element_by_name("awarded_from")
        elem4.send_keys("01/03/2018")
        elem4.send_keys(Keys.RETURN)
        elem5 = self.driver.find_element_by_name("awarded_to")
        elem5.send_keys("16/03/2018")
        elem5.send_keys(Keys.RETURN)
        elem6 = self.driver.find_element_by_name("adv_search")
        self.driver.execute_script("arguments[0].scrollIntoView(true);", elem6)
        elem6.send_keys(Keys.RETURN)
        # Return the raw page source as str; Selector(text=...) expects str
        return self.driver.page_source

    def parse(self, response):
        global count
        count += 1
        strcount = str(count)
        # Here you have the response from the webdriver;
        # you can use selectors to extract data from it.
        selenium_response = Selector(text=self.get_selenium_response(response.url))
        ...