且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用scrapy抓取多个页面?

更新时间:2023-09-18 22:19:22

参见 scrapy 请求结构,要抓取这样的链,您必须使用如下回调参数:

see scrapy Request structure, to crawl such chain you'll have to use the callback parameter like the following:

class MySpider(BaseSpider):
    ...
    # spider starts here
    def parse(self, response):
        ...
        # A, D, E are done in parallel, A -> B -> C are done serially
        yield Request(url=<A url>,
                      ...
                      callback=parseA)
        yield Request(url=<D url>,
                      ...
                      callback=parseD)
        yield Request(url=<E url>,
                      ...
                      callback=parseE)

    def parseA(self, response):
        ...
        yield Request(url=<B url>,
                      ...
                      callback=parseB)

    def parseB(self, response):
        ...
        yield Request(url=<C url>,
                      ...
                      callback=parseC)

    def parseC(self, response):
        ...

    def parseD(self, response):
        ...

    def parseE(self, response):
        ...