且构网

Passing arguments to a Scrapy spider in a Python script

Updated: 2022-02-04 00:55:18

You need to modify your spider's __init__() constructor to accept the date argument. Also, I would use datetime.strptime() to parse the date string:

from datetime import datetime

from scrapy.spiders import CrawlSpider


class MySpider(CrawlSpider):
    name = 'tw'
    allowed_domains = ['test.com']

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

        # The date arrives as a keyword argument, e.g. date='01-01-2015'.
        date = kwargs.get('date')
        if not date:
            raise ValueError('No date given')

        # Parse the MM-DD-YYYY string and build the start URL from its parts.
        dt = datetime.strptime(date, "%m-%d-%Y")
        self.start_urls = ['http://test.com/{dt.year}-{dt.month}-{dt.day}'.format(dt=dt)]

Then, you would instantiate the spider this way:

spider = MySpider(date='01-01-2015')
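
To check which URL the constructor ends up with, the date handling can be tried on its own, without Scrapy. This is only a minimal sketch: the helper name build_start_url is hypothetical and not part of the original answer; it simply mirrors the two lines in __init__() above.

```python
from datetime import datetime


def build_start_url(date):
    """Hypothetical helper mirroring the date handling in MySpider.__init__()."""
    # strptime() raises ValueError if the string is not in MM-DD-YYYY form.
    dt = datetime.strptime(date, "%m-%d-%Y")
    # Attribute access in the format string yields plain integers,
    # so the month and day are not zero-padded.
    return 'http://test.com/{dt.year}-{dt.month}-{dt.day}'.format(dt=dt)


print(build_start_url('01-01-2015'))  # http://test.com/2015-1-1
```

If the target site expects zero-padded parts, a format spec such as '{dt:%Y-%m-%d}' would produce 2015-01-01 instead.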

Or, you can even avoid parsing the date at all by passing a datetime instance in the first place:

class MySpider(CrawlSpider):
    name = 'tw'
    allowed_domains = ['test.com']

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

        # Here dt is expected to be a datetime instance, not a string.
        dt = kwargs.get('dt')
        if not dt:
            raise ValueError('No date given')

        self.start_urls = ['http://test.com/{dt.year}-{dt.month}-{dt.day}'.format(dt=dt)]

# Integer literals with a leading zero (01) are a syntax error in Python 3.
spider = MySpider(dt=datetime(year=2014, month=1, day=1))
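
The two variants can also be combined. Arguments given on the command line with -a always arrive as strings, while a script can pass a datetime directly, so a spider might want to accept both. The sketch below assumes that goal; coerce_date and its fmt parameter are hypothetical names, not part of the original answer.

```python
from datetime import datetime


def coerce_date(value, fmt="%m-%d-%Y"):
    """Hypothetical helper: return a datetime from a datetime or a date string."""
    if isinstance(value, datetime):
        # Already parsed, nothing to do.
        return value
    if isinstance(value, str) and value:
        # Raises ValueError if the string does not match fmt.
        return datetime.strptime(value, fmt)
    raise ValueError('No date given')


print(coerce_date('01-01-2015'))          # 2015-01-01 00:00:00
print(coerce_date(datetime(2014, 1, 1)))  # 2014-01-01 00:00:00
```

Inside __init__(), the result of coerce_date() could then be used to build start_urls in the same way as before.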

And, just FYI, see this answer for a detailed example of how to run Scrapy from a script.