Updated: 2022-02-04 00:55:18
You need to modify your __init__() constructor to accept the date argument. Also, I would use datetime.strptime() to parse the date string:
from datetime import datetime

from scrapy.spiders import CrawlSpider  # import path for Scrapy 1.0+

class MySpider(CrawlSpider):
    name = 'tw'
    allowed_domains = ['test.com']

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # The date arrives as a keyword argument, e.g. date='01-01-2015'.
        date = kwargs.get('date')
        if not date:
            raise ValueError('No date given')
        # Parse the MM-DD-YYYY string into a datetime object.
        dt = datetime.strptime(date, "%m-%d-%Y")
        self.start_urls = ['http://test.com/{dt.year}-{dt.month}-{dt.day}'.format(dt=dt)]
Then, you would instantiate the spider this way:
spider = MySpider(date='01-01-2015')
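The parsing and URL-building step can be tried on its own, without Scrapy installed. The following is only a minimal sketch: build_start_url is a hypothetical helper name that does not appear in the answer above, and the URL pattern is copied from the spider as-is.

```python
from datetime import datetime


def build_start_url(date_string):
    """Parse an MM-DD-YYYY string and build the start URL (hypothetical helper)."""
    # strptime raises ValueError if the string does not match the format.
    dt = datetime.strptime(date_string, "%m-%d-%Y")
    # dt.year, dt.month and dt.day are plain integers, so nothing is zero-padded.
    return 'http://test.com/{dt.year}-{dt.month}-{dt.day}'.format(dt=dt)


print(build_start_url('01-01-2015'))  # http://test.com/2015-1-1
print(build_start_url('12-25-2015'))  # http://test.com/2015-12-25
```

Note that the month and day come out without leading zeros. If the target site expects zero-padded values, dt.strftime('%Y-%m-%d') would likely be the simpler choice, but that depends on the site's URL scheme.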
Or, you can even avoid parsing the date at all by passing a datetime instance in the first place:
class MySpider(CrawlSpider):
    name = 'tw'
    allowed_domains = ['test.com']

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # dt is expected to be a datetime instance, so no parsing is needed.
        dt = kwargs.get('dt')
        if not dt:
            raise ValueError('No date given')
        self.start_urls = ['http://test.com/{dt.year}-{dt.month}-{dt.day}'.format(dt=dt)]

# Use 1, not 01: integer literals with a leading zero are a SyntaxError in Python 3.
spider = MySpider(dt=datetime(year=2014, month=1, day=1))
And, just FYI, see this answer for a detailed example of how to run Scrapy from a script.
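For orientation, running a spider from a script with Scrapy's CrawlerProcess might look roughly like the sketch below. This is not the linked answer's code: run_spider is a hypothetical wrapper name, and the sketch assumes a reasonably recent Scrapy release in which crawl() takes the spider class and forwards keyword arguments to its constructor.

```python
def run_spider(spider_cls, **spider_kwargs):
    """Run one spider in-process and block until the crawl finishes.

    Hypothetical wrapper, shown only to illustrate the call sequence.
    """
    # Imported lazily so that defining this function does not require Scrapy.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess()
    # crawl() receives the spider class; the keyword arguments are passed
    # on to the spider's __init__(), e.g. date='01-01-2015'.
    process.crawl(spider_cls, **spider_kwargs)
    # start() runs the Twisted reactor and returns once crawling is done.
    process.start()
```

With the first spider above, the call would be run_spider(MySpider, date='01-01-2015'). In this setup Scrapy creates the spider instance itself, so the class is passed rather than an already constructed spider.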