Updated: 2023-02-19 17:07:45
The problem is that the desired data lives inside JavaScript code. Also, your current approach of relying on line indexes is fragile and unreliable.
The idea is to locate the script tag containing the desired data, use a regular expression to extract the object/dictionary containing the prices, load that object into a Python dictionary with the json module, and read the desired information from it.
Demo from the Scrapy Shell:
In [1]: import re
In [2]: import json
In [3]: pattern = re.compile(r"KBB\.Vehicle\.Pages\.PricingOverview\.Buyers\.setup\(.*?data: ({.*?}),\W+adPriceRanges", re.MULTILINE | re.DOTALL)
In [4]: data = response.xpath("//script[contains(., 'KBB.Vehicle.Pages.PricingOverview.Buyers.setup')]/text()").re(pattern)[0]
In [5]: data = data.replace("//Workaround until we get cross domain working for Flash", "")
In [6]: data_obj = json.loads(data)
In [7]: data_obj['values']['fpp']
Out[7]: {u'price': 15569.0, u'priceMax': 17356.0, u'priceMin': 13781.0}
In [8]: data_obj['values']['retail']
Out[8]: {u'price': 16370.0, u'priceMax': 0.0, u'priceMin': 0.0}
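The same extraction can be tried outside the Scrapy shell. The following is a minimal sketch using only the standard library. SAMPLE_SCRIPT is a made-up stand-in for the text of the script tag (the real page's markup may differ), and extract_prices is a hypothetical helper name, not part of Scrapy.

```python
import json
import re

# Hypothetical stand-in for the text of the matching <script> tag.
# The real page's contents may differ; only the overall shape is assumed.
SAMPLE_SCRIPT = """
KBB.Vehicle.Pages.PricingOverview.Buyers.setup({
    vehicleId: 1,
    data: {
        "values": {
            "fpp": {"price": 15569.0, "priceMax": 17356.0, "priceMin": 13781.0},
            "retail": {"price": 16370.0, "priceMax": 0.0, "priceMin": 0.0}
        }
    },
    adPriceRanges: {}
});
"""

# Capture the object literal that follows "data:" and precedes "adPriceRanges".
# Dots, the parenthesis, and the braces are escaped so they match literally.
PATTERN = re.compile(
    r"KBB\.Vehicle\.Pages\.PricingOverview\.Buyers\.setup\("
    r".*?data: (\{.*?\}),\W+adPriceRanges",
    re.DOTALL,
)


def extract_prices(script_text):
    """Return the parsed "data" object, or None if the pattern is not found."""
    match = PATTERN.search(script_text)
    if match is None:
        return None
    return json.loads(match.group(1))


prices = extract_prices(SAMPLE_SCRIPT)
print(prices["values"]["fpp"])
print(prices["values"]["retail"])
```

Returning None when nothing matches avoids the IndexError that indexing with [0] would raise if the page layout changes. Note that json.loads only works here because the captured object happens to be valid JSON; any JavaScript comments inside it have to be removed first, as the replace call in the shell session does.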