


Updated: 2023-09-16 15:15:40




The problem is not so simple, and I don't think it can be solved without looking at the site you are trying to scrape. I have no idea how exactly the lazy loading technique you described is implemented, but I'm sure it can be implemented in several different ways, and those differences would call for different scraping approaches. Only one thing really matters, though: in all cases, scrolling causes some additional HTTP requests, and the data related to the scrolling event (say, scroll position, page number, or something like that) can be passed in those requests in different ways: as HTTP parameters, as part of the URL, etc.
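To make the "different ways" concrete, here is a minimal sketch of three places a site might put the scroll state. Everything in it is an assumption for illustration: the endpoint `https://example.com/items`, the parameter name `page`, and the three helper functions are hypothetical, and the real site may use something else entirely. The code only builds the requests; it does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with whatever the real site calls.
BASE = "https://example.com/items"


def as_query_param(page):
    """Scroll state in the query string, e.g. /items?page=3 (assumed name)."""
    return Request(f"{BASE}?{urlencode({'page': page})}")


def as_path_segment(page):
    """Scroll state as part of the URL path, e.g. /items/page/3 (assumed layout)."""
    return Request(f"{BASE}/page/{page}")


def as_form_body(page):
    """Scroll state in the body of a POST request (assumed form encoding)."""
    body = urlencode({"page": page}).encode("ascii")
    return Request(BASE, data=body, method="POST")
```

Which of these (if any) applies is exactly what has to be found out by inspecting the real traffic.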

So, you need to study this and act accordingly. How? Here is the approach I would use:

Use some existing HTTP spy software and try to reach the full content manually, by loading the page and scrolling. Such HTTP spying tools are often available as browser plug-ins. I, for example, use HttpFox, a plug-in for Mozilla browsers. If tracking is turned on, it will list all the HTTP requests and HTTP responses passing through the browser, with all the detail needed to understand how to do the scraping.
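Once the spy tool has shown which request fires on each scroll, the scraper can replay that request directly instead of scrolling. The following is only a sketch under stated assumptions: it supposes the site returns each batch as a JSON list, takes a `page` query parameter, and answers with an empty list when there is nothing left. `fetch_page` and `scrape_all` are hypothetical helper names, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, page):
    """Fetch one batch, mimicking the request seen in the HTTP spy tool.

    Assumes a 'page' query parameter and a JSON list in the response.
    """
    url = f"{base_url}?{urlencode({'page': page})}"
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scrape_all(fetch, max_pages=100):
    """Call fetch(1), fetch(2), ... until a batch is empty or max_pages is hit."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch(page)
        if not batch:
            break  # assumed end-of-data signal: an empty batch
        items.extend(batch)
    return items
```

A call might look like `scrape_all(lambda p: fetch_page("https://example.com/items", p))`, with the URL replaced by the one observed in the tool. Passing the fetch function in as an argument keeps the loop independent of how the site encodes the scroll state, so only `fetch_page` needs to change if the site uses a POST body or an offset instead of a page number.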