Updated: 2023-01-16 08:41:13
Since you said that performance is a concern and you are doing web scraping, the first thing to try is the Scrapy framework - it is a fast, easy-to-use web-scraping framework. The scrapyd tool lets you distribute the crawling: you can run multiple scrapyd services on different servers and split the load between them. See:
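To make the load-splitting idea concrete, here is a minimal stdlib-only sketch. The server hostnames, project name, spider name, and the `start_url` spider argument are all hypothetical placeholders; it assigns start URLs round-robin to scrapyd instances and submits each job through scrapyd's `schedule.json` endpoint:

```python
# Sketch: distribute crawl jobs across several scrapyd servers.
# Server URLs, project/spider names and the start_url argument are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPYD_SERVERS = [
    "http://scrapyd-1.example.com:6800",
    "http://scrapyd-2.example.com:6800",
]

def assign_round_robin(urls, servers):
    """Map each start URL to a scrapyd server, round-robin."""
    batches = {server: [] for server in servers}
    for i, url in enumerate(urls):
        batches[servers[i % len(servers)]].append(url)
    return batches

def schedule(server, project, spider, start_url):
    """Submit one crawl via scrapyd's schedule.json endpoint.

    Extra POST fields (like start_url here) are passed to the spider
    as keyword arguments by scrapyd.
    """
    data = urlencode({"project": project, "spider": spider,
                      "start_url": start_url}).encode()
    with urlopen(f"{server}/schedule.json", data=data) as resp:
        return resp.read()
```

A driver would call `assign_round_robin(urls, SCRAPYD_SERVERS)` and then `schedule(...)` once per URL in each batch; in practice you would also poll `listjobs.json` on each server to monitor progress.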
There is also a Scrapy Cloud service out there:
Scrapy Cloud bridges the highly efficient Scrapy development environment with a robust, fully-featured production environment to deploy and run your crawls. It's like a Heroku for Scrapy, although other technologies will be supported in the near future. It runs on top of the Scrapinghub platform, which means your project can scale on demand, as needed.
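Deploying to Scrapy Cloud is typically done with Scrapinghub's `shub` CLI, which reads a `scrapinghub.yml` file in the project root. A minimal sketch (the project ID below is a placeholder for your own):

```yaml
# scrapinghub.yml - minimal Scrapy Cloud deploy config
# (12345 is a placeholder; use your own project ID)
project: 12345
```

With this in place, `shub login` followed by `shub deploy` pushes the Scrapy project to the platform.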