How to scrape AJAX-loaded content with jsoup

Updated: 2023-12-05 08:30:04

You can use a headless browser such as PhantomJS.


PhantomJS is a headless WebKit browser that is scriptable through a JavaScript API. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.

To make the work easier, you could use CasperJS.


CasperJS is a companion to PhantomJS that provides a much-improved API for building scraping and automation workflows.

These tools are very useful when you have to scrape websites with dynamic content, for instance sites where the content is displayed only after JavaScript has run on the page (sometimes including AJAX calls). jsoup itself only parses the HTML returned by the server and does not execute JavaScript, so it never sees such content.
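
As a rough illustration of the approach described above, here is a minimal CasperJS sketch. It is not taken from the original answer: the URL, the `#results li` selector, and the helper name `collectText` are hypothetical placeholders, and the 10-second timeout is an arbitrary choice. The script opens the page, waits until the AJAX-rendered elements appear in the DOM, and then prints their text.

```javascript
// Hypothetical target: the URL and the selector are placeholders for illustration.
var TARGET_URL = 'http://example.com/page-with-ajax';
var ITEM_SELECTOR = '#results li';

// Runs inside the page context (via casper.evaluate) and returns the
// trimmed text of every element matching the selector.
function collectText(selector) {
    var nodes = document.querySelectorAll(selector);
    return Array.prototype.map.call(nodes, function (node) {
        return node.textContent.trim();
    });
}

// CasperJS runs on top of PhantomJS, which defines the global `phantom`.
// The guard keeps the file loadable in other runtimes (for example Node.js).
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create({ waitTimeout: 10000 });

    casper.start(TARGET_URL);

    // Wait until the AJAX call has inserted the elements into the DOM.
    casper.waitForSelector(ITEM_SELECTOR, function () {
        var items = this.evaluate(collectText, ITEM_SELECTOR);
        this.echo(JSON.stringify(items, null, 2));
    }, function onTimeout() {
        this.echo('Timed out waiting for ' + ITEM_SELECTOR);
    });

    casper.run();
}
```

Assuming the file is saved as `scrape.js` (a hypothetical name), it would be run with `casperjs scrape.js`. Waiting on a selector rather than sleeping for a fixed time tends to be more reliable, because the script continues as soon as the dynamic content is actually present.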

You can see an example of how CasperJS works here:
CasperJs and Jquery with chained Selects