且构网

HtmlAgilityPack and dynamic content

Updated: 2023-12-05 18:20:28

I just spent hours trying to get HtmlAgilityPack to pick up some Ajax-loaded dynamic content from a web page, going from one useless post to another, until I found this one.

The answer is hidden in a comment under the initial post, so I thought I should spell it out.

This is the method I used initially, which didn't work:

using System.IO;
using System.Net;
using System.Text;

private void LoadTraditionalWay(String url)
{
    // Fetches the raw HTML only; scripts on the page are never executed.
    WebRequest myWebRequest = WebRequest.Create(url);
    using (WebResponse myWebResponse = myWebRequest.GetResponse())
    using (Stream receiveStream = myWebResponse.GetResponseStream())
    using (TextReader reader = new StreamReader(receiveStream, Encoding.UTF8))
    {
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.Load(reader);
    }
}



WebRequest only downloads the HTML the server sends. It does not run the page's scripts, so the Ajax queries that fill in the missing content are never executed.
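To make the point concrete, here is a minimal sketch that assumes nothing beyond the HtmlAgilityPack NuGet package. The page markup, the `prices` id, and the `StaticParseDemo` class are made up for illustration. The script in the markup would fill the div in a real browser, but HtmlAgilityPack only parses the text it is given, so the div comes back empty.

```csharp
using System;
using HtmlAgilityPack;

public static class StaticParseDemo
{
    // Hypothetical page: the <div id="prices"> is only filled in by script at run time.
    public const string Html =
        "<html><body>" +
        "<div id=\"prices\"></div>" +
        "<script>document.getElementById('prices').textContent = '42';</script>" +
        "</body></html>";

    // Parses the markup as-is and returns whatever text the div contains.
    public static string ReadPrices(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);                      // parsing only, no script execution
        HtmlNode div = doc.GetElementbyId("prices");
        return div.InnerText;
    }

    public static void Main()
    {
        string text = ReadPrices(Html);
        Console.WriteLine(text.Trim().Length == 0
            ? "div is empty: the script never ran"
            : "div contains: " + text);
    }
}
```

The same thing happens with the WebRequest approach: the parser sees the placeholder element, not what the script would have put in it.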

This is the solution that worked:

private void LoadHtmlWithBrowser(String url)
{
    webBrowser1.ScriptErrorsSuppressed = true;
    webBrowser1.Navigate(url);

    // Block until the WebBrowser control reports that the page has finished loading.
    waitTillLoad(this.webBrowser1);

    // Requires a reference to Microsoft.mshtml.
    // Read the live DOM (after scripts have run), not the original page source.
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowser1.Document.DomDocument;
    doc.LoadHtml(documentAsIHtmlDocument3.documentElement.outerHTML);
}

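private void waitTillLoad(WebBrowser webBrControl)
{
    WebBrowserReadyState loadStatus;
    int waittime = 100000; // maximum number of polling iterations, not milliseconds
    int counter = 0;

    // Phase 1: wait for the navigation to actually start (or give up after waittime iterations).
    while (true)
    {
        loadStatus = webBrControl.ReadyState;
        Application.DoEvents(); // keep the message loop pumping so the control can make progress
        if ((counter > waittime) || (loadStatus == WebBrowserReadyState.Uninitialized) || (loadStatus == WebBrowserReadyState.Loading) || (loadStatus == WebBrowserReadyState.Interactive))
        {
            break;
        }
        counter++;
    }

    // Phase 2: wait for the page to finish loading.
    // The 30-second limit is an arbitrary safeguard added here so that a page
    // which never completes cannot hang the loop forever; adjust as needed.
    var timer = System.Diagnostics.Stopwatch.StartNew();
    while (timer.Elapsed < TimeSpan.FromSeconds(30))
    {
        loadStatus = webBrControl.ReadyState;
        Application.DoEvents();
        if (loadStatus == WebBrowserReadyState.Complete && !webBrControl.IsBusy)
        {
            break;
        }
    }
}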



The idea is to load the page with the WebBrowser control, which is capable of rendering the Ajax content, wait until the page has fully rendered, and then use the Microsoft.mshtml library to hand the resulting HTML to the Agility Pack for parsing.
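For comparison, the same load, wait, and re-parse idea can be sketched with the control's `DocumentCompleted` event instead of a polling loop. This is not the method I used above, just an alternative under stated assumptions: `RenderedHtmlLoader` and `LoadAsync` are hypothetical names, the code must run on the UI thread of a Windows Forms app, and it reads the DOM through `HtmlElement.OuterHtml` rather than mshtml. Note that `DocumentCompleted` only signals that the document has loaded; Ajax calls that finish later would still be missed and would need an extra wait.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public static class RenderedHtmlLoader
{
    // Hypothetical helper: navigates the given WebBrowser and completes the task
    // once the top-level document has loaded, returning it parsed by HtmlAgilityPack.
    public static Task<HtmlAgilityPack.HtmlDocument> LoadAsync(WebBrowser browser, string url)
    {
        var tcs = new TaskCompletionSource<HtmlAgilityPack.HtmlDocument>();
        WebBrowserDocumentCompletedEventHandler handler = null;

        handler = (sender, e) =>
        {
            // Frames raise DocumentCompleted too; only react to the top-level page.
            if (e.Url != browser.Url) return;
            browser.DocumentCompleted -= handler;

            // Serialize the live DOM (after scripts ran) and re-parse it.
            HtmlElement root = browser.Document.GetElementsByTagName("html")[0];
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(root.OuterHtml);
            tcs.TrySetResult(doc);
        };

        browser.ScriptErrorsSuppressed = true;
        browser.DocumentCompleted += handler;
        browser.Navigate(url);
        return tcs.Task;
    }
}
```

From an async event handler on the form, this could be called as `var doc = await RenderedHtmlLoader.LoadAsync(webBrowser1, url);`.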

This was the only way I could get access to the dynamic data.

Hope it helps someone.