且构网


How do I fill in a website form and retrieve the result in C#?

Updated: 2023-02-15 15:19:31



I would like my program to be able to access a website that processes string input and returns some information about it. I want to input two sequences, submit them and read the result through the program. The website is the following:

http://scansite.mit.edu/motifscan_seq.phtml

If you enter, say, 5031601 as Protein Name and DRNAYVWTLKGRTWKPTLVILRI as Sequence, you are redirected to the results page. This is the page I want to be able to read with my program. I have researched this a lot, but I can't seem to find a useful solution.

Can anyone please help me out?


EDIT:

I tried to create a web request with the following code (adapted from the link):

        WebRequest request = WebRequest.Create(
                                   "http://scansite.mit.edu/motifscan_seq");
        request.Method = "POST";
        string postData = @"motif_option=all&protein_id=5031601&
                           sequence=DRNAYVWTLKGRTWKPTLVILRI&
                           stringency=High&submit=Submit Request";
        byte[] byteArray = Encoding.UTF8.GetBytes(postData);
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = byteArray.Length;
        Stream dataStream = request.GetRequestStream();
        dataStream.Write(byteArray, 0, byteArray.Length);
        dataStream.Close();

        using (WebResponse response = request.GetResponse())
        using (Stream resSteam = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(resSteam))
            File.WriteAllText("SearchResults.html", sr.ReadToEnd());
        System.Diagnostics.Process.Start("SearchResults.html");

When I open the SearchResults.html, it contains the original form site with the protein name entered. The sequence hasn't been entered (it is a textarea, not a textbox). And it hasn't been submitted. Is there anything I'm missing or doing wrong?


I resolved the issue by sending the request to the URI given in the action attribute of the form tag (http://scansite.mit.edu/cgi-bin/motifscan_seq).

Your question is a bit vague, but it sounds like you want to do screen scraping: you download the HTML of the page and parse it to extract the values you want.

The site in question takes a POST request to the following URL:

http://scansite.mit.edu/cgi-bin/motifscan_seq

With the following parameters:

motif_option: all
protein_id:   5031601
sequence:     DRNAYVWTLKGRTWKPTLVILRI
stringency:   High
submit:       Submit Request

You need to send a POST request to that URL with the same key/value pairs, substituting your own values. Here's some documentation on how to do that with C# (look at the example halfway down the page):

http://msdn.microsoft.com/en-us/library/debx8sh9.aspx
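
As a rough illustration of that POST, here is a minimal sketch. It uses HttpClient and FormUrlEncodedContent instead of the WebRequest approach in the linked documentation, which is my substitution and not something the documentation shows. The class and method names (ScansiteClient, BuildFields, SubmitAsync) are hypothetical, and the field names and fixed values are taken from the parameter list above. Letting FormUrlEncodedContent build the body also avoids a pitfall in the question's code: a verbatim string (@"...") that spans several lines puts the line breaks and indentation into the request body.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class ScansiteClient
{
    // Form action URL quoted in the answer above.
    public const string ActionUrl = "http://scansite.mit.edu/cgi-bin/motifscan_seq";

    // Builds the key/value pairs listed above, with the caller's own values
    // for protein_id and sequence. A list keeps the field order fixed.
    public static List<KeyValuePair<string, string>> BuildFields(string proteinId, string sequence)
    {
        return new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("motif_option", "all"),
            new KeyValuePair<string, string>("protein_id", proteinId),
            new KeyValuePair<string, string>("sequence", sequence),
            new KeyValuePair<string, string>("stringency", "High"),
            new KeyValuePair<string, string>("submit", "Submit Request"),
        };
    }

    // Sends the form as application/x-www-form-urlencoded and returns the
    // response HTML. FormUrlEncodedContent URL-encodes each key and value.
    public static async Task<string> SubmitAsync(HttpClient client, string proteinId, string sequence)
    {
        using (var content = new FormUrlEncodedContent(BuildFields(proteinId, sequence)))
        using (HttpResponseMessage response = await client.PostAsync(ActionUrl, content))
        {
            response.EnsureSuccessStatusCode(); // throws on a non-2xx status
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

From an async method, a call would look like `string html = await ScansiteClient.SubmitAsync(new HttpClient(), "5031601", "DRNAYVWTLKGRTWKPTLVILRI");`. Whether the site still accepts this request in this form is an assumption on my part.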

When you get the HTML back, you will need to parse it and find the relevant parts that you need. Unfortunately, there are no IDs or classes in the HTML and everything is made from tables, so this might be quite challenging. Here is another question that covers screen scraping in C#:

Screen Scraping HTML with C#
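
Since the results are laid out in plain tables without IDs or classes, one crude way to get started is to pull the text out of every table row and then pick the rows you need by position or by header text. The sketch below does this with regular expressions from the base class library only. The TableScraper class and ExtractRows method are hypothetical names, and I have not checked the code against the actual Scansite markup. Regular expressions are fragile on real-world HTML (nested tables or unclosed tags will break this), so a dedicated HTML parser is likely a better choice for anything beyond a quick experiment.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; a rough sketch, not a general HTML parser.
public static class TableScraper
{
    // Matches one <tr>...</tr> block (non-greedy, across line breaks).
    private static readonly Regex RowPattern = new Regex(
        @"<tr\b[^>]*>(.*?)</tr>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches one <td> or <th> cell inside a row.
    private static readonly Regex CellPattern = new Regex(
        @"<t[dh]\b[^>]*>(.*?)</t[dh]>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any remaining tag inside a cell, e.g. <b> or <a href=...>.
    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    // Returns one list of cell texts per table row, in document order.
    // Rows without any cells are skipped.
    public static List<List<string>> ExtractRows(string html)
    {
        var rows = new List<List<string>>();
        foreach (Match row in RowPattern.Matches(html))
        {
            var cells = new List<string>();
            foreach (Match cell in CellPattern.Matches(row.Groups[1].Value))
            {
                // Strip inner tags, then decode entities such as &amp;.
                string text = TagPattern.Replace(cell.Groups[1].Value, "");
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }
            if (cells.Count > 0)
            {
                rows.Add(cells);
            }
        }
        return rows;
    }
}
```

Calling `TableScraper.ExtractRows(html)` on the downloaded page gives you a list of rows to inspect. Which rows and columns hold the values you want depends on the page itself and has to be worked out by looking at the returned HTML.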