且构网


Calling a website from C++

Updated: 2023-10-23 13:05:40



Hi there.

As I understand it, you want a C++ program that retrieves the same data a web browser would receive if you went to the URL of your page.

It's reasonably simple, but far from trivial. There's a whole bunch of code around that deals with downloading a file from the web using C/C++, and that's what you're doing: downloading it. It just isn't a file that exists on disk, which makes it no different from pretty much any forum page you care to visit; they're all dynamically generated too.
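To make the "it's just downloading a file" point concrete: what a client actually sends for a page is a few lines of text. The sketch below uses a hypothetical helper, `buildGetRequest` (not part of the code in this answer), to assemble the same minimal HTTP/1.0 GET request that the answer's code sends; host and path are whatever your URL contains.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a minimal HTTP/1.0 GET request.
// 'path' must start with '/'; the Host header tells a server that hosts
// several sites which one you mean.
std::string buildGetRequest(const std::string &host, const std::string &path)
{
    assert(!path.empty() && path[0] == '/');
    // Every line ends with CRLF; an empty line ends the request header.
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "\r\n";
}
```

For example, `buildGetRequest("www.example.com", "/index.html")` yields `"GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n"`. Everything else is just writing those bytes to a TCP socket on port 80 and reading back what the server returns.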

Here's some code that downloads this page into memory, saves it to disk and prints it to the screen. Modify it to suit your needs, and link against ws2_32.lib.


// Link with ws2_32.lib (MinGW: -lws2_32). With MSVC the pragma below does it.
#define _WINSOCK_DEPRECATED_NO_WARNINGS   // inet_addr/gethostbyname are deprecated in newer SDKs
#define _CRT_SECURE_NO_WARNINGS           // allow fopen() without the MSVC warning
#include <winsock2.h>                     // must be included before windows.h
#include <windows.h>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef _MSC_VER
#pragma comment(lib, "ws2_32.lib")
#endif

using std::string;

WSADATA wsaData;

// Split "http://host/dir/file" into serverName ("host"),
// filepath ("/dir/file") and filename ("file").
void mParseUrl(const char *mUrl, string &serverName, string &filepath, string &filename)
{
    string::size_type n;
    string url = mUrl;

    if (url.substr(0,7) == "http://")
        url.erase(0,7);

    // The prefix is stripped, but this program only speaks plain HTTP on
    // port 80; it cannot talk TLS to an https:// server.
    if (url.substr(0,8) == "https://")
        url.erase(0,8);

    n = url.find('/');
    if (n != string::npos)
    {
        serverName = url.substr(0,n);
        filepath = url.substr(n);
        n = filepath.rfind('/');
        filename = filepath.substr(n+1);
    }
    else
    {
        serverName = url;
        filepath = "/";
        filename = "";
    }
}

// Open a TCP connection to szServerName (host name or dotted IPv4 address).
// Returns INVALID_SOCKET on failure.
SOCKET connectToServer(const char *szServerName, WORD portNum)
{
    struct hostent *hp;
    unsigned long addr;
    struct sockaddr_in server;
    SOCKET conn;

    conn = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (conn == INVALID_SOCKET)
        return INVALID_SOCKET;

    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(portNum);

    addr = inet_addr(szServerName);
    if (addr != INADDR_NONE)
    {
        // Already a numeric address: use it directly, no lookup needed.
        server.sin_addr.s_addr = addr;
    }
    else
    {
        hp = gethostbyname(szServerName);
        if (hp == NULL)
        {
            closesocket(conn);
            return INVALID_SOCKET;
        }
        memcpy(&server.sin_addr, hp->h_addr, hp->h_length);
    }

    if (connect(conn, (struct sockaddr*)&server, sizeof(server)) == SOCKET_ERROR)
    {
        closesocket(conn);
        return INVALID_SOCKET;
    }
    return conn;
}

// Return the offset of the first body byte (just past the blank line that
// ends the header), or -1 if there is none. 'content' must be NUL-terminated.
int getHeaderLength(const char *content)
{
    const char *srchStr1 = "\r\n\r\n", *srchStr2 = "\n\n";   // CRLF, or bare LF from sloppy servers
    const char *findPos;
    int ofset = -1;

    findPos = strstr(content, srchStr1);
    if (findPos != NULL)
    {
        ofset = (int)(findPos - content);
        ofset += (int)strlen(srchStr1);
    }
    else
    {
        findPos = strstr(content, srchStr2);
        if (findPos != NULL)
        {
            ofset = (int)(findPos - content);
            ofset += (int)strlen(srchStr2);
        }
    }
    return ofset;
}

// Download szUrl over plain HTTP/1.0. Returns the body (allocated with new[],
// NUL-terminated), stores its length in bytesReturnedOut and the header
// (also new[]) in *headerOut. Returns NULL on failure.
char *readUrl2(const char *szUrl, long &bytesReturnedOut, char **headerOut)
{
    const int bufSize = 512;
    char readBuffer[bufSize];
    char *tmpResult = NULL, *result;
    SOCKET conn;
    string server, filepath, filename;
    long totalBytesRead, headerLen;
    int thisReadSize;

    bytesReturnedOut = 0;
    *headerOut = NULL;

    mParseUrl(szUrl, server, filepath, filename);

    ///////////// step 1, connect //////////////////////
    conn = connectToServer(server.c_str(), 80);
    if (conn == INVALID_SOCKET)
        return NULL;

    ///////////// step 2, send GET request /////////////
    // Built in a std::string so a long URL cannot overflow a fixed buffer.
    string request = "GET " + filepath + " HTTP/1.0\r\n"
                     "Host: " + server + "\r\n"
                     "\r\n";

    // send() may transmit fewer bytes than asked for, so loop until done.
    size_t sent = 0;
    while (sent < request.size())
    {
        int n = send(conn, request.c_str() + sent, (int)(request.size() - sent), 0);
        if (n == SOCKET_ERROR)
        {
            closesocket(conn);
            return NULL;
        }
        sent += n;
    }
    printf("Buffer being sent:\n%s", request.c_str());

    ///////////// step 3 - get received bytes ////////////////
    // Receive until the peer closes the connection
    totalBytesRead = 0;
    while (1)
    {
        thisReadSize = recv(conn, readBuffer, bufSize, 0);
        if (thisReadSize <= 0)      // 0 = connection closed, SOCKET_ERROR = failure
            break;

        // +1 leaves room for a terminating NUL so strstr() can be used below.
        char *grown = (char*)realloc(tmpResult, totalBytesRead + thisReadSize + 1);
        if (grown == NULL)
        {
            free(tmpResult);
            closesocket(conn);
            return NULL;
        }
        tmpResult = grown;
        memcpy(tmpResult + totalBytesRead, readBuffer, thisReadSize);
        totalBytesRead += thisReadSize;
        tmpResult[totalBytesRead] = '\0';
    }
    closesocket(conn);

    if (tmpResult == NULL)          // nothing was received
        return NULL;

    headerLen = getHeaderLength(tmpResult);
    if (headerLen < 0)              // no header/body separator found
    {
        free(tmpResult);
        return NULL;
    }

    long contenLen = totalBytesRead - headerLen;
    result = new char[contenLen + 1];
    memcpy(result, tmpResult + headerLen, contenLen);
    result[contenLen] = '\0';

    char *myTmp = new char[headerLen + 1];
    memcpy(myTmp, tmpResult, headerLen);
    myTmp[headerLen] = '\0';
    free(tmpResult);                // allocated with realloc(), so free(), not delete
    *headerOut = myTmp;

    bytesReturnedOut = contenLen;
    return result;
}


int main()
{
    const char *szUrl = "http://www.codeproject.com/Questions/427350/calling-a-website-from-cplusplus";
    long fileSize = 0;
    char *memBuffer = NULL, *headerBuffer = NULL;
    FILE *fp;

    if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0)
        return -1;

    memBuffer = readUrl2(szUrl, fileSize, &headerBuffer);
    printf("returned from readUrl\n");
    if (memBuffer == NULL)
    {
        printf("download failed\n");
        WSACleanup();
        return 1;
    }

    printf("data returned:\n%s", memBuffer);
    if (fileSize != 0)
    {
        printf("Got some data\n");
        fp = fopen("downloaded.file", "wb");
        if (fp != NULL)
        {
            fwrite(memBuffer, 1, fileSize, fp);
            fclose(fp);
        }
    }
    delete[] memBuffer;             // both buffers were allocated with new[]
    delete[] headerBuffer;

    WSACleanup();
    return 0;
}
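The header/body split that readUrl2() performs with getHeaderLength() can be exercised without a network connection. Below is a portable sketch, assuming the complete response is already in memory (as readUrl2 collects it), that also reads the numeric status code from the first line, e.g. 200, or 301 when the server says the page has moved. `HttpResponse` and `parseHttpResponse` are hypothetical names, not part of the code above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical result type for a response that is already fully in memory.
struct HttpResponse
{
    int status = 0;      // e.g. 200 or 301; 0 if the status line has no number
    std::string header;  // status line and header fields, without the blank line
    std::string body;    // everything after the blank line (may be binary)
};

// Split a raw response at the first empty line, as getHeaderLength() does,
// and read the status code from a first line like "HTTP/1.0 200 OK".
// Returns false if no header/body separator is found.
bool parseHttpResponse(const std::string &raw, HttpResponse &out)
{
    std::string::size_type sep = raw.find("\r\n\r\n");
    std::string::size_type sepLen = 4;
    if (sep == std::string::npos)
    {
        sep = raw.find("\n\n");  // tolerate servers that use bare LF
        sepLen = 2;
    }
    if (sep == std::string::npos)
        return false;

    out.header = raw.substr(0, sep);
    out.body = raw.substr(sep + sepLen);

    // The status code is the token after the first space of the first line.
    std::string firstLine = out.header.substr(0, out.header.find('\n'));
    std::string::size_type sp = firstLine.find(' ');
    out.status = (sp == std::string::npos) ? 0 : std::atoi(firstLine.c_str() + sp + 1);
    return true;
}
```

Build `raw` with `std::string(buffer, length)` rather than from a bare `char*`, so that a binary body containing NUL bytes survives intact. Checking `status` before using `body` also makes it obvious when you got a redirect or an error page instead of the content you wanted.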


There's a useful article on CodeProject about that:

CHttpClient - A Helper Class Using WinInet

It covers handling responses and file management.


ShellExecuteA(NULL, "open", "http://www.mysite.com", NULL, NULL, SW_SHOWNORMAL);




This opens the page in the user's default browser. If you want to parse the output, you have to use an HTTP client library such as cURL to download the response to your POST or GET request, and then parse that response yourself.

Note: if you want to communicate with a web server via HTTP and you can program the server as well (CGI or servlet), you can create some "gates" inside your web server: URLs that return something other than an HTML page (like web services). For example, your URL could respond with XML, JSON or custom text that is easy for your C++ program to parse.

If you are new to HTTP: "calling your web server" means creating a TCP connection to the server, sending an HTTP request as defined by the HTTP protocol, and then reading the response from the same socket. The response can be anything, even binary data; it is sent by your servlet or CGI program from inside the server.
There are libraries that do this job for you on the client side of your C++ program (HTTP client libraries such as libcurl). If you don't need a response, just open the link in the user's browser with ShellExecute().
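To illustrate the "gate" idea from the note above: suppose, hypothetically, your server-side script answers with plain `key=value` lines such as `status=ok` and `count=3` instead of HTML. Once the body has been downloaded by any of the approaches in this thread, turning it into a lookup table takes only a few lines. The format and the name `parseKeyValueBody` are assumptions for this sketch; a server only produces such output if you program it to.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for a body such as "status=ok\ncount=3\n".
// Lines without '=' are ignored; a trailing '\r' (CRLF endings) is stripped.
// Only the first '=' splits a line, so values may themselves contain '='.
std::map<std::string, std::string> parseKeyValueBody(const std::string &body)
{
    std::map<std::string, std::string> result;
    std::istringstream in(body);
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos)
            continue;
        result[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return result;
}
```

A format this simple avoids pulling an XML or JSON parser into a small client. If the data grows nested, switching the server to JSON and using a JSON library on the C++ side is the more robust choice.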