且构网

Sharing the things programmers deal with in development...

Getting a website's source in Java

Updated: 2022-10-14 23:35:34

I would like to use Java to get the source of a (secure) website and then parse it for the links it contains. I have found how to connect to the URL, but how can I easily get just the source, preferably as a DOM Document, so that I can easily extract the information I want?

Or is there a better way to connect to an HTTPS site and get the source (which I need in order to read a table of data; it's pretty simple)? The links in that table point to files that I am going to download.

I wish it were FTP, but these are files stored on my TiVo (I want to download them to my computer programmatically).

You can go low level and simply request the page over a socket. In Java it looks like this:

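import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.security.cert.X509Certificate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.net.ssl.SSLPeerUnverifiedException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class FetchLinks {
    // args[0] = hostname
    // args[1] = path of the file to request, e.g. /index.html
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();

        // try-with-resources closes the socket on every exit path.
        try (SSLSocket sslsock = (SSLSocket) factory.createSocket(args[0], 443)) {
            // getSession() performs the TLS handshake if it has not happened yet.
            SSLSession session = sslsock.getSession();
            X509Certificate cert;
            try {
                cert = (X509Certificate) session.getPeerCertificates()[0];
            } catch (SSLPeerUnverifiedException e) {
                System.err.println(session.getPeerHost() + " did not present a valid cert.");
                return;
            }

            // Now use the secure socket just like a regular socket to read pages.
            String path = args[1].startsWith("/") ? args[1] : "/" + args[1];
            PrintWriter out = new PrintWriter(sslsock.getOutputStream());
            // The Host header is optional in HTTP/1.0 but needed by servers hosting several sites.
            out.write("GET " + path + " HTTP/1.0\r\nHost: " + args[0] + "\r\n\r\n");
            out.flush();

            BufferedReader in = new BufferedReader(new InputStreamReader(sslsock.getInputStream()));
            String line;
            // Based on Oscar's regex, narrowed so that each link on a line is captured separately.
            String regExp = "<a\\s+[^>]*?href=\"([^\"]*)\"";
            Pattern p = Pattern.compile(regExp, Pattern.CASE_INSENSITIVE);

            while ((line = in.readLine()) != null) {
                Matcher m = p.matcher(line);
                // find() reports every match on the line; matches() would need the whole line to fit.
                while (m.find()) {
                    System.out.println(m.group(1));
                }
            }
        }
    }
}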
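
The link-matching step in the loop above can also be tried without any network connection. The following is only a minimal sketch: the class name `LinkExtractor`, the method `extractLinks`, and the sample `.mpg` file names are made up for illustration, and the pattern assumes `href` values are wrapped in double quotes. Real-world HTML is often messier than a regex can handle, so treat this as a quick approach for a simple page rather than a general parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class LinkExtractor {
    // Captures the value of a double-quoted href attribute inside an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every href value found in the given HTML text, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample row with two links on one line.
        String sample = "<td><a href=\"show1.mpg\">Show 1</a></td>"
                + "<td><A class=\"x\" HREF=\"show2.mpg\">Show 2</A></td>";
        for (String link : extractLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

Because the helper takes a plain `String`, it could be fed each line read from the socket, or the whole response once it has been collected.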