且构网

Accessing all elements on the pages of a main website with Beautiful Soup

Updated: 2023-12-03 08:05:40

OK, once you have these headlines, you can also collect the other links on this page, open each of those links in turn, and fetch the information from them the same way. It could look like this:

visited = set()
links = [....]  # links found on the main page
while links:
    link_for_fetch = links.pop()
    if link_for_fetch in visited:
        continue  # this page was already fetched, skip it
    content = get_contents(link_for_fetch)
    headlines += parse_headlines(content)
    links += parse_links(content)
    visited.add(link_for_fetch)
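
The loop above can be turned into a runnable sketch. To stay dependency-free it uses the standard library's `html.parser` instead of Beautiful Soup, and it assumes (hypothetically) that "headlines" means the text of `<h1>` to `<h3>` tags. The names `crawl`, `LinkAndHeadlineParser`, `HEADLINE_TAGS` and `max_pages` are illustrative, not from the original answer. Fetching is passed in as `get_contents`, so for a real site you could supply something like `lambda url: requests.get(url, timeout=10).text`. It also stays on the starting host and stops after `max_pages` pages, which keeps a whole-site crawl from running unbounded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

HEADLINE_TAGS = ("h1", "h2", "h3")  # assumption: what counts as a "headline"


class LinkAndHeadlineParser(HTMLParser):
    """Collects every <a href> target and the text inside headline tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.headlines = []
        self._current = None  # text chunks of the headline being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in HEADLINE_TAGS:
            self._current = []

    def handle_endtag(self, tag):
        if tag in HEADLINE_TAGS and self._current is not None:
            self.headlines.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def crawl(start_url, get_contents, max_pages=50):
    """Breadth-first crawl that stays on start_url's host and stops after max_pages."""
    host = urlparse(start_url).netloc
    visited = set()
    headlines = []
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # the same URL can be queued twice before it is fetched
        visited.add(url)
        parser = LinkAndHeadlineParser()
        parser.feed(get_contents(url))
        parser.close()
        headlines += parser.headlines
        for href in parser.links:
            # resolve relative links and drop #fragments so /b and /b#top are one page
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in visited:
                queue.append(absolute)
    return headlines, visited


if __name__ == "__main__":
    # A tiny in-memory "site" standing in for real HTTP requests.
    site = {
        "https://example.com/": '<h1>Home</h1><a href="/a">A</a><a href="https://other.org/">x</a>',
        "https://example.com/a": '<h2>Page A</h2><a href="/">home</a><a href="/b#top">B</a>',
        "https://example.com/b": "<h3>Page B</h3>",
    }
    headlines, visited = crawl("https://example.com/", lambda url: site.get(url, ""))
    print(headlines)  # ['Home', 'Page A', 'Page B']
```

Using `deque.popleft()` makes this breadth-first, whereas `links.pop()` in the pseudocode takes the most recently found link and so explores depth-first; both terminate as long as `visited` is checked. With Beautiful Soup, the parser class could be replaced by `BeautifulSoup(html, "html.parser")` together with `soup.find_all("a", href=True)` and `soup.find_all(["h1", "h2", "h3"])`.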

This is only pseudocode; you can write it in any programming language. But parsing a whole site this way can take a lot of time :( and the site's bot protection may block your IP address.
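
One common way to reduce the load you put on a site, and with it the chance of being blocked, is to honour its `robots.txt` and pause between requests. The sketch below wraps any page-fetching function using the standard library's `urllib.robotparser`. The names `make_polite_fetcher`, the `"my-crawler"` user agent and the 1-second default delay are hypothetical choices, and none of this guarantees a site will not block you.

```python
import time
from urllib.robotparser import RobotFileParser


def make_polite_fetcher(robots_txt, fetch, user_agent="my-crawler",
                        default_delay=1.0, sleep=time.sleep):
    """Wrap fetch(url) so it skips URLs robots.txt disallows and spaces out requests."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # Use the site's Crawl-delay if it declares one, otherwise our own default.
    delay = rules.crawl_delay(user_agent) or default_delay
    last_request = None  # monotonic time of the previous real request

    def polite_fetch(url):
        nonlocal last_request
        if not rules.can_fetch(user_agent, url):
            return ""  # disallowed by robots.txt: behave like an empty page
        if last_request is not None:
            wait = delay - (time.monotonic() - last_request)
            if wait > 0:
                sleep(wait)  # keep at least `delay` seconds between requests
        last_request = time.monotonic()
        return fetch(url)

    return polite_fetch
```

Here the `robots.txt` text is passed in directly; in practice `RobotFileParser.set_url()` followed by `read()` can download and parse it for you. The `sleep` parameter exists only so the delay can be replaced in tests.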