Updated: 2023-12-03 08:05:40
OK, once you have the headlines, the page will usually contain other links as well. Open each of those links and fetch the information from it in the same way. It can look like this:
visited = set()
headlines = []
links = [....]  # the URLs you start from
while links:
    link_for_fetch = links.pop()
    if link_for_fetch in visited:
        continue  # already fetched this page, skip it
    visited.add(link_for_fetch)
    content = get_contents(link_for_fetch)
    headlines += parse_headlines(content)
    links += parse_links(content)
This is just pseudocode; you can write it in any programming language. Keep in mind that parsing a whole site this way can take a lot of time :( and the site may block your IP address if it detects automated traffic.
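If you want to try the loop in Python, here is a rough sketch using only the standard library. Everything named here (`PageParser`, `fetch`, `crawl`, `max_pages`, `delay`) is my own illustration and not part of any library, and treating `h1` to `h3` tags as "headlines" is an assumption you will probably need to adjust for the real site. The page cap and the pause between requests are there to soften the two problems above: run time and getting blocked.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects heading text and link targets from one HTML page."""

    HEADING_TAGS = {"h1", "h2", "h3"}  # assumption: headlines live in these tags

    def __init__(self):
        super().__init__()
        self.headlines = []
        self.links = []
        self._in_heading = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._in_heading = True
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._in_heading:
            text = "".join(self._buffer).strip()
            if text:
                self.headlines.append(text)
            self._in_heading = False


def fetch(url):
    """Download a page and return it as text (undecodable bytes are replaced)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, get_contents=fetch, max_pages=50, delay=1.0):
    """Collect headlines from start_url and from same-host pages it links to."""
    host = urlparse(start_url).netloc
    visited = set()
    headlines = []
    links = [start_url]
    while links and len(visited) < max_pages:
        link_for_fetch = links.pop()
        if link_for_fetch in visited:
            continue
        visited.add(link_for_fetch)
        try:
            content = get_contents(link_for_fetch)
        except OSError:
            continue  # page could not be fetched: skip it and keep going
        parser = PageParser()
        parser.feed(content)
        headlines += parser.headlines
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts
            absolute, _fragment = urldefrag(urljoin(link_for_fetch, href))
            if urlparse(absolute).netloc == host and absolute not in visited:
                links.append(absolute)
        time.sleep(delay)  # pause between requests so the site is not hammered
    return headlines
```

You would call it as `crawl("https://example.com/")` with your own start URL. Passing your own `get_contents` function lets you swap in a different downloader or test the loop without touching the network.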