且构网

How to check whether a URL is a web page link or a file link in Python

Updated: 2023-11-25 16:04:04

Suppose I have the following links:

    http://example.com/index.html
    http://example.com/stack.zip
    http://example.com/setup.exe
    http://example.com/news/

Of these, the first and fourth are web page links, and the second and third are file links.

.zip and .exe are only examples of file links; there may be many other file types.

Is there a standard way to distinguish a file URL from a web page link? Thanks in advance.

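import mimetypes
import urllib  # Python 2 API: urllib.urlopen and gettype() are not available in Python 3


def guess_type_of(link, strict=True):
    # First try to guess the MIME type from the file extension in the URL.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # No recognizable extension: fetch the URL and read its Content-Type header.
        u = urllib.urlopen(link)
        link_type = u.headers.gettype()  # or: u.info().gettype()
    return link_type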
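
The function above uses the Python 2 `urllib` API. As a rough sketch of how the same idea might look on Python 3, assuming `urllib.request.urlopen` and the response's `headers.get_content_type()` take the place of `urllib.urlopen` and `gettype()`. The name `guess_type_of_py3`, the `opener` parameter, and the use of a HEAD request are assumptions added for illustration and are not part of the original answer:

```python
import mimetypes
from urllib.request import Request, urlopen


def guess_type_of_py3(link, strict=True, opener=urlopen):
    """Guess a link's MIME type; hypothetical Python 3 port of guess_type_of.

    `opener` is an illustrative hook (not in the original answer) so the
    network call can be swapped out, for example in tests.
    """
    # Cheap path: infer the type from the file extension in the URL.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # Fallback: ask the server. A HEAD request asks for headers only;
        # this is an assumption, and some servers may not support HEAD.
        request = Request(link, method="HEAD")
        with opener(request) as response:
            link_type = response.headers.get_content_type()
    return link_type
```

With `strict=False` the function never touches the network, so a link such as `http://example.com/news/` simply yields `None`.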

Demo:

links = ['http://***.com/q/21515098/538284',  # an HTML page
         'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png',  # a PNG file
         'http://commons.wikimedia.org/wiki/File:Typing_example.ogv',  # an HTML page
         'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv'  # an OGV file
]

for link in links:
    print(guess_type_of(link))

Output:

text/html
image/x-png
text/html
application/ogg
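
To turn the guessed MIME type into the yes/no answer the question asks for, the type can be compared against a set of "web page" types. A minimal sketch, assuming that only `text/html` and `application/xhtml+xml` count as web pages. The helper name `is_web_page` and that set are illustrative choices and are not part of the original answer:

```python
# MIME types treated as web pages; an assumption made for illustration.
WEB_PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def is_web_page(link_type):
    """Return True if a MIME type string looks like a web page.

    `link_type` is a value such as the one returned by guess_type_of;
    None (type unknown) is treated as "not a web page".
    """
    if link_type is None:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    return link_type.split(";")[0].strip().lower() in WEB_PAGE_TYPES


for link_type in ["text/html", "image/x-png", "application/ogg"]:
    print(link_type, is_web_page(link_type))
```

Anything outside the set, including an unknown type, is treated as a file link here; a different application might prefer to handle `None` separately.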