How to check whether a URL is a web page link or a file link in Python

Updated: 2023-11-25 16:04:04

Suppose I have links as follows:
    http://example.com/index.html
    http://example.com/stack.zip
    http://example.com/setup.exe
    http://example.com/news/
In the links above, the first and fourth are web page links, and the second and third are file links.

These are only some examples of file links (.zip and .exe); there may be many other file types.

Is there any standard way to distinguish a file URL from a web page link?
Thanks in advance.
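
# Written for Python 2: urllib.urlopen and headers.gettype() are not available in Python 3.
import urllib
import mimetypes


def guess_type_of(link, strict=True):
    # Guess from the URL's extension first; this needs no network access.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # Unknown extension: ask the server for its Content-Type header.
        u = urllib.urlopen(link)
        link_type = u.headers.gettype() # or using: u.info().gettype()
    return link_type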
Demo:
links = ['http://***.com/q/21515098/538284', # An HTML page
         'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png', # A PNG file
         'http://commons.wikimedia.org/wiki/File:Typing_example.ogv', # An HTML page
         'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv'   # An OGV file
]

for link in links:
    print(guess_type_of(link))
Output:
text/html
image/x-png
text/html
application/ogg
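
For Python 3, the same idea might look roughly like the sketch below. It assumes `urllib.request` with a HEAD request, so only the headers are fetched, and `get_content_type()` in place of `gettype()`. The `is_web_page` helper and its list of "page" types are hypothetical additions for illustration, not part of the answer above. Note that the extension-based guess takes precedence, so a page whose URL happens to end in a file-like extension may be misclassified, and the exact MIME strings returned by `mimetypes` can differ between Python versions and systems.

```python
import mimetypes
import urllib.request


def guess_type_of(link, strict=True):
    """Return the MIME type guessed for link, or None if it is unknown."""
    # Try the file extension alone first; this needs no network access.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # Fall back to asking the server. HEAD fetches headers only.
        request = urllib.request.Request(link, method="HEAD")
        with urllib.request.urlopen(request) as response:
            link_type = response.headers.get_content_type()
    return link_type


def is_web_page(link_type):
    """Hypothetical helper: treat HTML/XHTML as pages, anything else as a file."""
    return link_type in ("text/html", "application/xhtml+xml")
```

With `strict=False` no request is made, so `guess_type_of('http://example.com/news/', strict=False)` returns `None`; whether to treat an unknown type as a page or a file is left to the caller.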
