lxml.html: Error reading file; failed to load external entity

Updated: 2022-11-30 15:56:13

libxml2 does not support SSL/TLS, so lxml cannot load https:// URLs directly. Use Python's urllib2 to fetch the page instead (urllib.request in Python 3).

Any URL of the form http://<blah>.<blah> works without trouble, but https is not supported here. There are also issues with redirects.

Try:

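try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
import lxml.html
tree = lxml.html.parse(urlopen('https://google.com'))  # urlopen handles the HTTPS fetch; lxml only parses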

For more information, refer to this.
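The answer above can be sketched for Python 3 roughly as follows: the standard library does the HTTPS request and follows redirects, and lxml only parses the bytes it is handed. The helper names `fetch_html` and `parse_page`, the `timeout` value, and the `User-Agent` string are assumptions made for illustration, not part of the original answer.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download a page with the standard library and return (body, final_url).

    urlopen handles TLS and follows HTTP redirects, the two things the
    answer says libxml2's own loader does not cope with.
    """
    # Some servers reject the default Python user agent; this value is illustrative.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # geturl() reports the address actually served, after any redirects.
        return response.read(), response.geturl()


def parse_page(url):
    """Fetch the page ourselves, then hand the bytes to lxml for parsing."""
    import lxml.html  # imported here so fetch_html is usable without lxml

    body, final_url = fetch_html(url)
    # base_url lets lxml resolve relative links against the post-redirect address.
    return lxml.html.fromstring(body, base_url=final_url)
```

Calling `parse_page('https://google.com')` should return the root element of the parsed document, on which the usual lxml queries such as `findtext('.//title')` or `xpath(...)` can be run.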

Solution

There is also a workaround: try Selenium, and if you don't want a UI, run it in headless mode. It works fine; I tried it myself.
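A minimal sketch of the headless Selenium workaround might look like the following. It assumes Chrome and a matching driver are available; the function names and the `driver_factory` parameter are hypothetical, introduced only so the browser can be swapped out, and the `--headless=new` flag applies to recent Chrome versions (older ones use `--headless`).

```python
def _headless_chrome():
    """Start Chrome without a visible window (assumes Chrome is installed)."""
    from selenium import webdriver  # imported lazily; only needed for a real browser

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # older Chrome versions: "--headless"
    return webdriver.Chrome(options=options)


def render_html(url, driver_factory=None):
    """Load url in a browser and return the page source as a string."""
    if driver_factory is None:
        driver_factory = _headless_chrome
    driver = driver_factory()
    try:
        # The browser performs the HTTPS request and follows redirects itself.
        driver.get(url)
        return driver.page_source
    finally:
        # Always shut the browser down, even if loading the page fails.
        driver.quit()
```

The returned string can then be passed to `lxml.html.fromstring` for parsing, so lxml never has to open the https URL on its own.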