且构网

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Updated: 2021-09-16 18:37:13

I suspect this is related to the parser that BS uses to read the HTML. The BS4 documentation covers the available parsers, but if you're like me (on OS X) you might be stuck with something that requires a bit of work:

You'll notice that the BS4 documentation points out that, by default, BS4 uses Python's built-in HTML parser. Assuming you are on OS X, the Apple-bundled version of Python is 2.7.2, whose parser is not lenient about character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.
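Before upgrading, it can help to confirm which interpreter is actually running your script and whether it lives inside a virtualenv. The following is only a minimal sketch: the helper name describe_interpreter is made up for illustration, and the virtualenv check is a common heuristic, not something taken from the BS4 documentation.

```python
import sys


def describe_interpreter():
    """Report the running Python version, its path, and a virtualenv guess."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    return {
        "version": tuple(sys.version_info[:3]),
        "executable": sys.executable,
        "in_virtualenv": in_virtualenv,
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

If the reported version is the old system Python rather than the one you installed, the script is probably not running inside the environment you expected.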

If upgrading Python sounds like a pain, you can switch over to the lxml parser:

pip install lxml

Then try:

soup = BeautifulSoup(html, "lxml")

Depending on your scenario, that might be good enough. I found the problem annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.
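If the same code has to run on machines where lxml may or may not be installed, one option is to pick the parser at runtime so that the FeatureNotFound error never gets raised. This is a rough sketch under some assumptions: it targets Python 3, the helper names pick_parser and make_soup are invented for illustration, and it treats a parser as available whenever its package can be located, which is a reasonable guess but not a guarantee that BS4 will accept it.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first parser name whose backing package can be found.

    Falls back to "html.parser", which ships with Python itself.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


def make_soup(html):
    """Build a BeautifulSoup object using whichever parser is available."""
    # Imported lazily so pick_parser() is usable even without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(html, pick_parser())
```

Calling make_soup(html) then behaves like the BeautifulSoup(html, "lxml") line above when lxml is present, and quietly uses the built-in parser otherwise. Keep in mind that different parsers can build slightly different trees from malformed markup, so pinning one parser explicitly is still the safer choice when results must be reproducible.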