Updated: 2022-12-13 11:43:34
I am using Python 2.7.5 on Mac OS X 10.7.5 with BeautifulSoup 4.2.1. I am trying to parse an XML page using the lxml library, as taught in the BeautifulSoup tutorial. However, when I run my code, it shows
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml,xml. Do you need to install a parser library?
I am sure that I already installed lxml by all methods: easy_install, pip, port, etc. I tried to add a line to my code to see if lxml is installed or not:
import lxml
Python gets past this import without complaint, and then the same error message appears again at the same line as before.
So I am fairly sure that lxml is installed, just not installed correctly. I decided to uninstall lxml and then reinstall it using a 'correct' method. But when I type in
easy_install -m lxml
it shows:
Searching for lxml
Best match: lxml 3.2.1
Processing lxml-3.2.1-py2.7-macosx-10.6-intel.egg
Using /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml-3.2.1-py2.7-macosx-10.6-intel.egg
Because this distribution was installed --multi-version, before you can
import modules from this package in an application, you will need to
'import pkg_resources' and then use a 'require()' call similar to one of
these examples, in order to select the desired version:
pkg_resources.require("lxml") # latest installed version
pkg_resources.require("lxml==3.2.1") # this exact version
pkg_resources.require("lxml>=3.2.1") # this version or higher
Processing dependencies for lxml
Finished processing dependencies for lxml
So I don't know how to continue my uninstall...
I looked up many posts about this issue on Google, but I still can't find any useful info.
Here is my code:
import sys

import mechanize
from bs4 import BeautifulSoup
import lxml

class count:
    def __init__(self, protein):
        self.proteinCode = protein
        self.br = mechanize.Browser()

    def first_search(self):
        #Test 0
        soup = BeautifulSoup(self.br.open("http://www.ncbi.nlm.nih.gov/protein/21225921?report=genbank&log$=prottop&blast_rank=1&RID=YGJHMSET015"), ['lxml','xml'])
        return

if __name__=='__main__':
    proteinCode = sys.argv[1]
    gogogo = count(proteinCode)
I want to know:
I am using BeautifulSoup 4.3.2 and OS X 10.6.8. I also have a problem with an improperly installed lxml. Here are some things that I found out:
First of all, check this related question: Removed MacPorts, now Python is broken
Now, in order to check which builders for BeautifulSoup 4 are installed, try
>>> import bs4
>>> bs4.builder.builder_registry.builders
If you don't see your favorite builder, then it is not installed, and you will see an error as above ("Couldn't find a tree builder...").
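As a minimal sketch of that check, the registry can be printed together with each builder's name and feature list. The helper name list_builders is my own, not part of bs4; it relies on the NAME and features class attributes that the bs4 builders define, and it returns an empty list if bs4 itself cannot be imported:

```python
def list_builders():
    """Return (class name, NAME, features) for each registered bs4 tree builder."""
    try:
        from bs4.builder import builder_registry
    except ImportError:
        # bs4 itself is not importable in this interpreter.
        return []
    rows = []
    for builder in builder_registry.builders:
        rows.append((
            builder.__name__,
            getattr(builder, "NAME", None),
            list(getattr(builder, "features", [])),
        ))
    return rows


for class_name, name, features in list_builders():
    print(class_name, name, features)
```

If no line in the output mentions lxml, then asking for the 'lxml' or 'xml' features will fail with the error above.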
Also, just because you can import lxml doesn't mean that everything is fine.
Try
>>> import lxml
>>> import lxml.etree
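The difference matters because lxml is a package whose real work is done by the compiled extension module lxml.etree, so the package can import cleanly while the extension fails to load. A small sketch that makes the failure reason visible (the helper try_import is hypothetical, written only for this illustration):

```python
import importlib


def try_import(module_name):
    """Import a module by name; return (True, None) or (False, error text)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # Usually ImportError, but a broken C extension can raise other errors.
        return False, "%s: %s" % (type(exc).__name__, exc)
    return True, None


for name in ("lxml", "lxml.etree"):
    ok, error = try_import(name)
    print(name, "OK" if ok else "FAILED -> " + error)
```

If the first line says OK and the second says FAILED, the lxml package is present but its binary part does not match the Python you are running.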
To understand what's going on, go to the bs4 installation and open the egg (tar -xvzf). Notice the module bs4.builder. Inside it you should see files such as _lxml.py and _html5lib.py. So you can also try
>>> import bs4.builder._htmlparser
>>> import bs4.builder._lxml
>>> import bs4.builder._html5lib
If there is a problem, you will see why a particular module cannot be loaded. Notice how the end of builder/__init__.py loads all those modules and ignores whatever failed to load:
# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only
# want to use HTMLParser as a last result.
from . import _htmlparser
register_treebuilders_from(_htmlparser)
try:
    from . import _html5lib
    register_treebuilders_from(_html5lib)
except ImportError:
    # They don't have html5lib installed.
    pass
try:
    from . import _lxml
    register_treebuilders_from(_lxml)
except ImportError:
    # They don't have lxml installed.
    pass
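Because those except ImportError blocks swallow the reason for the failure, one way to see it is to repeat the same imports by hand and keep the traceback. A sketch, assuming the module names shown in the excerpt above (_htmlparser, _html5lib, _lxml); the helper diagnose_builders is my own and not part of bs4:

```python
import importlib
import traceback


BUILDER_MODULES = (
    "bs4.builder._htmlparser",
    "bs4.builder._html5lib",
    "bs4.builder._lxml",
)


def diagnose_builders(module_names=BUILDER_MODULES):
    """Map each module name to None (loaded) or the traceback text of its failure."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception:
            results[name] = traceback.format_exc()
    return results


for name, error in sorted(diagnose_builders().items()):
    print(name, "loaded" if error is None else "failed")
    if error:
        print(error)
```

The last line of each printed traceback is usually the interesting part, since it names the underlying module or shared library that could not be loaded.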