且构网

How to replace or remove HTML entities like "&nbsp;" using BeautifulSoup 4?

Updated: 2022-04-29 23:09:41

See Entities in the documentation. BeautifulSoup 4 produces proper Unicode for all entities:

An incoming HTML or XML entity is always converted into the corresponding Unicode character.
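To illustrate the quoted behaviour, here is a minimal sketch, assuming the `bs4` package is installed and using Python's built-in "html.parser" backend. The sample markup and variable names are made up for this example:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup containing several named entities.
html = "<p>caf&eacute;&nbsp;&amp;&nbsp;bar &quot;open&quot;</p>"

soup = BeautifulSoup(html, "html.parser")
text = soup.p.get_text()

# Each entity is now the corresponding Unicode character;
# &nbsp; in particular has become U+00A0 (non-breaking space).
print(repr(text))
```

Printing with `repr` makes the non-breaking space visible as `\xa0`, which otherwise looks identical to a regular space.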

Yes, &nbsp; is turned into a non-breaking space character (U+00A0). If you really want those to be regular space characters instead, you'll have to do a Unicode string replace.
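One way to do that replacement is sketched below, assuming the `bs4` package is installed. The helper name `replace_nbsp` and its `replacement` parameter are hypothetical, chosen only for this example:

```python
from bs4 import BeautifulSoup

NBSP = "\u00a0"  # the character that &nbsp; is decoded to


def replace_nbsp(html: str, replacement: str = " ") -> str:
    """Return the text of `html` with non-breaking spaces replaced.

    Hypothetical helper: pass replacement="" to remove them instead.
    """
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text().replace(NBSP, replacement)


print(replace_nbsp("<p>Hello&nbsp;world</p>"))
print(replace_nbsp("<p>Hello&nbsp;world</p>", replacement=""))
```

The replacement runs on the extracted text rather than on the raw markup, so it also catches non-breaking spaces that were written as `&#160;` or as the literal character.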