Updated: 2023-01-06 23:27:19
I usually use SgmlReader for this: https://github.com/MindTouch/SGMLReader
As others have said, the problem is that HTML doesn't follow XML's well-formedness rules, so it is inherently difficult to parse with XML tools, but SgmlReader usually does a pretty good job.
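
To make the well-formedness point concrete, here is a rough sketch in Python using only the standard library. It is not SgmlReader and says nothing about SgmlReader's API; the snippet and the helper names (`parses_as_xml`, `html_start_tags`) are made up for illustration. It just shows a strict XML parser rejecting ordinary HTML that a lenient HTML parser reads without complaint.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Ordinary HTML: <br> has no closing tag, which XML does not allow.
# (Hypothetical sample markup, chosen only for this illustration.)
SNIPPET = "<p>first line<br>second line</p>"


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    """Return the start tags found by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("accepted as XML:", parses_as_xml(SNIPPET))
    print("tags seen by the HTML parser:", html_start_tags(SNIPPET))
```

A tool like SgmlReader sits in that gap: it reads the lenient form and hands you something an XML API can consume.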