且构网

Bulletproof SimpleXMLElement

Updated: 2023-12-05 16:18:58

You can load the HTML with DOM's loadHTML, then import the result into SimpleXML.

IIRC, it will still choke on some input, but it accepts pretty much anything that exists in the real world of broken websites.

$html = '<html><head><body><div>stuff & stuff</body></html>';

// collect libxml parse errors internally instead of emitting PHP warnings
$old = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// discard the collected errors and restore the old behaviour
libxml_clear_errors();
libxml_use_internal_errors($old);

$sxe = simplexml_import_dom($dom);
echo $sxe->asXML();
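
If you want to see what the parser actually choked on, the same steps can be wrapped in a small helper that hands back the collected libxml errors along with the element. This is only a sketch: the function name `html_to_simplexml` and its `[element, errors]` return shape are made up for illustration and are not part of DOM or SimpleXML. The libxml calls themselves (`libxml_get_errors`, `libxml_clear_errors`) are standard PHP.

```php
<?php
// Hypothetical helper (name and return shape are assumptions, not a library API).
// Parses possibly broken HTML and returns [SimpleXMLElement|null, LibXMLError[]].
function html_to_simplexml(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early
    if ($html === '') {
        return [null, []];
    }

    // collect parse errors internally instead of emitting PHP warnings
    $old = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // grab whatever libxml complained about, then restore the old behaviour
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($old);

    if (!$loaded || $dom->documentElement === null) {
        return [null, $errors];
    }

    $sxe = simplexml_import_dom($dom);

    // an empty SimpleXMLElement is falsy, so test the type rather than truthiness
    return [$sxe instanceof SimpleXMLElement ? $sxe : null, $errors];
}

[$sxe, $errors] = html_to_simplexml('<html><head><body><div>stuff & stuff</body></html>');

if ($sxe !== null) {
    // the imported tree can be queried like any other SimpleXML document
    foreach ($sxe->xpath('//div') as $div) {
        echo trim((string) $div), PHP_EOL;
    }
}

// each entry is a LibXMLError with ->line and ->message, among other fields
foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), PHP_EOL;
}
```

Returning the errors instead of throwing keeps the "accept almost anything" behaviour of the snippet above, while still letting the caller log or inspect the markup problems.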