且构网

Sharing the ins and outs of software development...

How do I remove specific tags and attributes from a string?

Updated: 2022-10-22 15:08:38







Here's the deal: I'm building a project to help teach people HTML. Naturally, I'm afraid of that Scumbag Steve (see figure 1).

So I want to block ALL HTML tags except those on a very specific whitelist.

On the approved tags, I also want to strip harmful attributes such as onload and onmouseover, again according to a whitelist.

I've thought of regex, but I'm pretty sure it's evil and not very helpful for the job.
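To make the regex concern concrete, here is a hypothetical naive filter (the function name naiveStripHandlers is made up for this sketch) that removes double-quoted on* event-handler attributes. It handles the obvious case but silently misses single-quoted and unquoted handlers, both of which browsers accept:

```php
<?php
// Hypothetical naive filter: removes on* handler attributes whose value is
// wrapped in double quotes. Shown only to illustrate why regex is a poor
// fit for HTML, not as a recommended approach.
function naiveStripHandlers(string $html): string
{
    return preg_replace('/\son[a-z]+\s*=\s*"[^"]*"/i', '', $html);
}

// Double-quoted handler: removed as intended.
$quoted = naiveStripHandlers('<p onmouseover="alert(1)">hi</p>');

// Single-quoted and unquoted handlers are valid HTML too,
// but the pattern above never matches them, so they pass through.
$single   = naiveStripHandlers("<p onmouseover='alert(1)'>hi</p>");
$unquoted = naiveStripHandlers('<img src=x onerror=alert(1)>');

echo $quoted, "\n", $single, "\n", $unquoted, "\n";
```

Every quoting style or separator needs another special case, which is why parsing the markup properly is the usual advice.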

Could anyone give me a nudge in the right direction?

Thanks in advance.


[Fig 1: Scumbag Steve]

<?php
require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// This one is needed because otherwise elements considered harmful,
// such as <input>, will automatically be deleted
$config->set('HTML.Trusted', true);

// This line says that only input, p and div will be accepted
$config->set('HTML.AllowedElements', 'input,p,div');

// Set the allowed attributes for each tag
$config->set('HTML.AllowedAttributes', 'input.type,input.name,p.id,div.style');

// A more extensive way of managing attributes and elements... see the docs
// http://htmlpurifier.org/live/configdoc/plain.html
$def = $config->getHTMLDefinition(true);

$def->addAttribute('input', 'type', 'Enum#text');
$def->addAttribute('input', 'name', 'Text');

// Create the purifier...
$purifier = new HTMLPurifier($config);

// ...then clean the user's markup ($raw_html) and display the result
$html = $purifier->purify($raw_html);
echo $html;

  • NOTE: As you asked, this code works as a whitelist: only input, p and div elements are accepted, and only the listed attributes are kept.
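To see the same whitelist idea without the library, here is a minimal sketch that swaps HTML Purifier for PHP's built-in DOM extension. The ALLOWED map mirrors the answer's configuration (input.type, input.name, p.id, div.style); the names ALLOWED, sanitize and sanitizeChildren are hypothetical. It parses the markup, drops any element not on the whitelist together with its contents, and strips every attribute not listed for its tag. Unlike HTML Purifier it does not validate attribute values (CSS in style, the value of input type), and it assumes ASCII input, so treat it as an illustration rather than a production sanitizer.

```php
<?php
// Hypothetical whitelist sanitizer built on PHP's DOM extension (ext-dom).
// Illustration only: attribute VALUES are kept verbatim, not validated,
// and input is assumed to be ASCII (loadHTML() defaults to ISO-8859-1).

// Allowed tags => attributes allowed on that tag (mirrors the answer's config).
const ALLOWED = [
    'input' => ['type', 'name'],
    'p'     => ['id'],
    'div'   => ['style'],
];

// Recursively remove everything under $parent that is not whitelisted.
function sanitizeChildren(DOMNode $parent): void
{
    // Copy the live node list first, since nodes are removed while looping.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            continue; // plain text is kept (saveHTML() escapes it on output)
        }
        if (!($child instanceof DOMElement)) {
            $parent->removeChild($child); // comments and other non-element nodes
            continue;
        }
        $tag = strtolower($child->nodeName);
        if (!array_key_exists($tag, ALLOWED)) {
            // Not whitelisted: drop the element together with its contents.
            $parent->removeChild($child);
            continue;
        }
        // Strip every attribute not listed for this tag (onload, onclick, ...).
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->nodeName), ALLOWED[$tag], true)) {
                $child->removeAttribute($attr->nodeName);
            }
        }
        sanitizeChildren($child);
    }
}

function sanitize(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    sanitizeChildren($body);

    // Serialize only what is left inside <body>.
    $out = '';
    foreach ($body->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}
```

Dropping a rejected element along with everything inside it is the simplest safe choice; a friendlier filter could keep the plain text of rejected tags while still discarding script and style contents.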