且构网

Sharing all things programming and development...

Removing certain attributes from HTML tags

Updated: 2023-12-05 22:49:34

lxml.html.clean.Cleaner has a safe_attrs_only option. When it is set to True, only attributes in the cleaner's safe_attrs whitelist are preserved; by default that whitelist is lxml.html.defs.safe_attrs. To remove some or all attributes, pass your own whitelist through the safe_attrs argument rather than rebinding clean.defs.safe_attrs globally (which would also mean remembering to restore it afterwards). An empty frozenset, as in the code below, removes every attribute.

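# On lxml >= 5.2 this module is provided by the separate lxml_html_clean package
# (pip install lxml_html_clean, or pip install "lxml[html_clean]").
import lxml.html.clean as clean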

code = '<tr id="ctl00_Content_AdManagementPreview_DetailView_divNova" class="Extended" style="display: none;">'

# safe_attrs_only=True keeps only whitelisted attributes, so an empty
# whitelist (safe_attrs=frozenset()) strips every attribute.
cleaner = clean.Cleaner(safe_attrs_only=True, safe_attrs=frozenset())
cleansed = cleaner.clean_html(code)

print(cleansed)

Output:

<tr></tr>
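If you would rather not depend on lxml (on lxml 5.2 and later, Cleaner also needs the separate lxml_html_clean package), the same whitelist idea can be sketched with the standard library's html.parser. This is a swapped-in technique, not lxml's API: AttrStripper and strip_attrs are hypothetical helpers written for illustration. Unlike lxml, html.parser does no tree repair, so an unclosed <tr> stays unclosed, and comments and doctypes are simply dropped.

```python
from html import escape
from html.parser import HTMLParser


class AttrStripper(HTMLParser):
    """Hypothetical helper (not part of lxml): re-emits tags and text,
    keeping only the attributes whose names are in `allowed`."""

    def __init__(self, allowed=frozenset()):
        super().__init__(convert_charrefs=True)
        self.allowed = frozenset(allowed)
        self.out = []

    def _attrs(self, attrs):
        # Drop every attribute that is not whitelisted; re-quote the rest.
        kept = []
        for name, value in attrs:
            if name not in self.allowed:
                continue
            if value is None:
                kept.append(f" {name}")  # boolean attribute such as `disabled`
            else:
                kept.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities already decoded, so escape it again.
        self.out.append(escape(data, quote=False))


def strip_attrs(markup, allowed=frozenset()):
    """Return `markup` with all attributes not in `allowed` removed."""
    parser = AttrStripper(allowed)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


code = '<tr id="ctl00_Content_AdManagementPreview_DetailView_divNova" class="Extended" style="display: none;">'
print(strip_attrs(code))                     # <tr>
print(strip_attrs(code, allowed={"class"}))  # <tr class="Extended">
```

Passing allowed={"class"} plays the same role as giving Cleaner a non-empty safe_attrs set: everything outside the whitelist goes, everything inside it stays.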