Updated: 2023-02-20 09:17:09
You can use this regex to match what you want (the regex is placed in a string literal for convenience):
'~<a=5>(<([a-zA-Z0-9]+)[^>]*>(?1)*</\2>|[^<>]++)*</a>~'
Here is a breakdown of the regex above:
<a=5>
(
<([a-zA-Z0-9]+)[^>]*>
(?1)*
</\2>
|
[^<>]++
)*
</a>
The first part, <([a-zA-Z0-9]+)[^>]*>(?1)*</\2>, matches a pair of matching tags and all of its content. It assumes that a tag name consists only of the characters [a-zA-Z0-9]. The tag name is captured by ([a-zA-Z0-9]+), which is capturing group 2, and is backreferenced as \2 when matching the closing tag </\2>.
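To illustrate how the captured tag name constrains the closing tag, here is a minimal sketch in Python. It is a simplification, not the regex from the answer: Python's built-in re module has no (?1) subroutine call, so the recursive part is replaced by [^<>]* (no nesting), and the tag name becomes group 1 instead of group 2. The names PAIR and is_matched_pair are made up for this example.

```python
import re

# Non-recursive simplification of the first alternative: the (?1)* call is
# replaced by [^<>]* so that the built-in re module can run it. The tag name
# is group 1 here, so the closing tag uses \1 instead of \2.
PAIR = re.compile(r'<([a-zA-Z0-9]+)[^>]*>[^<>]*</\1>')

def is_matched_pair(text):
    """Return True if text is exactly one tag pair whose names agree."""
    return PAIR.fullmatch(text) is not None
```

With this sketch, is_matched_pair('<b class="x">bold</b>') succeeds, while is_matched_pair('<b>bold</i>') fails because the backreference requires the closing name to equal the captured opening name.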
The second part, [^<>]++, matches any other content outside the tags. Note that quoted strings are not handled, so a < or > inside a quoted attribute value will throw the match off; depending on your input, the regex may not work.
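The quoted-string caveat can be seen with the opening-tag part of the pattern alone. The sketch below uses Python's re module only to run that fragment; OPEN_TAG and opening_tag are names invented for this example.

```python
import re

# The opening-tag fragment from the regex, unchanged.
OPEN_TAG = re.compile(r'<([a-zA-Z0-9]+)[^>]*>')

def opening_tag(text):
    """Return what the fragment treats as the opening tag at the start of text."""
    m = OPEN_TAG.match(text)
    return m.group(0) if m else None
```

For the input '<b title="x>y">bold</b>', opening_tag returns '<b title="x>': because [^>]* cannot cross a >, the tag is cut off at the > inside the quoted attribute value.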
Now back to the subroutine call (?1), which recursively calls the first capturing group. Notice that a tag can contain zero or more instances of other tags or non-tag content. Because of the way the regex is written, the outermost <a=5>...</a> pair shares this property.
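The recursive structure can also be written out by hand. The following is a rough Python sketch of a matcher that mirrors the regex piece by piece; it is not the regex itself, and it only tries to match at a given position rather than searching the whole input. All function and constant names (match_content, match_pair, match_a5, OPEN_TAG, TEXT) are hypothetical.

```python
import re

OPEN_TAG = re.compile(r'<([a-zA-Z0-9]+)[^>]*>')  # opening tag, name in group 1
TEXT = re.compile(r'[^<>]+')                     # non-tag content

def match_content(s, pos):
    """Mirror of (...)*: consume tag pairs and text runs, return the new position."""
    while True:
        m = TEXT.match(s, pos)
        if m:
            pos = m.end()
            continue
        end = match_pair(s, pos)
        if end is None:
            return pos  # nothing more to consume; zero iterations is allowed
        pos = end

def match_pair(s, pos):
    """Mirror of <name ...>(?1)*</name>: return the end index, or None."""
    m = OPEN_TAG.match(s, pos)
    if not m:
        return None
    inner_end = match_content(s, m.end())  # the recursive call
    closing = '</' + m.group(1) + '>'      # plays the role of </\2>
    if s.startswith(closing, inner_end):
        return inner_end + len(closing)
    return None

def match_a5(s, pos=0):
    """Mirror of <a=5>(...)*</a>: return the matched text, or None."""
    if not s.startswith('<a=5>', pos):
        return None
    end = match_content(s, pos + len('<a=5>'))
    if s.startswith('</a>', end):
        return s[pos:end + len('</a>')]
    return None
```

For example, match_a5('<a=5>hi <b>bold <i>x</i></b> there</a>') returns the whole string, while match_a5('<a=5>hi <b>bold</i></a>') returns None because the inner tags do not pair up. The outer function reuses match_content, which is how the <a=5>...</a> pair ends up with the same "zero or more tags or text" property as any nested tag.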