Matching nested patterns with regex (using PHP's recursion)

Updated: 2023-02-20 09:17:09

You can use this regex to match what you want (the regex is placed in a string literal for convenience):

'~<a=5>(<([a-zA-Z0-9]+)[^>]*>(?1)*</\2>|[^<>]++)*</a>~'

Here is a breakdown of the regex above:

<a=5>
(
  <([a-zA-Z0-9]+)[^>]*>
  (?1)*
  </\2>
  |
  [^<>]++
)*
</a>
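
As a rough sketch, the breakdown above can be written almost verbatim in PHP with the x (extended) modifier, which ignores whitespace and #-comments inside the pattern. The variable names and the sample input are made up for illustration and are not part of the original question.

```php
<?php
// Same pattern as above, laid out with the x (extended) modifier so that
// whitespace and #-comments inside the pattern are ignored.
$pattern = '~
    <a=5>                      # literal opening tag of the outer pair
    (                          # group 1: one item of content
      <([a-zA-Z0-9]+)[^>]*>    #   opening tag, name captured in group 2
      (?1)*                    #   zero or more nested items (recursion)
      </\2>                    #   closing tag with the same name
      |
      [^<>]++                  #   or a run of non-tag text
    )*
    </a>                       # literal closing tag of the outer pair
~x';

// Hypothetical sample input, assumed only for this example.
$input = 'before <a=5>hello <b>bold <i>x</i></b> tail</a> after';

$matched = preg_match($pattern, $input, $m);
if ($matched === 1) {
    echo $m[0], "\n"; // prints the whole <a=5>...</a> span
}
```

The single-quoted PHP string keeps \2 as a literal backslash sequence, so the backreference reaches the regex engine unchanged.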

The first part, <([a-zA-Z0-9]+)[^>]*>(?1)*</\2>, matches a pair of matching tags and all of their content. It assumes that a tag name consists only of the characters [a-zA-Z0-9]. The tag name is captured by ([a-zA-Z0-9]+) and backreferenced as \2 when matching the closing tag </\2>.
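
To illustrate the role of the backreference, the following sketch runs the regex on two hypothetical inputs that differ only in the closing tag. The inputs and variable names are assumptions made for this example.

```php
<?php
$pattern = '~<a=5>(<([a-zA-Z0-9]+)[^>]*>(?1)*</\2>|[^<>]++)*</a>~';

// Hypothetical inputs: the first closes <b> with </b>, the second with </i>.
$balanced   = '<a=5><b class="x">text</b></a>';
$mismatched = '<a=5><b class="x">text</i></a>';

$balancedResult   = preg_match($pattern, $balanced);
$mismatchedResult = preg_match($pattern, $mismatched);

var_dump($balancedResult);   // int(1): </\2> finds the same name, "b"
var_dump($mismatchedResult); // int(0): </i> does not equal the captured "b"
```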

The second part, [^<>]++, matches any other content outside the tags. Note that quoted strings are not handled, so depending on your input the regex may not work: for example, a > inside a quoted attribute value is treated as the end of the tag.
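
The quoted-string caveat can be seen with a small sketch. Both inputs below are hypothetical; the second one places a > inside an attribute value, which [^>]* cannot tell apart from the end of the tag.

```php
<?php
$pattern = '~<a=5>(<([a-zA-Z0-9]+)[^>]*>(?1)*</\2>|[^<>]++)*</a>~';

// Hypothetical inputs, identical except for the ">" inside the quotes.
$plain  = '<a=5><b title="x">t</b></a>';
$quoted = '<a=5><b title="x>y">t</b></a>';

$plainResult  = preg_match($pattern, $plain);
$quotedResult = preg_match($pattern, $quoted);

var_dump($plainResult);  // int(1)
var_dump($quotedResult); // int(0): the tag is cut short at the quoted ">"
```

If such input is expected, a proper HTML or XML parser is likely a safer choice than extending the regex.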

Then back to the subroutine call (?1), which recursively calls the first capturing group. Notice that a tag can contain zero or more instances of other tags or non-tag content. Because of the way the regex is written, the outermost <a=5>...</a> pair shares this property.
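
A minimal sketch of the "zero or more" property, using a handful of made-up inputs: an empty outer pair, text only, sibling and deeply nested tags, and one input with an unclosed tag. The sample strings and the $results array are assumptions for illustration.

```php
<?php
$pattern = '~<a=5>(<([a-zA-Z0-9]+)[^>]*>(?1)*</\2>|[^<>]++)*</a>~';

// Hypothetical inputs covering the cases described above.
$samples = [
    '<a=5></a>',                                             // no content at all
    '<a=5>just text</a>',                                    // non-tag content only
    '<a=5><p>one</p><p>two <em>deep <b>er</b></em></p></a>', // siblings and nesting
    '<a=5><p>unclosed</a>',                                  // <p> is never closed
];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 when nothing matches.
    $results[] = preg_match($pattern, $sample) === 1;
    echo $sample, ' => ', end($results) ? 'match' : 'no match', "\n";
}
```

The first three inputs match and the last one does not, since the recursion only succeeds when every opened tag is closed by a tag of the same name.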

Demo on regex101
