
Regex does not match because of repeating a capturing group instead of capturing a repeated group

Updated: 2022-11-12 08:53:04


You are trying to match a repeated capturing group and get all of its captures. That is not possible with PHP's PCRE regex: when a capturing group is quantified, only the text from its last iteration is kept.
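To see the problem in isolation, here is a minimal PHP sketch. The pattern is a simplified, hypothetical stand-in for the one in the question: the quantified group runs three times on the sample string, but $m[1] only holds what the last iteration matched.

```php
<?php
// Hypothetical demo: a quantified capturing group keeps only its LAST iteration.
$str = "{A''BsCb}";

// ([A-G-][^A-G}]*)+ runs three times here: "A''", then "Bs", then "Cb".
preg_match('/\{([A-G-][^A-G}]*)+\}/', $str, $m);

echo $m[0], PHP_EOL; // the whole match: {A''BsCb}
echo $m[1], PHP_EOL; // only the last iteration: Cb
```

The earlier iterations are overwritten, which is why you need either several matches (\G chaining) or a two-step approach.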


What you can do is either extract all {...} / [...] substrings, strip the brackets and run a simple [A-G-][^A-G]* regex on each of them, or add a \G anchor, which makes the regex harder to maintain but lets it work like the original one.

Solution 1:

/(?:[[{]|(?!\A)\G)\K[A-G-][^A-G\]}]*/

See the regex demo. Note: this regex does not check for the closing ] or }, but that check can be added with a positive lookahead.

  • (?:[[{]|(?!\A)\G) - matches a [ or {, or the position where the previous successful match ended (the (?!\A) lookahead stops \G from matching at the very start of the string)
  • \K - omits the text matched so far from the match result
  • [A-G-] - a letter from A to G, or a -
  • [^A-G\]}]* - zero or more chars other than A to G, ] and }

See the PHP demo.
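A minimal sketch of Solution 1 in PHP, assuming the sample string {A''BsCb} from the question. It uses [[{] (exactly one opening bracket) as the first alternative, so a chain of matches can only start right after a [ or {. The $strict pattern is a hypothetical illustration of the lookahead mentioned in the note: it requires a ] or } to appear before any other bracket, but it does not check that the brackets pair up.

```php
<?php
// Solution 1 as a single preg_match_all() call: \G chains each match
// to the end of the previous one, \K drops the opening bracket.
$pattern = '/(?:[[{]|(?!\A)\G)\K[A-G-][^A-G\]}]*/';

preg_match_all($pattern, "{A''BsCb}", $m);
print_r($m[0]); // the tokens A'', Bs and Cb

// Letters outside brackets are skipped: a chain can only start after [ or {.
preg_match_all($pattern, "A {B}", $outside);

// Hypothetical stricter variant: the lookahead requires a ] or } to follow
// before any other bracket. It does not check that the brackets pair up.
$strict = '/(?:[[{]|(?!\A)\G)\K[A-G-][^A-G\]}]*(?=[^\[\]{}]*[\]}])/';
preg_match_all($strict, "{A''BsCb}", $closed);
preg_match_all($strict, "{A''BsCb", $unclosed); // no closing }, so no matches
```

Because \G only holds at the position where the current search started, a failed step ends the chain, and the next chain has to begin at a fresh [ or {.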

Solution 2:

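<?php
// Step 1: capture the contents of every {...} or [...] into Group 1
$re = '/(?|{([^}]*)}|\[([^]]*)])/';
$str = "{A''BsCb}";
$res = array();
preg_match_all($re, $str, $m);
foreach ($m[1] as $match) {
    // Step 2: split each bracket body into tokens
    preg_match_all('~[A-G-][^A-G]*~', $match, $tmp);
    // $tmp[0] holds the full matches; append them in document order
    $res = array_merge($res, $tmp[0]);
}
print_r($res);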

See the PHP demo.


The (?|{([^}]*)}|\[([^]]*)]) regex matches only properly paired strings like {...} or [...] (not {...] or [...}) and captures the text between the brackets into Group 1: the branch reset group (?|...) restarts group numbering in each alternative, so both branches fill Group 1. Then all that is left is to extract the tokens from each captured body with the simpler '~[A-G-][^A-G]*~' regex.
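To make the effect of the branch reset group visible, the hypothetical sketch below runs the same bracket pattern once with a plain (?:...) alternation and once with (?|...) on a sample string that contains both bracket types (the string is an assumption, not from the question), then applies the token regex to each captured body.

```php
<?php
$str = "{A''B}[CbD]";

// Plain alternation: each branch has its own group number (1 and 2).
preg_match_all('/(?:{([^}]*)}|\[([^]]*)])/', $str, $plain);
// Branch reset: both branches write into Group 1.
preg_match_all('/(?|{([^}]*)}|\[([^]]*)])/', $str, $reset);

// $plain[1] is ["A''B", ""] and $plain[2] is ["", "CbD"]:
// the caller would have to merge two groups.
// $reset[1] is ["A''B", "CbD"] and there is no Group 2 at all.

// Second step: split every captured body into tokens.
$tokens = [];
foreach ($reset[1] as $body) {
    preg_match_all('~[A-G-][^A-G]*~', $body, $tmp);
    $tokens = array_merge($tokens, $tmp[0]);
}
print_r($tokens); // A'', B, Cb, D
```

With PREG_PATTERN_ORDER (the default), a group that did not take part in a match is filled with an empty string, which is why the plain alternation leaves gaps in both groups.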