且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

正则表达式引擎如何使用递归子模式解析正则表达式?

更新时间:2022-05-25 17:14:23

^((.)(?1)\2|.?)$

^$分别声明字符串的开头和结尾.让我们看一下两者之间的内容,这更有趣:

The ^ and $ asserts the beginning and the end of the string respectively. Let us look at the content in between, which is more interesting:

((.)(?1)\2|.?)
1------------1 // Capturing group 1
 2-2           // Capturing group 2

看一下第一部分(.)(?1)\2,我们可以看到它将尝试匹配任何字符,并在末尾匹配相同的字符(后向引用\2,指的是与(.)匹配的字符).在中间,它将对整个捕获组1进行递归匹配.请注意,存在一个隐式断言(由(.)开头匹配一个字符,而\2结尾匹配相同字符)需要字符串.至少2个字符.第一部分的目的是递归地切掉字符串的相同末端.

Look at the first part (.)(?1)\2, we can see that it will try to match any character, and that same character at the end (back reference \2, which refers to the character matched by (.)). In the middle, it will recursively match for the whole capturing group 1. Note that there is an implicit assertion (caused by (.) matching one character at the beginning and \2 matching the same character at the end) that requires the string to be at least 2 characters. The purpose of the first part is chopping the identical ends of the string, recursively.

看第二部分.?,我们可以看到它将匹配一个或0个字符.仅当字符串最初的长度为0或1时,或在递归匹配的剩余字符为0或1个字符之后,才匹配此字符串.第二部分的目的是匹配从两端切掉的空字符串或单个寂寞字符.

Look at second part .?, we can see that it will match one or 0 character. This will only be matched if the string initially has length 0 or 1, or after the leftover from the recursive match is 0 or 1 character. The purpose of the second part is to match the empty string or the single lonely character after the string is chopped from both ends.

递归匹配有效:

  • 整个字符串必须是回文传递,并由^$声明.我们无法从任何随机位置开始进行匹配.
  • 如果字符串是< = 1个字符,它将通过.
  • 如果字符串> 2个字符,则是否接受该字符串仅由第一部分决定.如果匹配的话,它将被切成2个结尾.
  • 如果匹配,则剩余部分只能被2端切碎,或者如果长度小于< = 1,则通过.
  • The whole string must be palindrome to pass, asserted by ^ and $. We cannot start matching from any random position.
  • If the string is <= 1 character, it passes.
  • If the string is > 2 characters, whether it is accepted is decided by the first part only. And it will be chopped by 2 ends if matches.
  • The leftover if matches, can only be chopped by the 2 ends, or passes if its length is <= 1.