且构网


PHP regex: non-capturing, non-matching groups

Updated: 2022-10-16 14:43:23



I'm making a date-matching regex, and it's going pretty well. I've got this so far:

"/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/"

It will (hopefully) match single- or double-digit days and months, and two- or four-digit years in the 21st century. A bit of trial and error has gotten me this far.

But, I've got two simple questions regarding these results:

  1. (?: ) what is a simple explanation for this? Apparently it's a non-matching group. But then...

  2. What is the trailing ? for? e.g. (? )?

[Edited (again) to improve formatting and fix the intro.]

This is a comment and an answer.

The answer part... I agree with alex's earlier answer.

  1. (?: ), in contrast to ( ), is used to avoid capturing text, generally so as to have fewer back references mixed in with the ones you do want, or to improve performance.

  2. The ? following the (?: ), or following anything except * + ? or {}, means that the preceding item is optional: it may or may not be present in a valid match. Eg, /z34?/ will match z3 as well as z34, but it won't match z on its own; in z35 it matches only the z3 part.
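Both points can be tried out with a short PHP sketch. The sample strings and variable names below are made up for illustration, and the anchors (^ and $) are added only so that each test string has to match as a whole.

```php
<?php
// Capturing group: the optional "20" gets its own slot in $m.
preg_match('/(20)?(11)/', '2011', $m);
$capturing = $m;       // ['2011', '20', '11']

// Non-capturing group: same overall match, but "20" is not stored.
preg_match('/(?:20)?(11)/', '2011', $m);
$nonCapturing = $m;    // ['2011', '11']

// Trailing ?: the 4 is optional. ^ and $ force a whole-string match.
$results = [];
foreach (['z3', 'z34', 'z35', 'z'] as $s) {
    $results[$s] = preg_match('/^z34?$/', $s);
}
// $results is ['z3' => 1, 'z34' => 1, 'z35' => 0, 'z' => 0]

// Without anchors the pattern still finds "z3" inside "z35".
$unanchored = preg_match('/z34?/', 'z35');   // 1
```

Whether a string like z35 counts as a match therefore depends on whether the pattern is anchored.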

The comment part... I made what might be considered improvements to the regex you were working on:

(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)

-- First, it avoids things like 0-0-2011

-- Second, it avoids things like 233443-4-201154564

-- Third, it includes things like 1-1-2022

-- Fourth, it includes things like 1-1-11

-- Fifth, it avoids things like 34-4-11

-- Sixth, it allows you to capture the day, month, and year so you can refer to them more easily in code. That code could, for example, do a further check to see whether a Feb 29 date is valid: if the second captured group is 2, then either the first captured group is 29 and the year is a leap year, or the first captured group is <29.
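Here is a minimal sketch of that follow-up check in PHP. The function name parseDate is hypothetical, and reading two-digit years as 20xx is an assumption. PHP's built-in checkdate() stands in for a hand-written leap-year test, and it also rejects other impossible dates such as 31-6-11.

```php
<?php
// Hypothetical helper: returns [day, month, year] or null if the text
// holds no valid d-m-y date.
function parseDate(string $text): ?array
{
    $re = '/(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)/';
    if (preg_match($re, $text, $m) !== 1) {
        return null;
    }
    $day = (int) $m[1];    // first captured group
    $month = (int) $m[2];  // second captured group
    $year = (int) $m[3];   // third captured group
    if ($year < 100) {
        $year += 2000;     // assumption: two-digit years mean 20xx
    }
    // checkdate() knows month lengths and leap years.
    return checkdate($month, $day, $year) ? [$day, $month, $year] : null;
}
```

For example, parseDate('29-2-2012') gives [29, 2, 2012], while parseDate('29-2-11') gives null because 2011 is not a leap year.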

Finally, note that you'll still get dates that won't exist, eg, 31-6-11. If you want to avoid these, then try:

(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)
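One thing to watch with this longer pattern: the day and month land in groups 1-2, 3-4 or 5-6 depending on which month branch matched, and the year is always group 7. The sketch below shows one way to read them in PHP. It also shows a variant that uses PCRE's branch-reset group (?| ), which is not part of the answer above and is offered only as an alternative, so that day, month and year are always groups 1, 2 and 3.

```php
<?php
$text = '30-4-11';

// Pattern as given: only one day/month pair is filled in per match.
$plain = '/(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)/';
preg_match($plain, $text, $m);
$day = $m[1] . $m[3] . $m[5];    // unmatched groups are empty strings
$month = $m[2] . $m[4] . $m[6];
$year = $m[7];

// Branch-reset variant: every alternative reuses group numbers 1 and 2.
$reset = '/(?:^|\s)(?|(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12)|(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11)|(0?[1-9]|[1-2][0-9])-(0?2))-((?:20)?[0-9][0-9])(?:\s|$)/';
preg_match($reset, $text, $r);
// $r is ['30-4-11', '30', '4', '11']

// 31-6-11 is now rejected by the pattern itself.
$rejected = preg_match($plain, '31-6-11');   // 0
```

Neither pattern knows about leap years, so 29-2-11 still matches and needs a separate check.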

Also, I assumed the dates would be preceded and followed by a space (or the beginning/end of a line), but you may want to adjust that (eg, to allow punctuation).
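One possible adjustment, sketched in PHP under the assumption that a date may touch anything except another digit or hyphen, is to swap the whitespace groups for lookarounds. The sample sentence is made up.

```php
<?php
// (?<![0-9-]) : not preceded by a digit or hyphen
// (?![0-9-])  : not followed by a digit or hyphen
$re = '/(?<![0-9-])(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?![0-9-])/';

preg_match_all($re, 'Due (1-1-11), then 31-12-2022.', $m);
$found = $m[0];   // ['1-1-11', '31-12-2022']

// Dates buried in longer digit runs are still rejected.
$noise = preg_match($re, '233443-4-201154564');   // 0
```

Because lookarounds do not consume characters, the matched text in $m[0] no longer includes the surrounding whitespace.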

A commenter elsewhere referenced this resource which you might find useful: http://rubular.com/