且构网

How do I exclude a word from a regex subpattern?

Updated: 2022-05-25 17:14:17

You can use a branch reset group that matches an empty string if the modal verb is followed by "not" as a whole word, and a notional verb otherwise:

\b(I|you|he|she|it|we|they|this|that|these|those)\s+(can|should|would|could|must|want to|have to|had to|might)\s+\K(?|(?=not\b)()|([^ß\W]\w{2,15})\b)

See the regex demo.

Details


  • \b - a word boundary
  • (I|you|he|she|it|we|they|this|that|these|those) - one of the pronouns, captured into Group 1
  • \s+ - one or more whitespace characters (whitespace already acts as a word boundary between the adjacent groups, so no extra \b is needed there)
  • (can|should|would|could|must|want to|have to|had to|might) - one of the modal verbs, captured into Group 2
  • \s+ - one or more whitespace characters
  • \K - the match reset operator, which drops the text matched so far from the overall match value
  • (?|(?=not\b)()|([^ß\W]\w{2,15})\b) - a branch reset group matching either
    • (?=not\b)() - if "not" appears as a whole word immediately to the right, capture an empty string into Group 3
    • | - or (here, else)
    • ([^ß\W]\w{2,15})\b - match and capture into Group 3 any word character other than ß followed by 2 to 15 word characters, and then require a word boundary.
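To see the "empty string for not, verb otherwise" logic in action, here is a minimal sketch in Python. It is a port under stated assumptions, not the original PCRE pattern: Python's stdlib re module supports neither \K nor the branch reset group (?|...), so the two alternatives get separate group numbers (3 and 4) and the whole match still includes the pronoun and modal verb. The names PATTERN and verb_after_modal are hypothetical and exist only for this illustration.

```python
import re

# Port of the PCRE pattern to Python's stdlib `re`.
# Assumption: without \K and (?|...), the two alternatives are numbered
# Group 3 (empty capture for "not") and Group 4 (the verb).
PATTERN = re.compile(
    r"\b(I|you|he|she|it|we|they|this|that|these|those)\s+"
    r"(can|should|would|could|must|want to|have to|had to|might)\s+"
    r"(?:(?=not\b)()|([^ß\W]\w{2,15})\b)"
)


def verb_after_modal(text):
    """Return the word after the modal verb, "" if that word is "not",
    or None when the pattern does not match at all."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Emulate the branch reset: take whichever alternative participated.
    return m.group(3) if m.group(3) is not None else m.group(4)


print(repr(verb_after_modal("I can swim")))             # 'swim'
print(repr(verb_after_modal("you should not go")))      # ''
print(repr(verb_after_modal("we want to notify them"))) # 'notify'
print(repr(verb_after_modal("they must go")))           # None
```

The last call returns None because the second alternative needs at least three word characters, and "notify" is returned in full because the lookahead only accepts "not" when a word boundary follows it.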

Note that (?m) (PCRE_MULTILINE) is only needed if you want ^ and $ outside of character classes to match at the start and end of each line rather than only at the start and end of the whole string. Since your pattern has no such anchors, (?m) is redundant.
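The point about the multiline flag can be checked with a short sketch. It uses Python's re.MULTILINE, which plays the same role for ^ and $ as (?m) does in PCRE. The sample text and the simplified patterns are assumptions made up for this illustration.

```python
import re

text = "I can swim\nyou must run"

# A pattern without ^ or $ gives the same result with or without the flag.
no_anchor = r"\b(?:can|must)\s+(\w+)"
without_flag = re.findall(no_anchor, text)
with_flag = re.findall(no_anchor, text, re.MULTILINE)

# A pattern with ^ is where the flag matters: string start vs. every line start.
anchored = r"^(\w+)"
string_start_only = re.findall(anchored, text)
every_line_start = re.findall(anchored, text, re.MULTILINE)

print(without_flag == with_flag)  # True
print(string_start_only)          # ['I']
print(every_line_start)           # ['I', 'you']
```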