Updated: 2023-02-17 23:13:36
Python's re module does not directly support this feature (atomic grouping was only added in Python 3.11), but you can emulate it with a zero-width lookahead assertion, (?=RE), which matches from the current point with the same semantics you want: put a named group, (?P<name>RE), inside the lookahead, then use a named backreference, (?P=name), to match exactly whatever the zero-width assertion matched. Combined, this gives you the same semantics, at the cost of an additional capturing group and a lot of syntax.
For example, the link you provided gives the Ruby example of
/"(?>.*)"/.match('"Quote"') #=> nil
We can emulate that in Python as such:
re.search(r'"(?=(?P<tmp>.*))(?P=tmp)"', '"Quote"') # => None
We can show that I'm doing something useful and not just spewing line noise, because if we change it so that the inner group doesn't eat the final ", it still matches:
re.search(r'"(?=(?P<tmp>[A-Za-z]*))(?P=tmp)"', '"Quote"').groupdict()
# => {'tmp': 'Quote'}
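To make the contrast easy to reproduce, here is a minimal self-contained sketch that runs an ordinary greedy group next to the two emulated patterns above. The variable names (plain, atomic, letters) are only illustrative.

```python
import re

# Ordinary greedy group: .* backtracks and gives the closing quote back.
plain = re.search(r'"(.*)"', '"Quote"')

# Emulated atomic group: .* keeps the closing quote, so the final " cannot match.
atomic = re.search(r'"(?=(?P<tmp>.*))(?P=tmp)"', '"Quote"')

# Emulated atomic group whose body cannot consume the closing quote at all.
letters = re.search(r'"(?=(?P<tmp>[A-Za-z]*))(?P=tmp)"', '"Quote"')

print(plain.group(1))        # Quote
print(atomic)                # None
print(letters.group('tmp'))  # Quote
```

The only difference between the first two patterns is whether the regex engine is allowed to backtrack into the group, which is exactly what an atomic group forbids.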
You can also use anonymous groups and numeric backreferences, but this gets awfully full of line-noise:
re.search(r'"(?=(.*))\1"', '"Quote"') # => None
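If the syntax gets in the way, one option is to generate it. The sketch below uses a hypothetical helper, atomic(body, name), that is not part of the re module; it only builds the lookahead-plus-backreference fragment described above, and it assumes you pass a group name that is valid and not used elsewhere in the pattern.

```python
import re

def atomic(body, name):
    """Build a fragment that emulates (?>body).

    Hypothetical helper for illustration: `name` must be a valid group
    name that is unique within the final pattern.
    """
    return '(?=(?P<%s>%s))(?P=%s)' % (name, body, name)

pattern = '"' + atomic('.*', 'tmp') + '"'
print(pattern)                        # "(?=(?P<tmp>.*))(?P=tmp)"
print(re.search(pattern, '"Quote"'))  # None
```

Named groups are used here rather than numbers because a numeric backreference would have to change whenever another capturing group is added earlier in the pattern.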
(Full disclosure: I learned this trick from Perl's perlre documentation, which mentions it under the documentation for (?>...).)
In addition to having the right semantics, this also has the appropriate performance properties. If we port an example out of perlre:
[nelhage@anarchique:~/tmp]$ cat re.py
import re
import timeit

# Plain version: the nested quantifiers can backtrack exponentially.
re_1 = re.compile(r'''\(
                          (
                            [^()]+            # x+
                          |
                            \( [^()]* \)
                          )+
                      \)
                   ''', re.X)

# Same pattern, with the x+ branch made atomic via lookahead + backreference.
re_2 = re.compile(r'''\(
                          (
                            (?=(?P<tmp>[^()]+ ))(?P=tmp)   # Emulate (?> x+)
                          |
                            \( [^()]* \)
                          )+
                      \)''', re.X)

print(timeit.timeit("re_1.search('((()' + 'a' * 25)",
                    setup="from __main__ import re_1",
                    number=10))
print(timeit.timeit("re_2.search('((()' + 'a' * 25)",
                    setup="from __main__ import re_2",
                    number=10))
We see a dramatic improvement:
[nelhage@anarchique:~/tmp]$ python re.py
96.0800571442
7.41481781006e-05
The gap only gets more dramatic as we extend the length of the search string.
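To see how the gap grows, here is a rough sketch that times both patterns over a few short inputs. The helper time_search and the chosen lengths are my own assumptions, kept small so the backtracking version still finishes quickly; absolute timings will vary by machine and Python version, so no numbers are shown.

```python
import re
import timeit

plain = re.compile(r'''\(
    (
        [^()]+                          # x+
      |
        \( [^()]* \)
    )+
\)''', re.X)

emulated = re.compile(r'''\(
    (
        (?=(?P<tmp>[^()]+))(?P=tmp)     # emulate (?> x+)
      |
        \( [^()]* \)
    )+
\)''', re.X)

def time_search(regex, n):
    """Time one failing search against '((()' followed by n 'a' characters."""
    subject = '((()' + 'a' * n
    return timeit.timeit(lambda: regex.search(subject), number=1)

# Print: length, seconds for the plain pattern, seconds for the emulated one.
for n in (10, 12, 14, 16):
    print(n, time_search(plain, n), time_search(emulated, n))
```

The plain pattern can split a run of n characters between repetitions of the group in about 2^(n-1) ways before giving up, so its time should roughly double with each extra character, while the emulated atomic version tries the run only once.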