且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

扩展正则表达式是否支持反向引用?

更新时间:2022-11-14 18:41:16

正如其他人已经指出的那样,很明显POSIX ERE不支持反向引用.

As others have already pointed out, it's quite clear that POSIX EREs do not support back-references.

OpenGroup 基本规范第7期中给出的不向ERE添加反向引用的理由如下:

The rationale given in the OpenGroup Base Specifications Issue 7 for not adding back-references to EREs is given as:

建议除了间隔表达式外,还应将反向引用('\ n')添加到ERE中.标准开发人员拒绝了这种做法,因为这可能会降低共识.

It was suggested that, in addition to interval expressions, back-references ( '\n' ) should also be added to EREs. This was rejected by the standard developers as likely to decrease consensus.

引用自:定义:基本定义:扩展的正则表达式

此限制的主要原因是允许将POSIX ERE转换为确定性有限自动机(DFA),实际上Unix中ERE的原始实现是作为DFA完成的.使用DFA可以保证实现的性能.与(无数个)反向引用进行模式匹配是一个NP难题,甚至可能是一个NP完全难题.如果对ERE提出反向引用,则永远无法在POSIX标准委员会中达成共识,因为这将迫使所有使用原始Unix实现的公司将其代码更改为非确定性实现,并放弃其性能保证,还有一些这些公司中有委员会成员.

The primary reason for this limitation is to allow POSIX EREs to be converted to a deterministic finite automata (DFA), and indeed the original implementation of EREs in Unix was done as a DFA. The use of a DFA allows guarantees to be made about the performance of the implementation. Pattern matching with (an unbounded number of) back references is an NP-hard problem, and perhaps even an NP-complete problem. Consensus in the POSIX standards committee could never have been reached if back-references were proposed for EREs because that would force all the companies using the original Unix implementation to change their code to a non-deterministic implementation and to drop their performance guarantees, and some of those companies had members on the committee.

也已经注意到,RE中的反向引用对于用户或实施者都不直观,实际上,它们比现在更频繁地引起极大的混乱.例如,请参见 RE解释中给出的示例:黑暗角落

It has also been noted that back-references in REs are not intuitive for either users or implementors, and indeed they've caused extreme confusion more often than now. See for example the examples given in RE-Interpretation: The Dark Corners

注意:RE中的反向引用与sed等工具中替换文本中对子模式的引用不同.

NOTE: back-references in REs are not the same as references to sub-patterns in substitution text in tools such as sed.