且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用正则表达式检查字符串是否为回文?

更新时间:2023-02-21 10:46:11

这个问题的答案是不可能".更具体地说,面试官想知道你是否在计算理论课上集中注意力.

The answer to this question is that "it is impossible". More specifically, the interviewer is wondering if you paid attention in your computational theory class.

在计算理论课上,您了解了有限状态机.有限状态机由节点和边组成.每条边都用有限字母表中的一个字母进行注释.一个或多个节点是特殊的接受"节点,一个节点是开始"节点.当从给定的单词中读取每个字母时,我们遍历机器中的给定边.如果我们最终处于接受状态,那么我们说机器接受"了这个词.

In your computational theory class you learned about finite state machines. A finite state machine is composed of nodes and edges. Each edge is annotated with a letter from a finite alphabet. One or more nodes are special "accepting" nodes and one node is the "start" node. As each letter is read from a given word we traverse the given edge in the machine. If we end up in an accepting state then we say that the machine "accepts" that word.

正则表达式总是可以被翻译成等效的有限状态机.也就是说,接受和拒绝与正则表达式相同的词的一个(在现实世界中,一些正则表达式语言允许任意函数,这些不算).

A regular expression can always be translated into an equivalent finite state machine. That is, one that accepts and rejects the same words as the regular expression (in the real world, some regexp languages allow for arbitrary functions, these don't count).

建立一个接受所有回文的有限状态机是不可能的.证明依赖于这样一个事实,即我们可以轻松构建一个需要任意大量节点的字符串,即字符串

It is impossible to build a finite state machine that accepts all palindromes. The proof relies on the facts that we can easily build a string that requires an arbitrarily large number of nodes, namely the string

a^x b a^x(例如,aba, aabaa, aaabaaa, aaaabaaaa, ....)

a^x b a^x (eg., aba, aabaa, aaabaaa, aaaabaaaa, ....)

其中 a^x 是重复的 x 次.这至少需要 x 个节点,因为在看到 'b' 之后,我们必须倒数 x 次以确保它是一个回文.

where a^x is a repeated x times. This requires at least x nodes because, after seeing the 'b' we have to count back x times to make sure it is a palindrome.

最后,回到最初的问题,你可以告诉面试官你可以写一个正则表达式,接受所有小于某个有限固定长度的回文.如果现实世界的应用程序需要识别回文,那么它几乎肯定不会包括任意长的回文,因此这个答案将表明您可以区分理论上的不可能与现实世界的应用程序.尽管如此,实际的正则表达式会很长,比等效的 4 行程序要长得多(读者的简单练习:编写一个识别回文的程序).

Finally, getting back to the original question, you could tell the interviewer that you can write a regular expression that accepts all palindromes that are smaller than some finite fixed length. If there is ever a real-world application that requires identifying palindromes then it will almost certainly not include arbitrarily long ones, thus this answer would show that you can differentiate theoretical impossibilities from real-world applications. Still, the actual regexp would be quite long, much longer than equivalent 4-line program (easy exercise for the reader: write a program that identifies palindromes).