Converting an NFA to a regular expression

Updated: 2023-02-17 22:39:39

The answer assumes those conditions because any NFA can be modified to satisfy them.

For any NFA, you can add a new initial state q₀ with an epsilon-transition to the original initial state, and connect q₀ to every other state with a transition labelled ∅ (the source calls this the empty set symbol; it is assumed to match no symbol of the original NFA). Then use q₀ as the new initial state. This does not change the language accepted by the original NFA, and it makes the NFA satisfy the first condition.

Likewise, for any NFA you can add a new accepting state q_a with an epsilon-transition from every accepting state of the original NFA, and then mark q_a as the only accepting state. This does not change the language accepted by the original NFA, and it makes the NFA satisfy the second condition.

In the construction above, q₀ and q_a are two distinct new states, so q₀ != q_a and the third condition is satisfied.

In the link you provided, the fourth condition is handled with a special transition symbol ∅ (the empty set symbol), which no symbol of the original NFA's alphabet can match. You can therefore add a ∅-labelled transition between any pair of states that has no transition yet. This does not change the language accepted by the original NFA.

Now that the NFA has been modified to satisfy the four requirements, you can apply the algorithm from the link to convert it into a regular expression that accepts the same language as the original NFA.

Edit to answer the follow-up question:

To answer your question in the comment, consider the NFA with two states, q_A and q_B. q_A is the initial state as well as the only accepting state. There is a transition from q_A to itself on symbols 0 and 1, a transition from q_A to q_B on symbol 1, and a transition from q_B to q_A on symbol 0.

Visualization:


 0,1    
  |  1
->q_A----->q_B
  ^       |
  |-------|
     0

Step 2. To normalize the NFA, add a new initial state (q_init) with an epsilon-transition to q_A, and a new accepting state (q_acc) with an epsilon-transition from q_A.
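The normalization step can be sketched in Python. This is only an illustration under stated assumptions: the function name `normalize`, the dictionary representation (a map from `(src, dst)` pairs to regex labels, where a missing pair stands for ∅), and the string `"ε"` for epsilon are choices made here, not part of the linked algorithm.

```python
EPS = "ε"  # label used here for an epsilon-transition (an assumption)

def normalize(states, start, accepting, edges):
    """Wrap an NFA with fresh q_init / q_acc states.

    edges maps (src, dst) to a regex label; absent pairs stand for ∅.
    Assumes the names "q_init" and "q_acc" are not already in use.
    """
    if "q_init" in states or "q_acc" in states:
        raise ValueError("state names q_init/q_acc are reserved")
    new_edges = dict(edges)
    new_edges[("q_init", start)] = EPS       # new start -> old start
    for q in accepting:
        new_edges[(q, "q_acc")] = EPS        # old accepting -> new accept
    return ["q_init", *states, "q_acc"], "q_init", "q_acc", new_edges

# The two-state example NFA from above.
states, start, accept, edges = normalize(
    ["q_A", "q_B"], "q_A", {"q_A"},
    {("q_A", "q_A"): "0+1", ("q_A", "q_B"): "1", ("q_B", "q_A"): "0"},
)
for edge, label in sorted(edges.items()):
    print(edge, label)
```

Storing only the non-∅ edges keeps the sketch short; the ∅-labelled transitions required by the fourth condition are implied by the missing keys.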

Step 3. We want to remove q_A, so q_A is the q_rip of the algorithm (page 3). We need to consider every state with a transition into q_A and every state that q_A has a transition to. Here two states point to q_A: q_init and q_B. q_A in turn points to two states: q_B and q_acc. Following the algorithm, we replace each path q_in->q_rip->q_out with a single transition q_in->q_out labelled R_dir+R_in(R_rip)*R_out, where:

R_dir is the original transition from q_in to q_out
R_in is the original transition from q_in to q_rip
R_rip is the original loop at q_rip
R_out is the original transition from q_rip to q_out

So in this case we replace the path q_init->q_A->q_B with a transition q_init->q_B labelled (0+1)*1, since R_dir is ∅, R_in is ε, R_rip is 0+1, and R_out is 1. Continuing this process for every pair, we create 4 new transitions in total:

q_init->q_B: (0+1)*1
q_init->q_acc: (0+1)*
q_B->q_B: 0(0+1)*1
q_B->q_acc: 0(0+1)*

Then we can remove q_A.
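Step 3 can be sketched in Python as follows. The helper names (`combine`, `rip`, and so on), the use of `None` for ∅, and the string-based regex handling are assumptions made for this illustration; the code only simplifies ∅ and ε and makes no attempt to produce a short regex.

```python
EMPTY = None   # stands for the ∅ label (no transition)
EPS = "ε"      # the empty string

def top_level_union(r):
    """True if r contains a '+' outside all parentheses."""
    depth = 0
    for ch in r:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return True
    return False

def for_concat(r):
    """Parenthesize r if needed so it can be safely concatenated."""
    return f"({r})" if top_level_union(r) else r

def star(r):
    """Kleene star of r, parenthesizing anything longer than one symbol."""
    return f"{r}*" if len(r) == 1 else f"({r})*"

def combine(r_dir, r_in, r_rip, r_out):
    """Label R_dir + R_in (R_rip)* R_out, simplifying ∅ and ε."""
    if r_in is EMPTY or r_out is EMPTY:
        return r_dir                      # no path through q_rip
    pieces = [for_concat(r_in)]
    if r_rip not in (EMPTY, EPS):
        pieces.append(star(r_rip))
    pieces.append(for_concat(r_out))
    through = "".join(p for p in pieces if p != EPS) or EPS
    return through if r_dir is EMPTY else f"{r_dir}+{through}"

def rip(labels, q_rip):
    """Return new labels with q_rip removed; missing keys mean ∅."""
    r_rip = labels.get((q_rip, q_rip))
    sources = {p for (p, q) in labels if q == q_rip and p != q_rip}
    targets = {q for (p, q) in labels if p == q_rip and q != q_rip}
    new = {k: v for k, v in labels.items() if q_rip not in k}
    for p in sources:
        for q in targets:
            new[(p, q)] = combine(labels.get((p, q)), labels[(p, q_rip)],
                                  r_rip, labels[(q_rip, q)])
    return new

# Normalized example NFA (only non-∅ edges are stored).
labels = {
    ("q_init", "q_A"): EPS,
    ("q_A", "q_A"): "0+1",
    ("q_A", "q_B"): "1",
    ("q_B", "q_A"): "0",
    ("q_A", "q_acc"): EPS,
}
after_a = rip(labels, "q_A")
for edge, label in sorted(after_a.items()):
    print(edge, label)
```

The printed labels are the four transitions listed above. Because `rip` is written for an arbitrary state, the same call can be repeated for each remaining original state.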

Step 4. We want to remove q_B. Again, we identify q_in and q_out. Only one state has a transition into q_B, namely q_init, and q_B has a transition to only one state, namely q_acc. So we have:

R_dir = (0+1)*
R_in = (0+1)*1
R_rip = 0(0+1)*1
R_out = 0(0+1)*

So the new transition q_init->q_acc will be:

R_dir+R_in(R_rip)*R_out

(0+1)* + (0+1)*1 (0(0+1)*1)* 0(0+1)*

Then we can remove q_B.

Step 5. Since every state of the original NFA has been removed, we are done. The final regex is the one shown above.

Note that the final regex might not be optimal (in most cases it won't be); this is expected from the algorithm. Finding the shortest regex for an NFA (or even a DFA) is very difficult in general, although in this example it is easy to see that the first component, (0+1)*, already covers all possible strings.

For completeness, the shortest regex accepting the same language is:

(0+1)*
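As a sanity check, the derived regex and the short one can be compared by brute force on all binary strings up to some length. The sketch below uses Python's `re` module and rewrites the textbook union `+` as `|`; the helper names `to_python` and `agree_up_to` are made up for this example. A bounded test like this can find a counterexample but is not a proof of equivalence.

```python
import re
from itertools import product

def to_python(regex):
    """Rewrite textbook union '+' as Python's '|' and drop spaces.

    Assumes '+' always means union here (no Kleene plus is used).
    """
    return regex.replace("+", "|").replace(" ", "")

def agree_up_to(r1, r2, max_len, alphabet="01"):
    """True if r1 and r2 accept the same strings of length <= max_len."""
    p1, p2 = re.compile(to_python(r1)), re.compile(to_python(r2))
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(p1.fullmatch(w)) != bool(p2.fullmatch(w)):
                return False
    return True

derived = "(0+1)* + (0+1)*1 (0(0+1)*1)* 0(0+1)*"
shortest = "(0+1)*"
print(agree_up_to(derived, shortest, 8))  # prints True
```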
