且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

将NFA转换为正则表达式

更新时间:2023-02-17 22:39:39

答案是假设这些条件,因为可以修改任何NFA以满足这些要求.

The answer is assuming those conditions, because any NFA can be modified to fit those requirements.

对于任何一种NFA,您都可以将具有epsilon过渡的新初始状态q 0 添加到原始初始状态,还可以使用一个附加的转换符号∅(将其称为空集合符号(假定是与原始NFA中的任何符号都不匹配)到其他任何状态的符号,然后将此新状态用作新的初始状态.请注意,这不会更改原始NFA接受的语言.这会使您的NFA满足第一个条件.

For any kind of NFA, you can add a new initial state q0 that has an epsilon-transition to the original initial state, and also using an additional transition symbol called ∅ (they call it empty set symbol, assumed to be a symbol which does not match any symbol from the original NFA) from it to any other states, then use this new state as the new initial state. Note that this does not change the language accepted by the original NFA. This would make your NFA satisfies the first condition.

对于任何一种NFA,您都可以添加一个新的接受状态q a ,该状态具有从原始NFA中的所有接受状态开始的ε跃迁.然后将其标记为唯一的接受状态.请注意,这不会更改原始NFA接受的语言.这会使您的NFA满足第二个条件.

For any kind of NFA, you can add a new acceptance state qa that has an epsilon-transition from all acceptance state in the original NFA. Then mark this as the only acceptance state. Note that this does not change the language accepted by the original NFA. This would make your NFA satisfies the second condition.

通过上述构造,通过设置q 0 != q a ,它满足第三条件.

By the above construction, by setting q0 != qa, it satisfies the third condition.

在您提供的链接中,第四个条件是通过使用一个特殊的转换符号called来解释的(空集符号),原始NFA的实际字母都无法匹配.因此,您可以使用此新符号添加从每个状态到任何其他状态的过渡.请注意,这不会更改原始NFA接受的语言.

And in the link you provided, the fourth condition is explained by having a special transition symbol called ∅ (the empty set symbol) for which no actual alphabet from original NFA can match. So you can add transitions with this new symbol from every state to any other state. Note that this does not change the language accepted by the original NFA.

因此,现在已经对NFA进行了修改,使其可以满足这四个要求,您可以在此处应用该算法将NFA转换为正则表达式,该表达式将接受与原始NFA相同的语言.

So now the NFA has been modified to satisfies the four requirements, you can apply the algorithm there to convert the NFA into Regular Expression, which would accept the same language as the original NFA.

编辑以回答其他问题:

要在评论中回答您的问题,请考虑具有两种状态的NFA,即q A 和q B . q A 是初始状态,也是唯一的接受状态.我们从q A 过渡到其自身,符号为0,1.我们还从q A 过渡到带符号1的q B .最后,我们从q B 过渡到q A 带有符号0的sub>.

To answer your question in the comment, consider the NFA with two states, qA and qB. qA is the initial state as well as the only acceptance state. We have a transition from qA to itself with symbol 0,1. We also have transition from qA to qB with symbol 1. Lastly we have transition from qB to qA with symbol 0.

可视化:


 0,1    
  |  1
->qA----->qB
  ^       |
  |-------|
     0

步骤2.当我们对NFA进行标准化时,只需放置一个指向q A 的新初始化状态(q init ),然后放置一个新的接受状态(q acc )来自q A .

Step 2. When we normalize the NFA, just put the new init state (qinit) that points to qA, and put a new acceptance state (qacc) from qA.

第3步.我们要删除q A .因此,q A 是算法中的q rip (第3页).现在我们需要考虑进入q A 的每个状态以及离开q A 的每个状态.在这种情况下,有两个指向q A 的状态,分别是q init 和q B . q A 指向两个状态,分别是q B 和q acc .通过算法,我们将过渡q in ->> q rip -> q out 替换为过渡q in -> q out ,其转换符号为R dir + R in (R rip ) * R out ,其中:

Step 3. We want to remove qA. So qA is the qrip in the algorithm (in page 3). Now we need to consider every states that enters qA and every states that exits from qA. In this case, there are two states pointing to qA, that are qinit and qB. There are two states that are pointed to by qA, that are qB and qacc. By the algorithm, we replace the transitions qin->qrip->qout with a transition qin->qout, having the transition symbol Rdir+Rin(Rrip)*Rout, where:

  1. R dir 是从q in 到q out
  2. 的原始过渡
  3. R in 是从q in 到q rip
  4. 的原始过渡
  5. R rip 是q rip
  6. 处的原始循环
  7. R out 是从q rip 到q out
  8. 的原始过渡
  1. Rdir is the original transition from qin to qout
  2. Rin is the original transition from qin to qrip
  3. Rrip is the original loop at qrip
  4. Rout is the original transition from qrip to qout

因此,在这种情况下,我们用q init替换过渡q init ->> q A -> q B 带有过渡符号(0 + 1)* 1的-> q B .继续此过程,我们将总共创建4个新过渡:

So in this case we replace the transition qinit->qA->qB with qinit->qB with transition symbol (0+1)*1. Continuing this process, we will create in total 4 new transitions:

  1. q init -> q B :(0 + 1)* 1
  2. q init -> q acc :(0 + 1)*
  3. q B -> q B :0(0 + 1)* 1
  4. q B -> q acc :0(0 + 1)*
  1. qinit->qB: (0+1)*1
  2. qinit->qacc: (0+1)*
  3. qB->qB: 0(0+1)*1
  4. qB->qacc: 0(0+1)*

然后我们可以删除q A .

Then we can remove qA.

第4步.我们要删除q B .同样,我们确定q in 和q out .这里只有一个状态到达q B ,即q init ,只有一个状态偏离q B ,即q acc .因此,我们有:

Step 4. We want to remove qB. Again, we identify the qin and qout. There is only one state coming to qB here, which is qinit, and there is only one state departing from qB, which is qacc. So we have:

  1. R dir =(0 + 1)*
  2. R in =(0 + 1)* 1
  3. R rip = 0(0 + 1)* 1
  4. R out = 0(0 + 1)*
  1. Rdir = (0+1)*
  2. Rin = (0+1)*1
  3. Rrip = 0(0+1)*1
  4. Rout = 0(0+1)*

因此,新的过渡q init -> q acc 将是:

So the new transition qinit->qacc will be:

R dir + R in (R rip )* R out

Rdir+Rin(Rrip)*Rout

(0 + 1)* +(0 + 1)* 1(0(0 + 1)* 1)* 0(0 + 1)*

(0+1)* + (0+1)*1 (0(0+1)*1)* 0(0+1)*

我们可以删除q B .

第5步.由于原始NFA中的每个状态都已删除,因此我们完成了.因此,最终的正则表达式如上所示.

Step 5. Since every state in the original NFA has been removed, we are done. So the final regex is shown above.

请注意,最终的正则表达式可能不是***的(并且在大多数情况下不是***的),这是算法所期望的.通常,为NFA(甚至DFA)找到最短的正则表达式是非常困难的(尽管在此示例中,很容易看到第一个组件已经覆盖了所有可能的字符串)

Note that the final regex might not be optimal (and in most cases it won't be optimal), this is expected from the algorithm. Finding the shortest regex for an NFA (or even DFA) in general is very difficult (although for this example it's easy to see that the first component already covers all possible strings)

出于完整性考虑,接受相同语言的最短正则表达式为:

For completeness, the shortest regex accepting the same language will be:

(0 + 1)*

(0+1)*