且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

算法创建的正则表达式嵌套模式的第n个级别

更新时间:2022-02-13 22:09:47

您可以考虑一下更理论上:一个匹配括号嵌套 N 深的就是圆括号左右比赛为 N-1 或更少深(至少有一个完全 N-1 深)。

You can think about it more theoretically: a match for parenthesis nested n deep is just parenthesis around matches for n-1 or less deep (with at least one exactly n-1 deep).

我们可以给正则表达式的递归定义。让 X [N] 是正则表达式嵌套完全 N 的水平,和 Y [N ] 是正则表达式包含括号嵌套最多 N 级别的任何级别的字符串,因此:

We can give a recursive definition of the regexes. Let X[n] be the regex for nesting exactly n levels, and Y[n] be the regex for a string containing brackets with any level of nesting up to n levels, so:

X[n] = \( (Y[n-2] X[n-1])+ Y[n-2] \)

Y[n] = [^()]* ( \( Y[n-1] \) [^()]* )*

Y [0] = X [0] = [^()] * (无嵌套)和 X [1] = \ ([^()] * \)。 (我不是非捕获组等细节困扰着呢,空间只是为了便于阅读。)

with Y[0] = X[0] = [^()]* (no nesting) and X[1] = \([^()]*\). (I'm not bothering with the details of non-capturing groups etc yet, and the spaces are just for readability.)

写在此基础上的算法应该是很容易的。

Writing an algorithm based on this should be quite easy.

从这些新的(减少相互递归)定义的正则表达式变长了很多的要慢得多(他们是多项式,而不是指数)。

The regexes from these new (less mutually recursive) definitions get longer much much more slowly (they are polynomial rather than exponential).

L [N] 是长度 X [N] L [N] 是长度 Y [N] ,然后(常数项只是硬codeD中的每一个字符):

Let l[n] be the length of X[n], and L[n] be the length of Y[n], then (the constant terms are just the hardcoded characters in each one):

L[n] = 19 + L[n-1] = 19*n + L[0] = 19*n + 6

l[n] = 3 + L[n-2] + l[n-1] + 2 + L[n-2] + 2
     = 7 + 2 * L[n-2] + l[n-1]
     = -57 + 38 * n + l[n-1]

相应的初始条件L [0] L [1] 。这种形式的递推关系有二次的解决方案,因此,这是唯一的为O(n ^ 2)。好多了。

with the appropriate initial conditions for l[0] and l[1]. Recurrence relations of this form have quadratic solutions, so this is only O(n^2). Much better.

(对于其他人,我有 Y [N] Y [N] = Y [N-1 previous定义] | X [N] ;这额外的递归意味着的 X 的长度正则表达式是 O(2.41 ^ N ),它吸收了不少。)

(For others, I had a previous definition of Y[n] was Y[n] = Y[n-1] | X[n]; this extra recursion meant that the length of the X regex was O(2.41^n), which sucks a lot.)

的新定义是是一个诱人的暗示,甚至有可能会写的一种方式 X 的是线性的 N 。我不知道,虽然,我有一种感觉上的 X 的确切额外的限制长度是指它是不可能的。)

(The new definition of Y is a tantalising hint that there might even be a way of writing X that is linear in n. I don't know though, and I have a feeling the extra restriction on X of exact length means it is impossible.)

下面是一些Python code,计算上述正则表达式,你也许可以将其翻译成JavaScript没有太多的麻烦。

The following is some Python code that computes the regexes above, you can probably translate it to javascript without too much trouble.

# abbreviation for the No Parenthesis regex
np = "[^()]*"

# compute Y[n] from Y[n-1]
def next_y(y_n1):
    return np + "(?:\(" + y_n1 + "\)" + np + ")*"

# compute X[n] from X[n-1] and Y[n-2]
def next_x(x_n1, y_n2):
    return "\((?:" + y_n2 + x_n1 + ")+" + y_n2 + "\)"

# compute [X[n], Y[n], Y[n-1]]
# (to allow us to make just one recursive call at each step)
def XY(n):
    if n == 0:
        return [np, # X[0]
                np, # Y[0]
                ""] # unused
    elif n == 1:
        return ["\([^()]*\)", # X[1]
                next_y(np),   # Y[1]
                np]           # Y[0]

    x_n1, y_n1, y_n2 = XY(n-1) # X[n-1], Y[n-1], Y[n-2]

    return [next_x(x_n1, y_n2), # X[n]
            next_y(y_n1),       # Y[n]
            y_n1]               # Y[n-1]

# wrapper around XY to compute just X[n]
def X(n):
    return XY(n)[0]

# wrapper around XY to compute just Y[n]
def Y(n):
    return XY(n)[1]