且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用动态编程了解正则表达式字符串匹配

更新时间:2022-11-15 07:42:56

使用DP解决这样的问题的直觉是找出答案对于以下问题

The intuition to solve problem like this using DP is to figure out answer to following questions


  1. 可以使用递归解决问题吗?

  2. 递归树中是否会重复出现较小的子问题?如果是,则可以通过以下方式存储较小问题的结果:只要遇到类似的子问题,就可以在O(1)中访问结果。通常将其称为记忆。

首先让我们了解问题的解决方案,当您以线性方式求解时会想出这一点。

Lets first understand the problem's solution which you would have figured out while solving in linear fashion.


  1. 同时将带有模式的文本与第一个字符匹配或不匹配。

  1. While matching the text with pattern either first character will match or it will not match.

情况1:第一个字符匹配或模式的第一个字符是'。

Case 1: First character matches or first character of pattern is '.'

情况1.1下一个字符是'*'

Case 1.1 Next character is '*'

案例1.2下一个字符不是'*'

Case 1.2 Next character is not '*'

案例2:第一个字符不匹配

Case 2: First character does not match

case 2.1下一个字符是'*'

case 2.1 Next character is '*'

case 2.2下一个字符不是'*'

case 2.2 Next character is not '*'

现在让我们找出前面针对上述问题讨论的两个步骤。

Lets now figure out two steps discussed earlier for above problem.

对于案例1.1,示例

isMatch( a, a * a) isMatch( aab, a * b)
等效于解决

isMatch("a", "a*a") and isMatch("aab", "a*b") which is equivalent to solving

isMatch( a, a )|| isMatch(, a * a)

isMatch( aab, b) || isMatch( ab, a * b) || code>条件的第一部分涉及模式中可选字符的情况,后跟'*',第二部分涉及重复匹配字符的情况。

isMatch("aab", "b") || isMatch("ab", "a*b") respectively. First part of || condition covers the scenario of optional character in pattern as it is followed by '*' and second part covers the scenario for repetition of matched character.

在所有情况下都可以形成类似的子问题。情况2.2
立即返回false。

Similarly sub-problems can be formed for all the cases. case 2.2 should straightaway return false.

下面是具有递归方法的Java代码

Below is the java code with recursive approach

public boolean isMatch(String text, String pattern) {
    dp = new Boolean[text.length()][pattern.length()];
    return isMatch(text, pattern, 0, 0);
}

private boolean isMatch(String text, String pattern, int ti, int pi) {

    if (pi == pattern.length() && ti < text.length()) return false;

    if (ti == text.length() && pi == pattern.length()) return true;

    if (ti == text.length()) {
        return isNextCharStar(pattern, pi) && isMatch(text, pattern, ti, pi + 2);
    }


    boolean result;
    final boolean hasFirstMatched = text.charAt(ti) == pattern.charAt(pi) || pattern.charAt(pi) == '.';

    if (isNextCharStar(pattern, pi)) {
        result = isMatch(text, pattern, ti, pi + 2);
        if (hasFirstMatched) {
            result = result || isMatch(text, pattern, ti + 1, pi);
        }
        return result;
    }

    return hasFirstMatched && isMatch(text, pattern, ti + 1, pi + 1);

}

private boolean isNextCharStar(String pattern, int pi) {
    return pi < pattern.length() - 1 && pattern.charAt(pi + 1) == '*';
}

现在让我们应用记忆步骤。如果在返回之前开始保存结果,则下次可以重用。我们应该如何保存它?
对于 ti pi 的所有可能组合,我们必须存储结果。其中 ti 是文本索引,而 pi 是样式索引。因此,尺寸为 text.length * pattern.length 的二维数组就足够了。以下是经过上述更改后的代码

Now lets apply the memoization step. If we start saving the result before returning we can reuse it next time. How and where should we save it? For all possible combinations of ti and pi we have to store the result. Where ti is text Index and pi is pattern index. So a 2d array of size text.length * pattern.length should be sufficient. Following is the code after above changes

Boolean [][] dp;

public boolean isMatch(String text, String pattern) {
    dp = new Boolean[text.length()][pattern.length()];
    return isMatch(text, pattern, 0, 0);
}

private boolean isMatch(String text, String pattern, int ti, int pi) {

    if (pi == pattern.length() ) return ti == text.length();

    if (ti == text.length()) {
        return isNextCharStar(pattern, pi) && isMatch(text, pattern, ti, pi + 2);
    }

    if(dp[ti][pi] != null) return dp[ti][pi];

    boolean result;
    final boolean hasFirstMatched = text.charAt(ti) == pattern.charAt(pi) || pattern.charAt(pi) == '.';

    if (isNextCharStar(pattern, pi)) {
        result = isMatch(text, pattern, ti, pi + 2);
        if (hasFirstMatched) {
            result = result || isMatch(text, pattern, ti + 1, pi);
        }
        dp[ti][pi] = result;
        return result;
    }

    dp[ti][pi] = hasFirstMatched && isMatch(text, pattern, ti + 1, pi + 1);
    return dp[ti][pi];

}

private boolean isNextCharStar(String pattern, int pi) {
    return pi < pattern.length() - 1 && pattern.charAt(pi + 1) == '*';
}

如果您仔细观察,仅更改了3行以使其成为DP递归解决方案。

If you will look closely only 3 lines have been changed to make it a DP solution from a recursive solution.