更新时间:2022-11-14 10:28:04
使用适当的算法算法而不是蛮力方法会更好。***描述了最常见的子字符串问题的两种常见解决方案:后缀树和动态编程。
You would be better off with a proper algorithm for the task rather than a brute-force approach. Wikipedia describes two common solutions to the longest common substring problem: suffix-tree and dynamic-programming.
动态编程解决方案需要O( nm )时间和O( nm )空间。对于最长的公共子字符串,这几乎是对Wikipedia伪代码的直接Java翻译:
The dynamic programming solution takes O(n m) time and O(n m) space. This is pretty much a straightforward Java translation of the Wikipedia pseudocode for the longest common substring:
public static Set<String> longestCommonSubstrings(String s, String t) {
int[][] table = new int[s.length()][t.length()];
int longest = 0;
Set<String> result = new HashSet<>();
for (int i = 0; i < s.length(); i++) {
for (int j = 0; j < t.length(); j++) {
if (s.charAt(i) != t.charAt(j)) {
continue;
}
table[i][j] = (i == 0 || j == 0) ? 1
: 1 + table[i - 1][j - 1];
if (table[i][j] > longest) {
longest = table[i][j];
result.clear();
}
if (table[i][j] == longest) {
result.add(s.substring(i - longest + 1, i + 1));
}
}
}
return result;
}
现在,您需要所有常见的子串,而不仅仅是最长的子串。您可以增强此算法以包含更短的结果。让我们检查表格中的示例输入 eatsleepnightxyz
和 eatsleepabcxyz
:
Now, you want all of the common substrings, not just the longest. You can enhance this algorithm to include shorter results. Let's examine the table for the example inputs eatsleepnightxyz
and eatsleepabcxyz
:
e a t s l e e p a b c x y z
e 1 0 0 0 0 1 1 0 0 0 0 0 0 0
a 0 2 0 0 0 0 0 0 1 0 0 0 0 0
t 0 0 3 0 0 0 0 0 0 0 0 0 0 0
s 0 0 0 4 0 0 0 0 0 0 0 0 0 0
l 0 0 0 0 5 0 0 0 0 0 0 0 0 0
e 1 0 0 0 0 6 1 0 0 0 0 0 0 0
e 1 0 0 0 0 1 7 0 0 0 0 0 0 0
p 0 0 0 0 0 0 0 8 0 0 0 0 0 0
n 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t 0 0 1 0 0 0 0 0 0 0 0 0 0 0
x 0 0 0 0 0 0 0 0 0 0 0 1 0 0
y 0 0 0 0 0 0 0 0 0 0 0 0 2 0
z 0 0 0 0 0 0 0 0 0 0 0 0 0 3
eatsleep
结果很明显:这是左上角的 12345678
对角条纹。 xyz
结果是右下角的 123
对角线。 a
结果由顶部附近的 1
表示(第二行,第九列)。 t
结果由左下角附近的 1
表示。
eatsleep
result is obvious: that's the 12345678
diagonal streak at the top-left.xyz
result is the 123
diagonal at the bottom-right.a
result is indicated by the 1
near the top (second row, ninth column).t
result is indicated by the 1
near the bottom left.左边,顶部和 1 怎么样? $ c> 6 和 7
?那些不计算,因为它们出现在由 12345678
对角线形成的矩形内 - 换句话说,它们已被 eatsleep $ c覆盖$ c>。
What about the other 1
s at the left, the top, and next to the 6
and 7
? Those don't count because they appear within the rectangle formed by the 12345678
diagonal — in other words, they are already covered by eatsleep
.
我建议除了构建表之外什么都不做。然后,进行第二次传递,从右下角向后迭代,以收集结果集。
I recommend doing one pass doing nothing but building the table. Then, make a second pass, iterating backwards from the bottom-right, to gather the result set.