更新时间:2023-02-10 16:32:36
计算滑动窗口的Levenshtein距离归结为计算非循环有向 planar 图中几对顶点之间的距离,看起来像这样(大写字母表示成对).
Computing the Levenshtein distances for a sliding window boils down to computing the distances between several pairs of vertices in an acyclic directed planar graph that looks like this one (capital letters denote the pairs).
h a y s t a c k
n A-B-C-D-E-F-*-*
|\|\|\|\|\|\|\|
e *-*-*-*-*-*-*-*
|\|\|\|\|\|\|\|
e *-*-A-B-C-D-E-F
水平和垂直弧的成本为1;如果对应的字母匹配,则对角弧的成本为0,否则为1.
The horizontal and vertical arcs have cost 1; the diagonal arcs have cost 0 if the corresponding letters match or 1 otherwise.
由于所有成对的顶点都位于无穷大的面上,因此Klein或 Cabello-Chambers的多源最短路径算法可用于计算所需的时间O(mn log(mn)).
Since all of the paired vertices lie on the infinite face, Klein's or Cabello-Chambers's multiple-source shortest paths algorithm can be used to compute the needed distances in time O(m n log (m n)).
要刮除最终的记录(实际上,比起Dijkstra的算法要差得多),您可以查看Alexander Tiskin的手稿半本地字符串比较:算法技术和应用程序,它可以解决与此问题相似的问题,即使不是这个问题本身. (也许这应该是我的主要答案,但我还没有读过,并且对多源最短路径技术的了解要好得多.)
To shave the final log (and practically speaking, it's much worse than for, e.g., Dijkstra's algorithm), you might look in Alexander Tiskin's manuscript Semi-local string comparison: Algorithmic techniques and applications, which treats problems similar to this one if not this one itself. (Probably that should be my primary answer, but I haven't read it and know the multiple-source shortest path techniques a lot better.)
也有可能,通过一些其他逻辑来处理单向边缘,我的 查看全文