且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

半本地Levenshtein距离

更新时间:2023-02-10 16:32:36

计算滑动窗口的Levenshtein距离归结为计算非循环有向 planar 图中几对顶点之间的距离,看起来像这样(大写字母表示成对).

Computing the Levenshtein distances for a sliding window boils down to computing the distances between several pairs of vertices in an acyclic directed planar graph that looks like this one (capital letters denote the pairs).

   h a y s t a c k

n  A-B-C-D-E-F-*-*
   |\|\|\|\|\|\|\|
e  *-*-*-*-*-*-*-*
   |\|\|\|\|\|\|\|
e  *-*-A-B-C-D-E-F

水平和垂直弧的成本为1;如果对应的字母匹配,则对角弧的成本为0,否则为1.

The horizontal and vertical arcs have cost 1; the diagonal arcs have cost 0 if the corresponding letters match or 1 otherwise.

由于所有成对的顶点都位于无穷大的面上,因此Klein或 Cabello-Chambers的多源最短路径算法可用于计算所需的时间O(mn log(mn)).

Since all of the paired vertices lie on the infinite face, Klein's or Cabello-Chambers's multiple-source shortest paths algorithm can be used to compute the needed distances in time O(m n log (m n)).

要刮除最终的记录(实际上,比起Dijkstra的算法要差得多),您可以查看Alexander Tiskin的手稿半本地字符串比较:算法技术和应用程序,它可以解决与此问题相似的问题,即使不是这个问题本身. (也许这应该是我的主要答案,但我还没有读过,并且对多源最短路径技术的了解要好得多.)

To shave the final log (and practically speaking, it's much worse than for, e.g., Dijkstra's algorithm), you might look in Alexander Tiskin's manuscript Semi-local string comparison: Algorithmic techniques and applications, which treats problems similar to this one if not this one itself. (Probably that should be my primary answer, but I haven't read it and know the multiple-source shortest path techniques a lot better.)

也有可能,通过一些其他逻辑来处理单向边缘,我的 查看全文