如何在 Ruby 中进行模糊子字符串匹配?

更新时间：2022-01-14 00:11:42

你可以试试 amatch.它可以作为 ruby gem 使用，虽然我很长时间没有使用模糊逻辑，但它看起来有你需要的东西.amatch 的主页是:http://flori.github.com/amatch/.

You could try amatch. It's available as a ruby gem and, although I haven't worked with fuzzy logic for a long time, it looks to have what you need. The homepage for amatch is: http://flori.github.com/amatch/.

只是对这个想法感到无聊和混乱，一个完全未经优化和未经测试的解决方案黑客如下:

Just bored and messing around with the idea, a completely non-optimized and untested hack of a solution follows:

include 'amatch'

module FuzzyFinder
  def scanner( input )
    out = [] unless block_given?
    pos = 0
    input.scan(/(\w+)(\W*)/) do |word, white|
      startpos = pos
      pos = word.length + white.length
      if block_given?
        yield startpos, word
      else
        out << [startpos, word]
      end
    end
  end

  def find( text, doc )
    index = scanner(doc)
    sstr = text.gsub(/\W/,'')
    levenshtein = Amatch::Levensthtein.new(sstr)
    minlen = sstr.length
    maxndx = index.length
    possibles = []
    minscore = minlen*2
    index.each_with_index do |x, i|
      spos = x[0]
      str = x[1]
      si = i
      while (str.length < minlen)
        i += 1
        break unless i < maxndx
        str += index[i][1]
      end
      str = str.slice(0,minlen) if (str.length > minlen)
      score = levenshtein.search(str)
      if score < minscore
        possibles = [spos]
        minscore = score
      elsif score == minscore
        possibles << spos
      end
    end
    [minscore, possibles]
  end
end

显然有很多改进是可能的，而且可能是必要的！一些最重要的:

Obviously there are numerous improvements possible and probably necessary! A few off the top:

处理一次文档并存储结果，可能在数据库中.
确定字符串的可用长度对于初步检查，过程首先针对该初始子字符串在尝试匹配整个片段.
继上一个之后，预先计算的起始片段那个长度.

上一篇 : ：如何在Postgres中使用Blob数据类型下一篇 : 将参数传递给方法时，将标记查询增强为sqlInjection

如何在 Ruby 中进行模糊子字符串匹配?

相关阅读

技术问答最新文章