且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

查找最长子字符串且无重复-帮助优化代码[Ruby]

更新时间:2023-11-02 22:30:28

方法

这是通过将字符映射到索引的哈希完成的一种方法.对于字符串s,假设子字符串s[j..j+n-1]中的字符是唯一的,因此该子字符串是最长唯一子字符串的候选者.因此,下一个元素是e = s[j+n].我们希望确定s[j..j+n-1]是否包含e.如果没有,我们可以将e附加到子字符串,使其保持唯一.

Here is a way of doing that with a hash that maps characters to indices. For a string s, suppose the characters in the substring s[j..j+n-1] are unique, and therefore the substring is a candidate for the longest unique substring. The next element is therefore e = s[j+n] We wish to determine if s[j..j+n-1] includes e. If it does not we can append e to the substring, keeping it unique.

如果s[j..j+n-1]包括e,我们将确定n(子字符串的大小)是否大于先前已知的子字符串的长度,如果是,则更新我们的记录.要确定s[j..j+n-1]是否包括e,我们可以对子字符串执行线性搜索,但是维护键值对为s[i]=>ii = j..j_n-1的哈希c_to_i更快.也就是说,c_to_i将子字符串中的字符映射到其完整字符串s中的索引.这样,我们只能评估c_to_i.key?(e)来查看子字符串是否包含e.如果子字符串包含e,则使用c_to_i确定其在s中的索引并添加一个:j = c_to_i[e] + 1.因此,新的子字符串为s[j..j+n-1],其新值为j.请注意,在此步骤中可能会跳过s的几个字符.

If s[j..j+n-1] includes e, we determine if n (the size of the substring) is greater than the length of the previously-known substring, and update our records if it is. To determine if s[j..j+n-1] includes e, we could perform a linear search of the substring, but it is faster to maintain a hash c_to_i whose key-value pairs are s[i]=>i, i = j..j_n-1. That is, c_to_i maps the characters in the substring to their indices in full string s. That way we can merely evaluate c_to_i.key?(e) to see if the substring contains e. If the substring includes e, we use c_to_i to determine its index in s and add one: j = c_to_i[e] + 1. The new substring is therefore s[j..j+n-1] with the new value of j. Note that several characters of s may be skipped in this step.

无论子字符串是否包含e,现在我们都必须将e附加到(可能已更新的)子字符串上,以使其变为s[j..j+n].

Regardless of whether the substring contained e, we must now append e to the (possibly-updated) substring, so that it becomes s[j..j+n].

代码

def longest_no_repeats(str)
  c_to_i = {}
  longest = { length: 0, end: nil }
  str.each_char.with_index do |c,i|
    j = c_to_i[c]
    if j
      longest = { length: c_to_i.size, end: i-1 } if
        c_to_i.size > longest[:length]
      c_to_i.reject! { |_,k| k <= j }
    end
    c_to_i[c] = i
  end
  c_to_i.size > longest[:length] ? { length: c_to_i.size, end: str.size-1 } :
    longest
end

示例

a = ('a'..'z').to_a
  #=> ["a", "b",..., "z"]

str = 60.times.map { a.sample }.join
  #=> "ekgdaxxzlwbxixhlfbpziswcoelplhobivoygmupdaexssbuuawxmhprkfms"

longest = longest_no_repeats(str)
  #=> {:length=>14, :end=>44} 
str[0..longest[:end]]
  #=> "ekgdaxxzlwbxixhlfbpziswcoelplhobivoygmupdaexs" 
str[longest[:end]-longest[:length]+1,longest[:length]]
  #=>                                "bivoygmupdaexs" 

效率

以下是@mechnicov代码的基准比较:

Here is a benchmark comparison to @mechnicov's code:

require 'benchmark/ips'

a = ('a'..'z').to_a
arr = 50.times.map { 1000.times.map { a.sample }.join }

Benchmark.ips do |x|
  x.report("mechnicov") { arr.sum { |s| max_non_repeated(s)[:length]   } }
  x.report("cary")      { arr.sum { |s| longest_no_repeats(s)[:length] } }
  x.compare!
end

显示:

Comparison:
            cary:       35.8 i/s
       mechnicov:        0.0 i/s - 1198.21x  slower