且构网

Ruby: How do I count the number of times a string appears in another string?

Updated: 2021-11-21 22:00:53

Here are a couple of ways to count the number of times a given substring appears in a string (the first being my preference). Note that, as confirmed by the OP, overlapping occurrences count: the substring 'aa' appears twice in the string 'aaa', and therefore five times in:

string = "aaabbccaaaaddbb"
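
To see why overlapping matters here, compare with the obvious approach. As a minimal sketch (not part of the original methods), String#scan with a plain string consumes each match it finds, so it only reports non-overlapping occurrences:

```ruby
string = "aaabbccaaaaddbb"

# Each match consumes its characters, so the scan resumes after the
# end of the previous match and overlapping occurrences are skipped.
string.scan("aa")
  #=> ["aa", "aa", "aa"]
string.scan("aa").count
  #=> 3
```

That gives 3 rather than the 5 wanted, which is why both methods below look at every starting position.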

#1

Use String#scan with a regex that contains a positive lookahead that looks for the substring. Because a lookahead is zero-width, each match consumes no characters, so overlapping occurrences are all found:

def count_em(string, substring)
  string.scan(/(?=#{substring})/).count
end

count_em(string, "aa")
  #=> 5
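
One caveat: the substring is interpolated into the regex as-is, so characters such as "." or "+" are treated as regex metacharacters. If the substring should be matched literally, a minimal sketch using Regexp.escape could look like this (count_literal is a hypothetical name, not part of the original answer):

```ruby
# Hypothetical variant of count_em that treats substring as literal text.
def count_literal(string, substring)
  # Regexp.escape backslash-escapes regex metacharacters in substring.
  string.scan(/(?=#{Regexp.escape(substring)})/).count
end

count_literal("a.b.c", ".")
  #=> 2

# Without escaping, "." matches any character:
"a.b.c".scan(/(?=.)/).count
  #=> 5
```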

Note:

"aaabbccaaaaddbb".scan(/(?=aa)/)
  #=> ["", "", "", "", ""]

A positive lookbehind produces the same result:

"aaabbccaaaaddbb".scan(/(?<=aa)/)
  #=> ["", "", "", "", ""]

As well, String#scan can be replaced with String#gsub, which returns an enumerator over the matches when it is called with only a pattern (no replacement and no block).
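
A minimal sketch of the String#gsub variant, assuming the same lookahead pattern as above (count_with_gsub is a hypothetical name used only for illustration):

```ruby
# Hypothetical gsub-based variant of count_em.
def count_with_gsub(string, substring)
  # With a pattern only, gsub performs no substitution; it returns an
  # Enumerator that yields each match, which count then tallies.
  string.gsub(/(?=#{substring})/).count
end

count_with_gsub("aaabbccaaaaddbb", "aa")
  #=> 5
```

Unlike scan, this form does not build an intermediate array of matches before counting.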

#2

Split the string into characters, apply Enumerable#each_cons to get every run of consecutive characters of the substring's length, then join each run and count the ones equal to the substring:

def count_em(string, substring)
  string.each_char.each_cons(substring.size).map(&:join).count(substring)
end

count_em(string, "aa")
  #=> 5

Breaking this down step by step, we have:

enum0 = "aaabbccaaaaddbb".each_char
  #=> #<Enumerator: "aaabbccaaaaddbb":each_char>

We can see the elements that will be generated by this enumerator by converting it to an array:

enum0.to_a
  #=> ["a", "a", "a", "b", "b", "c", "c", "a", "a", "a",
  #    "a", "d", "d", "b", "b"]

enum1 = enum0.each_cons("aa".size)
  #=> #<Enumerator: #<Enumerator: "aaabbccaaaaddbb":each_char>:each_cons(2)> 

Convert enum1 to an array to see what values the enumerator will pass on to map:

enum1.to_a
  #=> [["a", "a"], ["a", "a"], ["a", "b"], ["b", "b"], ["b", "c"],
  #    ["c", "c"], ["c", "a"], ["a", "a"], ["a", "a"], ["a", "a"], 
  #    ["a", "d"], ["d", "d"], ["d", "b"], ["b", "b"]]

c = enum1.map(&:join)
  #=> ["aa", "aa", "ab", "bb", "bc", "cc", "ca",
  #    "aa", "aa", "aa", "ad", "dd", "db", "bb"]
c.count("aa")
  #=> 5
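
As a quick sanity check, the two methods can be run side by side on a few inputs. This is only a sketch: count_by_scan and count_by_cons are hypothetical names for methods #1 and #2 above, and the sample inputs are my own.

```ruby
# Method #1: zero-width lookahead with String#scan.
def count_by_scan(string, substring)
  string.scan(/(?=#{substring})/).count
end

# Method #2: sliding window over the characters with each_cons.
def count_by_cons(string, substring)
  string.each_char.each_cons(substring.size).map(&:join).count(substring)
end

samples = [["aaa", "aa"], ["aaabbccaaaaddbb", "aa"], ["abc", "abcd"]]

samples.each do |str, sub|
  # Print both counts; they should agree for every pair.
  puts "#{sub.inspect} in #{str.inspect}: " \
       "#{count_by_scan(str, sub)} / #{count_by_cons(str, sub)}"
end
```

Note that method #2 needs a non-empty substring, since each_cons raises an ArgumentError for a size of 0.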