且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

在log(n)时间中找到已排序数组中至少出现k次的元素

更新时间:2023-02-05 18:47:02

这里是 O(n / k log(k))解决方案:

i = 0
while i+k-1 < n: //don't get out of bounds
   if arr[i] == arr[i+k-1]:
       produce arr[i] as dupe
       i = min { j | arr[j] > arr[i] } //binary search
   else:
       c = min { j | arr[j] == arr[i+k-1] } //binary search
       i = c

这个想法是,如果索引 i + k-1 的元素与索引 i $的元素匹配,则检查该元素c $ c>-很好,是骗子。否则,您无需检查 i i + k-1 之间的所有元素。与 arr [i + k-1] 的值相同。

The idea is, you check the element at index i+k-1, if it matches the element at index i - good, it's a dupe. Otherwise, you don't need to check all the element between i to i+k-1, only the one with the same value as arr[i+k-1].

您确实需要回顾一下该元素的最早索引,但是可以保证超过 i + k 进行下一次迭代,使该算法的迭代总数为 O(n / k),每个算法取 O(logk)时间。

You do need to look back for for the earliest index of this element, but you are guaranteed to exceed the index i+k by next iteration, making the total number of iteration of this algorithm O(n/k), each takes O(logk) time.

这在渐近性上优于线性时间算法,尤其是对于 k 的大值>(对于 k O中的情况,算法衰减到 O(logn) (n),例如-查找至少以频率0.1重复的元素)

This is asymptotically better than linear time algorithm, especially for large values of k (where the algorithm decays to O(logn) for cases where k is in O(n), like for example - find element that repeats at least with frequency 0.1)