且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

无限生成汉明序列的最新技术

更新时间:2023-02-27 11:09:45

如果一个常数因子(1) 加速很重要,那么我可以提供一个效率更高的版本:

If a constant factor(1) speedup counts as significant, then I can offer a significantly more efficient version:

hamm :: [Integer]
hamm = mrg1 hamm3 (map (2*) hamm)
  where
    hamm5 = iterate (5*) 1
    hamm3 = mrg1 hamm5 (map (3*) hamm3)
    merge a@(x:xs) b@(y:ys)
        | x < y     = x : merge xs b
        | otherwise = y : merge a ys
    mrg1 (x:xs) ys = x : merge xs ys

您可以轻松地将其推广到给定一组素数的平滑数字:

You can easily generalise it to smooth numbers for a given set of primes:

hamm :: [Integer] -> [Integer]
hamm [] = [1]
hamm [p] = iterate (p*) 1
hamm ps = foldl' next (iterate (q*) 1) qs
  where
    (q:qs) = sortBy (flip compare) ps
    next prev m = let res = mrg1 prev (map (m*) res) in res
    merge a@(x:xs) b@(y:ys)
        | x < y     = x : merge xs b
        | otherwise = y : merge a ys
    mrg1 (x:xs) ys = x : merge xs ys

它更高效,因为该算法不会产生任何重复项并且使用更少的内存.在您的版本中,当产生 h 附近的汉明数时,h/5h 之间的列表部分必须在内存中.在我的版本中,只有完整列表的 h/2h 之间的部分,以及 h/3 之间的部分3-5-list 的 h 需要在内存中.由于 3-5-list 更稀疏,并且 k-smooth 数的密度降低,这两个列表部分需要的内存比完整列表的较大部分少得多.

It's more efficient because that algorithm doesn't produce any duplicates and it uses less memory. In your version, when a Hamming number near h is produced, the part of the list between h/5 and h has to be in memory. In my version, only the part between h/2 and h of the full list, and the part between h/3 and h of the 3-5-list needs to be in memory. Since the 3-5-list is much sparser, and the density of k-smooth numbers decreases, those two list parts need much less memory that the larger part of the full list.

两种算法产生kth汉明数的一些时间,每个目标相对于前一个目标的经验复杂度,不包括和包括GC时间:>

Some timings for the two algorithms to produce the kth Hamming number, with empirical complexity of each target relative to the previous, excluding and including GC time:

  k            Yours (MUT/GC)               Mine (MUT/GC)
 10^5           0.03/0.01                    0.01/0.01      -- too short to say much, really
2*10^5          0.07/0.02                    0.02/0.01
5*10^5          0.17/0.06  0.968  1.024      0.06/0.04      1.199    1.314
 10^6           0.36/0.13  1.082  1.091      0.11/0.10      0.874    1.070
2*10^6          0.77/0.27  1.097  1.086      0.21/0.21      0.933    1.000
5*10^6          1.96/0.71  1.020  1.029      0.55/0.59      1.051    1.090
 10^7           4.05/1.45  1.047  1.043      1.14/1.25      1.052    1.068
2*10^7          8.73/2.99  1.108  1.091      2.31/2.65      1.019    1.053
5*10^7         21.53/7.83  0.985  1.002      6.01/7.05      1.044    1.057
 10^8          45.83/16.79 1.090  1.093     12.42/15.26     1.047    1.084

可以看到,MUT 次数之间的因数约为 3.5,但 GC 时间相差不大.

As you can see, the factor between the MUT times is about 3.5, but the GC time is not much different.

(1) 好吧,它看起来是恒定的,而且我认为这两种变体具有相同的计算复杂度,但是我没有拿出铅笔和纸来证明它,也不打算这样做.

(1) Well, it looks constant, and I think both variants have the same computational complexity, but I haven't pulled out pencil and paper to prove it, nor do I intend to.