且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

随机数生成器(RNG/PRNG)返回种子的更新值

更新时间:2022-06-10 08:33:09

您提出的解决方案在我看来应该可以正常工作-您正在记录生成器的整个状态(通过get),并在必要时在流之间进行交换(通过put).请注意,尽管如此,我尚未真正测试过您的代码.

Your proposed solution looks to me like it should work - you are recording the entire state of the generator (via get), and swapping between streams when necessary (via put). Note that I haven't actually tested your code, though.

之所以出现这个答案,是因为先前的答案(现在已删除)仅保存了状态数组的第一个元素,并使用它来设置整个状态数组.看起来像这样:

This answer came about because a previous answer (now deleted) saved only the first element of the state array, and was using it to set the entire state array. It looked something like:

integer :: seed_scalar, state_array(12)
...
! -- Generate a random number from a thread that is stored as seed_scalar
state_array(:) = seed_scalar
call random_seed( put=state_array )
call random_number( ran )
call random_seed( get=state_array )
seed_scalar = state_array(1)
! -- discard state_array, store seed_scalar

该答案旨在证明,对于某些编译器(gfortran 4.8和8.1,pgfortran 15.10)而言,这种仅通过标量维护单独线程的方法会导致不良的RNG行为.

This answer is intended to demonstrate that, for some compilers (gfortran 4.8 and 8.1, pgfortran 15.10), this method of maintaining separate threads via only a scalar results in bad RNG behavior.

请考虑以下代码,以从根本上测试随机数生成器.生成了许多随机数(在此示例中为100M),并且通过两种方式监视性能.首先,记录随机数增加或减少的时间.其次,生成条宽为0.01的直方图. (这显然是一个原始的测试用例,但事实证明足以证明这一点.)最后,还生成了所有概率的估计单西格玛标准差.这使我们能够确定何时变异是随机的或具有统计意义的.

Consider the following code to primitively test a random number generator. Many random numbers are generated - 100M for this example - and performance is monitored in two ways. First, counts for when the random number increases or decreases are recorded. Second, a histogram with bin width 0.01 is generated. (This is obviously a primitive test case, but it turns out to be sufficient to prove the point.). Finally, the estimated one-sigma standard deviation for all probabilities is also generated. This allows us to determine when variations are random or statistically significant.

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer :: N_iterations, state_size, N_histogram
   integer :: count_incr, count_decr, i, loc, fid
   integer, allocatable, dimension(:) :: state1, histogram
   real(wp) :: ran, oldran, p, dp, re, rt

   ! -- Some parameters
   N_iterations = 100000000
   N_histogram = 100

   call random_seed(size = state_size)
   allocate(state1(state_size), histogram(N_histogram))
   write(*,'(a,i3)') '-- State size is: ', state_size

   ! -- Initialize RNG, taken as today's date (also tested with other initial seeds)
   state1(:) = 20180815
   call random_seed(put=state1)

   count_incr = 0
   count_decr = 0
   histogram = 0
   ran = 0.5_wp      ! -- To yield proprer oldran

   ! -- Loop to grab a batch of random numbers
   do i=1,N_iterations
      oldran = ran

      ! -- This preprocessor block modifies the RNG state in a manner
      ! -- consistent with storing only the first value of it
#ifdef ALTER_STATE
      call random_seed(get=state1)
      state1(:) = state1(1)
      call random_seed(put=state1)
#endif

      ! -- Get the random number
      call random_number(ran)

      ! -- Process Histogram
      loc = CEILING(ran*N_histogram)
      histogram(loc) = histogram(loc) + 1

      if (oldran<ran) then
         count_incr = count_incr + 1
      elseif (oldran>ran) then
         count_decr = count_decr + 1
      else
         ! -- This should be statistically impossible
         write(*,*) '** Error, same value?', ran, oldran
         stop 1
      endif
   enddo

   ! -- For this processing, we have:
   !     re    Number of times this event occurred
   !     rt    Total number
   ! -- Probability of event is just re/rt
   ! -- One-sigma standard deviation is sqrt( re*(rt-re)/rt**3 )

   ! -- Write histogram
   ! -- For each bin, compute probability and uncertainty in that probability
   open(newunit=fid, file='histogram.dat', action='write', status='replace')
   write(fid,'(a)') '# i, P(i), dP(i)'

   rt = real(N_iterations,wp)
   do i=1,N_histogram
      re = real(histogram(i),wp)
      p = re/rt
      dp = sqrt(re*(rt-1)/rt**3)

      write(fid,'(i4,2es26.18)') i, p, dp
   enddo
   close(fid)

   ! -- Probability of increase and decrease
   re = real(count_incr,wp)
   p = re/rt
   dp = sqrt(re*(rt-1)/rt**3)
   write(*,'(a,f20.15)') 'Probability of increasing: ', p
   write(*,'(a,f20.15)') '      one-sigma deviation: ', dp

   re = real(count_decr,wp)
   p = re/rt
   dp = sqrt(re*(rt-1)/rt**3)
   write(*,'(a,f20.15)') 'Probability of decreasing: ', p
   write(*,'(a,f20.15)') '      one-sigma deviation: ', dp

   write(*,'(a)') 'complete'

end program main

不做修改

没有预处理器指令ALTER_STATE,我们将按预期使用gfortran的内置PRNG,并且结果符合预期:

Without Modification

Without the preprocessor directive ALTER_STATE, we are using gfortran's built-in PRNG as intended, and the results are as expected:

enet-mach5% gfortran --version
GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

enet-mach5% gfortran -cpp -fcheck=all main.f90 && time ./a.out
-- State size is:  12
Probability of increasing:    0.499970710000000
      one-sigma deviation:    0.000070708606619
Probability of decreasing:    0.500029290000000
      one-sigma deviation:    0.000070712748851
complete

real    0m2.414s
user    0m2.408s
sys     0m0.004s

增加/减少的预期概率为0.5,并且两者都具有估计的不确定性(0.49997从0.5小于0.00007).带有误差线的直方图是

The expected probability for increasing/decreasing is 0.5, and both are with the estimated uncertainty (0.49997 is less than 0.00007 from 0.5). The histogram, plotted with error bars, is

对于每个面元,与预期概率(0.01)的差异很小,并且通常在估计的不确定性之内.因为我们生成了许多数字,所以所有变化都很小(0.1%的量级).基本上,此测试未发现任何可疑行为.

For each bin, the variation from the expected probability (0.01) is small, and typically within the estimated uncertainty. Because we have generated many numbers, all variations are small (order 0.1%). Basically, this test hasn't found any suspicious behavior.

如果启用ALTER_STATE中的块,则每次生成数字时,我们都会修改随机数生成器的内部状态.这旨在模拟现在已删除的解决方案,该解决方案仅存储状态的第一个值.结果是:

If we enable the block inside ALTER_STATE, we are modifying the internal state of the random number generator each time a number is generated. This is intended to mimic a now-deleted solution, which stored only the first value of the state. The results are:

enet-mach5% gfortran -cpp -DALTER_STATE -fcheck=all main.f90 && time ./a.out
-- State size is:  12
Probability of increasing:    0.501831930000000
      one-sigma deviation:    0.000070840096343
Probability of decreasing:    0.498168070000000
      one-sigma deviation:    0.000070581021884
complete

real    0m16.489s
user    0m16.492s
sys     0m0.000s

观察到的增长可能性远超出预期的变化(26 sigma!).这已经表明出了点问题.直方图是:

The observed probability of increasing is far outside the expected variation (26 sigma!). This already indicates something is wrong. The histogram is:

请注意,y的范围已发生很大变化.在这里,我们的变化比以前的情况大了大约两个数量级,远远超出了预期的变化.由于y范围大得多,因此在这里很难看到误差线.如果我的随机数生成器的性能是这样的,那么我将它用在任何事情上都不会感到舒服,甚至没有掷硬币的感觉.

Note that the range in y has changed substantially. Here, we have a variation of approximately two orders of magnitude larger than the previous case, far outside of the expected variation. The error bars are hard to see, here, because the y-range is so much larger. If my random number generator performed like this, I wouldn't feel comfortable using it for anything, not even a coin flip.

random_seedputget选项访问随机数生成器的处理器相关内部状态.它通常比单个数具有更多的熵,并且其形式取决于处理器.不能保证第一个数字完全代表整个州.

The put and get options of random_seed access the processor-dependent internal state of the random number generator. This typically has more entropy than a single number, and its form is processor-dependent. There's no guarantee that the first number is at all representative of the entire state.

如果您要初始化一次随机种子并生成多次,则使用单个标量就可以了.但是,如果您打算使用该状态来生成每个单一数字,那么显然有必要存储多个单一数字.

If you're initializing a random seed once, and generating many times, using a single scalar is fine. However, it's clearly necessary to store more than a single number if you intend to use that state to generate every single number.

坦率地说,我对此原始测试能够证明不良行为感到有些惊讶. RNG的有效性是一个复杂的主题,我绝不是专家.结果还取决于编译器:

I'm frankly a little bit surprised this primitive test was able to demonstrate the bad behavior. Validity of RNG is a complex subject, and I am by no means an expert. The results were also compiler-dependent:

  • 显示的结果和直方图是针对gfortran 4.8(状态大小为12)的.
  • Intel 16.0使用的状态大小为2,该测试未显示任何不良行为.
  • gfortran 8.1.0的状态大小为33,PGI 15.10的状态大小为34.它们的行为是相同的,并且是最坏的.将整个状态设置为单个值时,随后的随机数总是 相同.从单个种子初始化时,这些生成器需要大约30代的启动"才能使数字看起来合理.在状态中仅保存一个种子时,这种启动不会发生.
  • The results and histograms shown are for gfortran 4.8, which has a state size of 12.
  • Intel 16.0 uses a state size of 2, and this test didn't show any undesirable behavior.
  • gfortran 8.1.0 has a state size of 33, and PGI 15.10 has a state size of 34. Their behavior was the same, and the worst yet. When setting the entire state to a single value, subsequent random numbers are always identical. These generators require a 'priming' of around 30 generations before the numbers start looking reasonable, when initialized from a a single seed. When saving only a single seed in the state, this priming never occurs.

仅保存单个值时,较大的状态大小可能导致更多的熵损失,因此它与较差的行为相关.这当然符合我的观察.但是,如果不了解每个生成器的内部,就无法分辨.

It's possible that a larger state size results in more entropy loss when saving only a single value, so it's correlated with poorer behavior. That certainly fits with my observations. Without knowing the internals of each generator, though, it's impossible to tell.