更新时间:2023-09-29 11:08:52
Floating-point operations will yield different results on different architectures, regardless of whether they support IEEE754 or not, since floating-point is not associative. Even switching compiler on x86 will typically give different results. This whitepaper gives some excellent explanations.
Having said that, your real issue is that you have a data dependent algorithm where the operations are dependent on the random numbers you generate. So if you generate the same numbers on the CPU and the GPU then both runs will be following the same paths. Consider using cuRAND, which can generate the same numbers on both the CPU and GPU.