When I run this code, I see that the vector sums from the CPU method and the reduceAtomicGlobal method differ from those of the other methods when N > 16'777'217.
And of course those two sums are wrong.
Can you help me understand why this happens?
My console log is below.
The expected value is not 42e9 because I fill every element of the vector vals with 1.0f.
Expected value: 1e+09
BFD: /lib/x86_64-linux-gnu/libutil.so.1: unknown type [0x13] section `.relr.dyn'
==== CPU Reduction ====
Computed CPU value: 1.67772e+07
==== GPU Reductions ====
Atomic Global 1755.09ms 1.67772e+07
Atomic Shared 1497.08ms 1e+09
Reduce Shared 780.611ms 1e+09
Reduce Shuffle 670.465ms 1e+09
Reduce Final 409.837ms 1e+09
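
To narrow this down, here is a minimal host-only sketch that seems to reproduce the CPU number above. It assumes the CPU method adds the elements one by one into a single float accumulator; that is an assumption about the code, and the function names sum_ones_float and sum_ones_double are made up for this illustration. It also assumes IEEE 754 single precision with no -ffast-math style reordering.

```cpp
#include <cstddef>

// Hypothetical helper: add 1.0f to a single float accumulator n times,
// the way a naive sequential CPU loop (or a single global atomicAdd
// target) would accumulate a vector filled with 1.0f.
float sum_ones_float(std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // Once sum reaches 2^24 = 16'777'216, sum + 1.0f is no longer
        // representable in float and rounds back to sum.
        sum += 1.0f;
    }
    return sum;
}

// Hypothetical helper: same loop, but with a double accumulator,
// which has enough mantissa bits to count far past 2^24 exactly.
double sum_ones_double(std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += 1.0f;
    }
    return sum;
}
```

Calling sum_ones_float(20'000'000) stops growing at 16'777'216, which prints as 1.67772e+07 like the CPU and Atomic Global lines in my log, while sum_ones_double(20'000'000) reaches the full count. My guess is that the other kernels first sum small partial results per block, so no single float accumulator has to absorb a tiny 1.0f into a huge running total, but I am not sure that is the whole explanation.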