Dear Intel developers,
I'm using Intel intrinsics with the Intel compiler, version 13.1.3, and I'm getting rounding errors from the _mm_mul_ps intrinsic. Here is the code:
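    __m128 denom_tmp = _mm_setzero_ps();
    __m128 *sample_tmp = sample_r1;   /* must be a pointer: it is dereferenced and incremented below */
    __m128 sample_tmp2;

    for (j = 0; j < tot_iter; j += UNROLL_STEP) {
        sample_tmp2 = _mm_mul_ps(*sample_tmp, *sample_tmp);  /* square each of the 4 lanes */
        denom_tmp = _mm_add_ps(sample_tmp2, denom_tmp);      /* accumulate the squares */
        sample_tmp++;
    }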
sample_r1 is filled before this code runs. My goal is simply to square each element of sample_r1 (the loop also sums those squares into denom_tmp). These are the first four values of sample_r1: (-570.911, -236.614, 36.6958, 27.5522).
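Here is a stripped-down, self-contained version of the same loop, in case it helps reproduce the problem. It is a sketch: the name sum_of_squares_ps is my own, and I assume one __m128 is consumed per iteration, since the relation between tot_iter and UNROLL_STEP is not shown above. Each lane is squared with _mm_mul_ps and added into a running per-lane total, as denom_tmp is.

```c
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* SSE: __m128, _mm_setzero_ps, _mm_mul_ps, _mm_add_ps */

/* Hypothetical helper: returns the per-lane sum of squares of n packed
   vectors, i.e. what denom_tmp holds after the loop, assuming one
   __m128 per iteration. */
static __m128 sum_of_squares_ps(const __m128 *samples, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; ++i) {
        __m128 sq = _mm_mul_ps(samples[i], samples[i]); /* x*x in each lane */
        acc = _mm_add_ps(sq, acc);                       /* running per-lane sum */
    }
    return acc;
}
```

With n = 1 the result is just the four squares rounded to float; with more vectors each lane holds a sum over all of them. The lanes can be read back with _mm_storeu_ps into a float[4].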
After the loop, the first four elements of denom_tmp are: (325940, 55986.2, 1346.58, 759.121).
The correct results, computed by hand and checked against a scalar version, are: (325939.369921, 55986.184996, 1346.58173764, 759.12372484). So I'm getting a surprisingly large rounding error. Am I using the multiplication intrinsic incorrectly? Is there some way to increase the precision?
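One way to check whether this is ordinary single-precision rounding rather than misuse of _mm_mul_ps: a float has a 24-bit significand (roughly 7 significant decimal digits), so a value like 325939.369921 cannot be held exactly in a single-precision lane, and the denom_tmp values above are shown with only six significant digits, so printing with more digits (for example printf("%.9g", ...)) would show what is actually stored. For more precision, the squares can be computed in double with SSE2. The sketch below assumes SSE2 is available; the helper name square_ps_as_pd is hypothetical. It widens the four float lanes into two __m128d vectors with _mm_cvtps_pd and squares them with _mm_mul_pd.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_cvtps_pd, _mm_mul_pd, _mm_movehl_ps */

/* Hypothetical helper: squares the four float lanes of x in double precision.
   *lo receives lanes 0 and 1, *hi receives lanes 2 and 3. */
static void square_ps_as_pd(__m128 x, __m128d *lo, __m128d *hi)
{
    __m128d x_lo = _mm_cvtps_pd(x);                    /* lanes 0,1 -> double */
    __m128d x_hi = _mm_cvtps_pd(_mm_movehl_ps(x, x));  /* lanes 2,3 -> double */
    *lo = _mm_mul_pd(x_lo, x_lo);  /* exact: a float*float product fits in a double */
    *hi = _mm_mul_pd(x_hi, x_hi);
}
```

This only removes the rounding in the multiply. The inputs were already rounded to float when sample_r1 was filled (-570.911 is not exactly representable as a float), so agreeing closely with the hand-computed values would require keeping the samples in double from the start.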
Thanks in advance.