Dear Intel developers,
I need to compute a horizontal sum of the four floats in an SSE register and add the result to another float. I wrote this:
    float x = 0;
    float denom_arr_tmp[4];
    __m128 denom_tmp;
    for(.....) {
        // calculate denom_tmp
    }
    // unaligned store, because denom_arr_tmp is not guaranteed to be 16-byte aligned
    _mm_storeu_ps(denom_arr_tmp, denom_tmp);
    x += denom_arr_tmp[0] + denom_arr_tmp[1] + denom_arr_tmp[2] + denom_arr_tmp[3];
I'm not sure this is the best way. What is the fastest way to do a horizontal sum of floats?
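For comparison, one alternative that keeps the reduction in registers instead of going through memory is a shuffle-based sum. The following is only a minimal sketch, assuming SSE1 is all that is available; the helper name hsum_ps_sse1 is hypothetical, and I have not measured whether it is actually faster than the store-and-add version:

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3]. */
static inline float hsum_ps_sse1(__m128 v)
{
    /* Swap the elements within each pair: shuf = [v1, v0, v3, v2]. */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* sums = [v0+v1, v1+v0, v2+v3, v3+v2]. */
    __m128 sums = _mm_add_ps(v, shuf);
    /* Move the upper two lanes of sums into the lower two lanes. */
    shuf = _mm_movehl_ps(shuf, sums);
    /* Lane 0 now holds (v0+v1) + (v2+v3). */
    sums = _mm_add_ss(sums, shuf);
    /* Extract lane 0 as a scalar float. */
    return _mm_cvtss_f32(sums);
}
```

With this helper, the last two statements of my code would become x += hsum_ps_sse1(denom_tmp); Note that the additions are grouped as (v0+v1) + (v2+v3), so the rounding can differ slightly from the left-to-right scalar sum.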
Thanks.