Channel: Intel® C++ Compiler

How to broadcast 4 floats into 4 lanes?


Hi there,

After reading a lot of material, I still cannot find out how to broadcast 4 float variables into the 4 lanes of a vector register on MIC.

For example, given float array[4] = {a, b, c, d};

how can I load it into a vector register as {aaaa, bbbb, cccc, dddd} using a single intrinsic?
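To make the target layout concrete, here is a scalar reference sketch in plain C. The function name broadcast4_to_lanes_ref is made up for illustration and is not an intrinsic; it only assumes that a is meant to land in elements 0..3, b in elements 4..7, and so on.

```c
#include <stddef.h>

/* Hypothetical scalar reference for the desired layout:
 * out = {a,a,a,a, b,b,b,b, c,c,c,c, d,d,d,d}.
 * Element 4*lane + j of the 16-float result holds in[lane]. */
void broadcast4_to_lanes_ref(const float in[4], float out[16])
{
    for (size_t lane = 0; lane < 4; ++lane) {
        for (size_t j = 0; j < 4; ++j) {
            out[4 * lane + j] = in[lane];
        }
    }
}
```

Any vectorized version should produce the same 16 values as this loop, so it can serve as a check when comparing candidate intrinsic sequences.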

If I use _mm512_mask_blend_ps, it takes four statements (four _mm512_set1_ps broadcasts plus three blends):

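__forceinline __m512 gather16float_4float(const float a, const float b, const float c, const float d)
{
        // Start with a in all 16 elements.
        __m512 v = _mm512_set1_ps(a);
        // Each set mask bit takes that element from the second source.
        v = _mm512_mask_blend_ps(0x00f0,v,_mm512_set1_ps(b)); // elements 4..7   = b
        v = _mm512_mask_blend_ps(0x0f00,v,_mm512_set1_ps(c)); // elements 8..11  = c
        v = _mm512_mask_blend_ps(0xf000,v,_mm512_set1_ps(d)); // elements 12..15 = d
        return v;
}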
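For anyone who wants to check the masks without MIC hardware, the sequence above can be modeled in scalar C. This is only a sketch: the _model functions are hypothetical stand-ins, not real intrinsics, and they assume the usual blend semantics where bit i of the mask selects element i from the second source.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 16-element broadcast. */
void set1_16_model(float x, float out[16])
{
    for (int i = 0; i < 16; ++i) {
        out[i] = x;
    }
}

/* Hypothetical scalar model of a 16-element masked blend:
 * element i comes from b when bit i of k is set, otherwise from a.
 * Safe to call with out aliasing a, since each element is independent. */
void mask_blend16_model(uint16_t k, const float a[16], const float b[16], float out[16])
{
    for (int i = 0; i < 16; ++i) {
        out[i] = ((k >> i) & 1u) ? b[i] : a[i];
    }
}

/* Same four steps as the intrinsic version, on plain arrays. */
void gather16float_4float_model(float a, float b, float c, float d, float out[16])
{
    float s[16];
    set1_16_model(a, out);                                         /* all 16 = a */
    set1_16_model(b, s); mask_blend16_model(0x00f0, out, s, out);  /* 4..7   = b */
    set1_16_model(c, s); mask_blend16_model(0x0f00, out, s, out);  /* 8..11  = c */
    set1_16_model(d, s); mask_blend16_model(0xf000, out, s, out);  /* 12..15 = d */
}
```

The model says nothing about performance; it only confirms which elements each mask touches.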

Is there a faster method?

All of the broadcast intrinsics I found deal with 128-bit broadcasts. Is there an intrinsic that works across the 4 lanes?

Could you please help me with how to do this?

Thanks.

