I have reduced my problem to this test:
#include <stdio.h>
#include <immintrin.h>

int main()
{
    int i;
    float tmp[16];

    for (i = 0; i < 16; i++) {
        tmp[i] = 5.0f;
        printf("%f ", tmp[i]);
    }
    printf("\n");

    __m512 __vtmp = _mm512_set1_ps(10.0f);
    __mmask16 mask = 0x0040;  /* only bit 6 is set */
    _mm512_mask_extpackstorelo_ps(&tmp, mask, __vtmp, _MM_DOWNCONV_PS_NONE, 0);

    for (i = 0; i < 16; i++) {
        printf("%f ", tmp[i]);
    }
    printf("\n");
    return 0;
}
According to the description in the ISA manual, with the mask 0x0040 (only bit 6 set) the first element of 'tmp' should not be written. However, the output of this code is:
5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
10.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
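To make my expectation concrete, here is a minimal scalar sketch in plain C of the result I expected. It assumes that bit i of the mask selects element i of the destination, which is only my reading of the manual and may not be what the instruction actually does. The name expected_masked_store is a hypothetical helper for illustration, not an intrinsic.

```c
#include <stdint.h>

/* Hypothetical scalar model of the behaviour I expected (an assumption,
 * not the documented semantics of the pack-store instruction):
 * bit i of the mask decides whether dst[i] is overwritten with value. */
void expected_masked_store(float *dst, uint16_t mask, float value)
{
    for (int i = 0; i < 16; i++) {
        if (mask & (1u << i)) {
            dst[i] = value;   /* element i is selected by mask bit i */
        }
        /* elements whose mask bit is clear are left untouched */
    }
}
```

Under this model, calling expected_masked_store(tmp, 0x0040, 10.0f) would change only tmp[6] and leave tmp[0] at 5.0, which is not what the real output shows.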
Looking at the generated assembly, the mask 0x0040 is, for some reason, being translated to $1:
        stmxcsr   64(%rsp)                                      #4.1 c1
        movl      $1, %eax                                      #11.23 c2
        vprefetche0 (%rsp)                                      #10.9 c2
        orl       $32832, 64(%rsp)                              #4.1 c6
        kmov      %eax, %k1                                     #11.23 c6
        ldmxcsr   64(%rsp)                                      #4.1 c10
        vbroadcastsd .L_2il0floatpacket.1(%rip), %zmm0{%k1}     #11.23 c11
        xorl      %ecx, %ecx                                    #8.5 c15
        movl      $1084227584, %edx                             #10.9 c15
        xorl      %r12d, %r12d                                  #8.5 c19
        vpackstorelpd %zmm0, 72(%rsp){%k1}                      #11.23 c19
        movl      %edx, %ebx                                    #11.23 c23
        movq      %rcx, %r15
Am I missing something?
I'm using icc (ICC) 14.0.2 20140120
Thank you