Channel: Intel® C++ Compiler

Wrong mask generation using _mm512_mask_extpackstorelo_epi32


Hi,

I have reduced my problem to this test:

#include <stdio.h>
#include <immintrin.h>

int main()
{
    int i;
    float tmp[16];

    /* Fill the destination with a sentinel value and print it. */
    for(i=0; i<16; i++){
        tmp[i] = 5.0f;
        printf("%f ", tmp[i]);
    }
    printf("\n");

    __m512 __vtmp = _mm512_set1_ps(10.0f);  /* all 16 lanes hold 10.0f */
    __mmask16 mask = 0x0040;                /* only bit 6 (element 6) is enabled */

    _mm512_mask_extpackstorelo_ps(tmp, mask, __vtmp, _MM_DOWNCONV_PS_NONE, 0);

    for(i=0; i<16; i++){
        printf("%f ", tmp[i]);
    }
    printf("\n");
    return 0;
}

According to the ISA manual, with the mask 0x0040 the first position of 'tmp' shouldn't be written. However, the output of this code is:

5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
10.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000

Looking at the assembly, for some reason the mask 0x0040 is being translated to $1:

        stmxcsr   64(%rsp)                                      #4.1 c1
        movl      $1, %eax                                      #11.23 c2
        vprefetche0 (%rsp)                                      #10.9 c2
        orl       $32832, 64(%rsp)                              #4.1 c6
        kmov      %eax, %k1                                     #11.23 c6
        ldmxcsr   64(%rsp)                                      #4.1 c10
        vbroadcastsd .L_2il0floatpacket.1(%rip), %zmm0{%k1}     #11.23 c11
        xorl      %ecx, %ecx                                    #8.5 c15
        movl      $1084227584, %edx                             #10.9 c15
        xorl      %r12d, %r12d                                  #8.5 c19
        vpackstorelpd %zmm0, 72(%rsp){%k1}                      #11.23 c19
        movl      %edx, %ebx                                    #11.23 c23
        movq      %rcx, %r15 

Am I missing something?

I'm using icc (ICC) 14.0.2 20140120

Thank you

