Channel: Intel® C++ Compiler

_mm256_shuffle_epi8 documentation has incomplete pseudocode


The User and Reference Guide for the Intel C++ Compiler 15.0 has incomplete pseudocode for the AVX2 intrinsic _mm256_shuffle_epi8:

https://software.intel.com/en-us/node/524017

for (i = 0; i < 16; i++){
 if (b[i] & 0x80){
  r[i] =  0;
 }
 else
 {
  r[i] = a[b[i] & 0x0F];
 }
}

However, this sets only the lower half (bytes 0 to 15) of the 256-bit result. According to the description of the corresponding 256-bit VPSHUFB instruction in the Intel 64 and IA-32 Architectures Software Developer's Manual, each 128-bit half is shuffled independently, using only source bytes from the same half. One way of writing pseudocode that also sets the upper half of the vector is:

for (i = 0; i < 16; i++){
 if (b[i] & 0x80){
  r[i] =  0;
 }
 else
 {
  r[i] = a[b[i] & 0x0F];
 }
 if (b[16+i] & 0x80){
  r[16+i] =  0;
 }
 else
 {
  r[16+i] = a[16+(b[16+i] & 0x0F)];
 }
}

or more succinctly:

for (i = 0; i < 16; i++){
  r[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F];
  r[16+i] = (b[16+i] & 0x80) ? 0 : a[16+(b[16+i] & 0x0F)];
}
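
For anyone who wants to check this against real hardware, the pseudocode above can be turned into a small scalar reference function in plain C. This is only a sketch of my reading of the VPSHUFB description, not anything taken from the Intel documentation. The function name shuffle_epi8_256_ref and the byte-array representation of the vectors are my own choices for illustration, and the output array is assumed not to overlap the inputs.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 256-bit byte shuffle.
   a = source bytes, b = control bytes, r = result bytes.
   Each 128-bit lane (16 bytes) is shuffled independently and can only
   select source bytes from its own lane. r must not alias a or b. */
static void shuffle_epi8_256_ref(const uint8_t a[32],
                                 const uint8_t b[32],
                                 uint8_t r[32])
{
    for (int lane = 0; lane < 2; lane++) {
        const int base = 16 * lane;           /* 0 for low lane, 16 for high lane */
        for (int i = 0; i < 16; i++) {
            const uint8_t ctrl = b[base + i];
            /* Bit 7 set: zero the byte. Otherwise the low 4 bits pick a
               byte within the same lane. */
            r[base + i] = (ctrl & 0x80) ? 0 : a[base + (ctrl & 0x0F)];
        }
    }
}
```

On a machine with AVX2, the result could be compared byte for byte against _mm256_shuffle_epi8 by loading the same arrays with _mm256_loadu_si256 and storing the result with _mm256_storeu_si256.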
