The User and Reference Guide for the Intel C++ Compiler 15.0 has incomplete pseudocode for the AVX2 intrinsic _mm256_shuffle_epi8:
https://software.intel.com/en-us/node/524017
for (i = 0; i < 16; i++){ if (b[i] & 0x80){ r[i] = 0; } else { r[i] = a[b[i] & 0x0F]; } }
However, this sets only the lower 16 bytes of the 32-byte result. From the description of the corresponding 256-bit VPSHUFB instruction in the Intel 64 and IA-32 Architectures Software Developer's Manual, it appears that the upper 128-bit lane is shuffled separately, with its indices selecting only from the upper 16 bytes of a. One way of expressing pseudocode that also sets the upper half of the vector is:
for (i = 0; i < 16; i++){ if (b[i] & 0x80){ r[i] = 0; } else { r[i] = a[b[i] & 0x0F]; } if (b[16+i] & 0x80){ r[16+i] = 0; } else { r[16+i] = a[16+(b[16+i] & 0x0F)]; } }
or more succinctly:
for (i = 0; i < 16; i++){ r[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F]; r[16+i] = (b[16+i] & 0x80) ? 0 : a[16+(b[16+i] & 0x0F)]; }
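For anyone who wants to check the proposed pseudocode against real hardware, here is a minimal scalar sketch of it in plain C. The function names shuffle_epi8_256_ref and shuffle_epi8_256_selfcheck are hypothetical, chosen only for this illustration, and the byte arrays stand in for the three 256-bit vectors. It models the behavior as I read it from the VPSHUFB description and is not taken from Intel's documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the 32-byte shuffle, following the
 * corrected pseudocode: each 16-byte lane is shuffled on its own. */
static void shuffle_epi8_256_ref(const uint8_t a[32], const uint8_t b[32],
                                 uint8_t r[32])
{
    for (int i = 0; i < 16; i++) {
        /* Lower lane: indices select from a[0..15]. */
        r[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F];
        /* Upper lane: indices select from a[16..31]. */
        r[16 + i] = (b[16 + i] & 0x80) ? 0 : a[16 + (b[16 + i] & 0x0F)];
    }
}

/* Hypothetical self-check: an index of 0 in the upper lane must pick
 * a[16], not a[0], and a set high bit must zero the output byte. */
static void shuffle_epi8_256_selfcheck(void)
{
    uint8_t a[32], b[32] = {0}, r[32];
    for (int i = 0; i < 32; i++) {
        a[i] = (uint8_t)(100 + i);
    }
    b[1] = 0x80;  /* high bit set: result byte is zeroed */
    shuffle_epi8_256_ref(a, b, r);
    assert(r[0] == a[0]);
    assert(r[1] == 0);
    assert(r[16] == a[16]);
}
```

Comparing the output of shuffle_epi8_256_ref with the bytes stored from a real _mm256_shuffle_epi8 call on the same inputs would be one way to confirm that the upper lane never reads from the lower 16 bytes of a.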