Gathers two (128-bit form) or four (256-bit form) packed double-precision floating-point values from memory addressed by the given base address, dword indices, and scale, under control of the given double-precision FP mask values. The corresponding Intel® AVX2 instruction is VGATHERDPD.
Syntax
extern __m128d _mm_mask_i32gather_pd(__m128d def_vals, double const * base, __m128i vindex, __m128d vmask, const int scale);
extern __m256d _mm256_mask_i32gather_pd(__m256d def_vals, double const * base, __m128i vindex, __m256d vmask, const int scale);
Arguments
def_vals | the vector of double-precision FP values copied to the destination when the corresponding element of the double-precision FP mask is '0'. |
base | the base address used to reference the loaded FP elements. |
vindex | the vector of dword indices used to reference the loaded FP elements. |
vmask | the vector of FP elements used as a vector mask; only the most significant bit of each data element is used as a mask. |
scale | 32-bit scale factor applied to each index when addressing the loaded FP elements; it must be a compile-time constant equal to 1, 2, 4, or 8. |
Description
The intrinsics conditionally load two or four packed double-precision floating-point values from memory using dword indices, according to the mask values, and update the destination operand. Elements whose mask bit is '0' take the corresponding value from def_vals instead.
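As an illustration of the description above, here is a minimal sketch of calling the 256-bit form. The names gather4, gather4_avx2, active, and fallback are hypothetical and not part of the Intel API. The sketch assumes GCC or Clang (it relies on the target attribute and `__builtin_cpu_supports`) and falls back to a plain loop when AVX2 is not available at run time.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2 path: gathers base[idx[i]] for every lane whose active[i] is non-zero;
   the other lanes receive 'fallback'. Hypothetical helper for illustration. */
__attribute__((target("avx2")))
static void gather4_avx2(double out[4], const double *base,
                         const int32_t idx[4], const int active[4],
                         double fallback)
{
    /* Values used for lanes whose mask bit is 0. */
    __m256d def_vals = _mm256_set1_pd(fallback);

    /* Four 32-bit indices packed into one 128-bit vector. */
    __m128i vindex = _mm_loadu_si128((const __m128i *)idx);

    /* Only the most significant bit of each 64-bit element matters;
       all-ones sets it, zero clears it. Arguments are listed high lane first. */
    __m256d vmask = _mm256_castsi256_pd(_mm256_set_epi64x(
        active[3] ? -1 : 0, active[2] ? -1 : 0,
        active[1] ? -1 : 0, active[0] ? -1 : 0));

    /* Scale 8 == sizeof(double), so idx[] holds element indices. */
    __m256d r = _mm256_mask_i32gather_pd(def_vals, base, vindex, vmask, 8);
    _mm256_storeu_pd(out, r);
}
#endif

/* Dispatcher: uses the intrinsic when the CPU reports AVX2, otherwise a
   scalar loop with the same result. */
static void gather4(double out[4], const double *base,
                    const int32_t idx[4], const int active[4],
                    double fallback)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        gather4_avx2(out, base, idx, active, fallback);
        return;
    }
#endif
    for (int i = 0; i < 4; ++i)
        out[i] = active[i] ? base[idx[i]] : fallback;
}
```

With a table of doubles, indices {5, 0, 3, 1}, and lanes 0 and 2 active, gather4 writes table[5] and table[3] to lanes 0 and 2 and the fallback value to lanes 1 and 3.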
Below is the pseudo-code for the intrinsics:
_mm_mask_i32gather_pd():
result[63:0] = (vmask[63]==1) ? (mem[base+vindex[31:0]*scale]) : (def_vals[63:0]);
result[127:64] = (vmask[127]==1) ? (mem[base+vindex[63:32]*scale]) : (def_vals[127:64]);
_mm256_mask_i32gather_pd():
result[63:0] = (vmask[63]==1) ? (mem[base+vindex[31:0]*scale]) : (def_vals[63:0]);
result[127:64] = (vmask[127]==1) ? (mem[base+vindex[63:32]*scale]) : (def_vals[127:64]);
result[191:128] = (vmask[191]==1) ? (mem[base+vindex[95:64]*scale]) : (def_vals[191:128]);
result[255:192] = (vmask[255]==1) ? (mem[base+vindex[127:96]*scale]) : (def_vals[255:192]);
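The pseudo-code above can be mirrored by a portable scalar model, which may help when checking expected results without AVX2 hardware. This is a sketch only: mask_i32gather_pd_model and its lanes parameter are hypothetical names, not Intel APIs, and the model assumes that scale is a byte multiplier applied to each signed 32-bit index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the pseudo-code (hypothetical helper, not an Intel API).
   lanes is 2 for the _mm_ form and 4 for the _mm256_ form. */
static void mask_i32gather_pd_model(double *result, const double *def_vals,
                                    const double *base, const int32_t *vindex,
                                    const double *vmask, int scale, int lanes)
{
    /* The scale is restricted to these four values. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int i = 0; i < lanes; ++i) {
        /* Inspect the raw bits of the mask element; only bit 63 is used. */
        uint64_t mask_bits;
        memcpy(&mask_bits, &vmask[i], sizeof mask_bits);

        if (mask_bits >> 63) {
            /* Byte address = base + index * scale. */
            const char *addr = (const char *)base + (int64_t)vindex[i] * scale;
            memcpy(&result[i], addr, sizeof(double));
        } else {
            /* Mask bit clear: keep the default value, memory is not read. */
            result[i] = def_vals[i];
        }
    }
}
```

Because only the sign bit of each mask element is examined, mask values such as -0.0 or -1.0 select the memory load, while 0.0 or 2.0 select the element from def_vals.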