I compile the following C function:

void multIJK(double *restrict a, double *restrict b, double *restrict c, int dim)
{
    for (int i = 0; i < dim; i++)
        for (int j = 0; j < dim; j++)
            for (int k = 0; k < dim; k++)
                c[i + j*dim] += a[i + k*dim] * b[k + j*dim];
}

using the following options:
icpc -O3 -prec-div -no-ftz -restrict -Wshadow -MMD -MP -fno-inline-functions -mkl -fno-verbose-asm -S xchg-mult.cpp
Surprisingly, icpc version 15 generates code as if the YMM registers did not exist, both on machines with AVX and on machines with AVX2. In contrast, the same compilation command with -mmic added generates code that uses ZMM registers. This behavior is typical and holds across a range of programs.
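One way to confirm which vector instruction set a given command line enables, without reading the generated assembly, is to test the compiler's predefined macros. The following is only a minimal sketch: it assumes the compiler defines the usual __SSE2__, __AVX__, __AVX2__ and __AVX512F__ macros when the corresponding instruction set is enabled, and the function names enabled_simd_isa and print_enabled_simd_isa are hypothetical helpers, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: returns the name of the widest x86 SIMD extension
   that is enabled at compile time, judged only from predefined macros.
   Assumption: the compiler defines these macros when the ISA is enabled. */
const char *enabled_simd_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "unknown";
#endif
}

/* Prints the result so that builds with different options can be compared. */
void print_enabled_simd_isa(void)
{
    printf("widest SIMD ISA enabled at compile time: %s\n", enabled_simd_isa());
}
```

Calling print_enabled_simd_isa() from main and building the file once with each set of options shows whether the compiler considers AVX or AVX2 to be enabled for that build. This reflects only the compile-time target, not what the CPU running the program supports.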
I notice that AVX instructions are generated when I use the -fast option. Can you please clarify exactly which options are needed to make the compiler generate AVX and AVX2 instructions?
Thanks!