I wrote a simple program and built it with icpc to examine the performance of AVX on my machine. The code snippet is as follows:
#define T 2000000
#define X 16
#define Y 16
#define Z 16

for (int t = 0; t < T; t++)
    for (int k = 0; k < Z; k++)
        for (int j = 0; j < Y; j++)
            for (int i = 0; i < X; i++)
                A[k][j][i] = B[k][j][i] + C[k][j][i];
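For anyone who wants to reproduce this, a minimal self-contained sketch of the loop nest above might look like the following. The element type (float), the static array declarations, the initial values, the std::clock timing, and the helper names init_arrays, run_kernel and checksum are all assumptions made for illustration; none of them are stated in the snippet.

```cpp
#include <ctime>

// Sizes taken from the snippet; the iteration count is a parameter here
// so that shorter runs are possible.
#define X 16
#define Y 16
#define Z 16

// Assumption: single-precision, statically allocated arrays.
static float A[Z][Y][X], B[Z][Y][X], C[Z][Y][X];

// Hypothetical helper: fill the inputs with known values so the result can be checked.
void init_arrays() {
    for (int k = 0; k < Z; k++)
        for (int j = 0; j < Y; j++)
            for (int i = 0; i < X; i++) {
                A[k][j][i] = 0.0f;
                B[k][j][i] = 1.0f;
                C[k][j][i] = 2.0f;
            }
}

// Hypothetical helper: run the loop nest `iterations` times and
// return the CPU time spent, in seconds.
double run_kernel(int iterations) {
    std::clock_t start = std::clock();
    for (int t = 0; t < iterations; t++)
        for (int k = 0; k < Z; k++)
            for (int j = 0; j < Y; j++)
                for (int i = 0; i < X; i++)
                    A[k][j][i] = B[k][j][i] + C[k][j][i];
    std::clock_t stop = std::clock();
    return double(stop - start) / CLOCKS_PER_SEC;
}

// Hypothetical helper: sum A after the timed region so the stores are actually read.
double checksum() {
    double sum = 0.0;
    for (int k = 0; k < Z; k++)
        for (int j = 0; j < Y; j++)
            for (int i = 0; i < X; i++)
                sum += A[k][j][i];
    return sum;
}
```

Calling init_arrays(), then run_kernel(2000000), and printing both the returned time and checksum() gives one measurement per build, so the same source can be compiled with and without AVX and compared.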
The configuration is as follows:
icpc version 13.1.0 (gcc version 4.6.1 compatibility)
FFLAGS="-O3 -xhost "
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Red Hat Enterprise Linux Server release 6.3 (Santiago)
The experiment results are as follows:
iterations          2000000    2000000    200000
size                12*12*12   16*16*16   32*32*32
time (s), serial    1.09918    2.58384    2.99971
time (s), avx       1.71405    4.01935    5.18318
As the table shows, the AVX version always takes more time than the serial version.
Does anybody know why?
Thanks in advance!!!