Architecture: x86_64 (Haswell with 6 cores)
Compiler Version: icc 15.0
Performance degrades when the code snippet below is compiled with autovectorization (-O2):
#define N 200000

void foo()
{
    __declspec(align(64)) int a[N];
    int i, cnt = 0;

    for (cnt = 0; cnt < 1000000; cnt++) {
        for (i = 2; i < N; i++) {
            a[i] = a[i-2] + 1;
        }
    }
}
Compilation method 1 with vectorization: icc -O2 <filename> -opt-report5
Result: Time taken (3m 24 sec)
The report says the for loop above is vectorized, with an estimated potential speedup of about 1.2.
Compilation method 2 without vectorization: icc <filename> -opt-report5 -O2 -no-vec
Result: Time taken (1m 08 sec)
Why does autovectorization degrade performance even though the estimated potential speedup is 1.2?
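
For anyone who wants to reproduce the timings, here is a minimal sketch of a self-contained variant of the snippet. It is not the exact code measured above. The helper names run_kernel and time_kernel, the reps parameter, the explicit initialization of a[0] and a[1], and the use of C11 _Alignas in place of __declspec(align(64)) are all assumptions added for illustration. Returning a[N-1] is only there so the compiler cannot discard the loop as dead code.

```c
#include <time.h>

#define N 200000

/* Hypothetical helper (not in the original post): runs the inner loop
   `reps` times and returns the last element so the work stays observable. */
static int run_kernel(int reps)
{
    /* C11 _Alignas is assumed here as a portable stand-in for
       __declspec(align(64)); static storage keeps 800 KB off the stack. */
    static _Alignas(64) int a[N];
    int i, cnt;

    a[0] = 0;   /* the original snippet leaves these two uninitialized */
    a[1] = 0;

    for (cnt = 0; cnt < reps; cnt++) {
        for (i = 2; i < N; i++) {
            /* each element depends on the one two positions earlier */
            a[i] = a[i - 2] + 1;
        }
    }
    return a[N - 1];
}

/* Hypothetical helper: CPU time in seconds for `reps` repetitions,
   measured with clock(); the kernel's result is stored in *last_out. */
static double time_kernel(int reps, int *last_out)
{
    clock_t start = clock();
    *last_out = run_kernel(reps);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel(1000000, &last) from main and printing both the returned time and last should correspond to the repetition count used above. A smaller reps value gives a quicker comparison between the -O2 and -O2 -no-vec builds.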