Architecture: x86_64 (Ivy Bridge, 8 cores)
Compiler Version: icc 15.0
How does the compiler calculate the estimated potential speedup for a loop in the vectorization report? How can I find the cache sizes? And how do aligned and unaligned accesses affect the estimated potential speedup? Could you please explain with reference to the report below?
for (index = 0; index < SIZE; index++) {
    array_A[index] = array_B[index] + array_C[index];
}
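For anyone who wants to reproduce this, here is a minimal self-contained sketch of the loop above. The element type (float), the value of SIZE, the 32-byte alignment, and the helper names init_arrays and add_arrays are all assumptions for illustration; they are not taken from my actual vector1.c.

```c
#include <stddef.h>

#define SIZE 1024  /* assumed value; the real SIZE is not shown in the post */

/* Assumed: float elements, aligned to 32 bytes (the AVX register width).
   __attribute__((aligned)) is a GCC-style extension that icc on Linux
   also accepts. */
static float array_A[SIZE] __attribute__((aligned(32)));
static float array_B[SIZE] __attribute__((aligned(32)));
static float array_C[SIZE] __attribute__((aligned(32)));

/* Hypothetical helper: fill the inputs with known values
   so the result of the loop can be checked afterwards. */
static void init_arrays(void)
{
    size_t i;
    for (i = 0; i < SIZE; i++) {
        array_B[i] = (float)i;
        array_C[i] = 2.0f * (float)i;
    }
}

/* Hypothetical wrapper around the loop from the post:
   unit-stride, two loads and one store per iteration. */
static void add_arrays(void)
{
    int index;
    for (index = 0; index < SIZE; index++) {
        array_A[index] = array_B[index] + array_C[index];
    }
}
```

Calling init_arrays() and then add_arrays() from main should leave array_A[i] equal to 3*i. Whether the compiler reports aligned access for these arrays will presumably depend on how they are declared, which is part of what I am asking about.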
LOOP BEGIN at vector1.c(14,2)
   remark #15388: vectorization support: reference array_A has aligned access   [ vector1.c(16,3) ]
   remark #15388: vectorization support: reference array_B has aligned access   [ vector1.c(16,3) ]
   remark #15388: vectorization support: reference array_C has aligned access   [ vector1.c(16,3) ]
   remark #15399: vectorization support: unroll factor set to 4
   remark #15300: LOOP WAS VECTORIZED
   remark #15448: unmasked aligned unit stride loads: 2
   remark #15449: unmasked aligned unit stride stores: 1
   remark #15475: --- begin vector loop cost summary ---
   remark #15476: scalar loop cost: 6
   remark #15477: vector loop cost: 5.000
   remark #15478: estimated potential speedup: 4.800
   remark #15479: lightweight vector operations: 5
   remark #15488: --- end vector loop cost summary ---
LOOP END
Thanks in advance