I have a source file that consists almost entirely (99%) of SSE2/SSE3/SSE4.1 intrinsics. I compared its execution time when built with Intel Compiler V15 and with MSVC12 (Visual Studio 2013): the Intel build takes ~190 ms, while the MSVC build takes only 170 ms, so the Intel build is roughly 10% slower. I assumed the Intel compiler would give better, or at least equal, results when running on an Intel CPU (here an i7-4770).
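A difference of about 20 ms is easiest to trust when both builds are timed the same way. The following is only a minimal sketch of such a harness, not the code from the original file: `sum_kernel` is a hypothetical stand-in for the real intrinsics code, `best_time_ms` is an illustrative helper name, and the choice of taking the fastest of several runs is an assumption made to reduce noise from other processes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical kernel standing in for the real intrinsics code:
// sums 32-bit integers, four lanes at a time when SSE2 is available.
std::int32_t sum_kernel(const std::int32_t* data, std::size_t n) {
    std::size_t i = 0;
    std::int32_t total = 0;
#ifdef HAVE_SSE2
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        // Unaligned load, so the input needs no special alignment.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        acc = _mm_add_epi32(acc, v);
    }
    // Horizontal sum of the four accumulator lanes.
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail (or the whole loop when SSE2 is not available).
    for (; i < n; ++i) total += data[i];
    return total;
}

// Runs the kernel `runs` times and returns the fastest wall-clock time
// in milliseconds. The kernel result is written to *result so that the
// compiler cannot discard the call.
double best_time_ms(const std::vector<std::int32_t>& data, int runs,
                    std::int32_t* result) {
    assert(runs > 0);
    double best = 1e300;
    for (int r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        *result = sum_kernel(data.data(), data.size());
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}
```

Calling `best_time_ms(data, 20, &out)` in a binary from each compiler, built with equivalent optimization settings and fed the same input, gives two numbers that can be compared directly.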
If needed, I can attach the IACA outputs. What I see is that the Intel compiler optimizes the code for latency: its critical paths show only half the delays of the MSVC build. For throughput, IACA reports 359 cycles for Intel and 354 cycles for MSVC. In any case, the measured overall performance does not, IMO, reflect the IACA outputs.
So I wonder: what kind of source code is needed to justify the statement that the Intel compiler is best for Intel CPUs?