After a bad experience a while ago when trying to upgrade to icc 13, I've now been trying version XE 2015. But again, I'm seeing a loss in performance. This time I've been looking into it with VTune, and what I see doesn't really make sense.
First, IPP. The new IPP (8.2, coming from 6.1) seems to be almost twice as fast, which gives a considerable boost in performance. Great!
Then my code. Most of the functions in it have roughly the same performance with Intel compiler 10.1 and XE 2015. There are a few where I see a big improvement (up to a factor of 4), but also a few where I see a big degradation, and unfortunately that happens in what was already one of the heaviest functions in my code.
Now I have been trying to optimize this really heavy function, and I managed to get rid of a number of memory accesses and reduce the code by a few instructions. With these optimizations, the new code is several instructions shorter than what the 10.1 compiler generated, and the number of memory accesses is down from 11 to 7. Still, the function runs about 2.5 times slower than the 10.1 build.
There's something else. I have 2 nearly identical versions of this function in my code. The second version does the same thing as the one that I've profiled, except for a single step. That difference is also visible in the assembly code, where a single memory write is gone; apart from that, the two are identical. Now, that second function (again, identical except that one instruction is removed) is also slow with the old compiler, which I can't explain.
Is there anything I could be missing in the conversion from 10.1 to XE 2015? Something like "Flush denormals to 0" (which is enabled)? Looking at the assembly code, it should really be faster, but for some reason it's not.
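One thing I can do to rule out the denormal setting is check at runtime that it is really active in the thread running the slow function, instead of trusting the build option. Below is a minimal sketch, assuming an x86-64 build using SSE math. The helper names (query_denormal_modes, enable_denormal_flushing, product_is_flushed) are made up for illustration; only the _MM_* macros come from the standard intrinsics headers.

```cpp
#include <xmmintrin.h>  // _MM_GET_FLUSH_ZERO_MODE, _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>  // _MM_GET_DENORMALS_ZERO_MODE, _MM_SET_DENORMALS_ZERO_MODE
#include <cfloat>       // FLT_MIN

// Hypothetical helper type: the two denormal-related MXCSR bits.
struct DenormalModes {
    bool ftz;  // flush-to-zero: denormal results are replaced by 0
    bool daz;  // denormals-are-zero: denormal inputs are read as 0
};

// Read the current MXCSR state of the calling thread.
inline DenormalModes query_denormal_modes() {
    DenormalModes m;
    m.ftz = (_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON);
    m.daz = (_MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON);
    return m;
}

// Force both modes on for the calling thread.
inline void enable_denormal_flushing() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

// Behavioural check: FLT_MIN * 0.5 is a denormal in IEEE arithmetic.
// With flushing active the product compares equal to zero.
// 'volatile' keeps the compiler from folding the multiply at compile time.
inline bool product_is_flushed() {
    volatile float tiny = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result = tiny * half;
    return result == 0.0f;
}
```

Calling query_denormal_modes() at the top of the slow function, in the thread that actually runs it, shows whether the setting reached that thread. MXCSR is per-thread state, so the main thread having the modes enabled does not by itself say anything about worker threads.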