Greetings,
With ICC 14 (not 15), my third-party library, which contains inline assembly hand-optimized for AVX2 (I'm on Haswell), fails to compile: the compiler reports that jump labels are not supported. I know the C/C++ counterpart selected by the "#ifdef" will not be as fast as the inline assembly. Those inline-assembly blocks are carefully tuned for Haswell (AVX2), so I don't believe ICC can optimize such small mathematical loops better than the hand-written assembly does.
Please take this as a feature request, and:
1. don't forget that some of the inline assembly, depending on its prologue, is written in AT&T syntax;
2. don't forget that some of the inline assembly, depending on its prologue, is written in Intel syntax;
3. /Qipo IL generation could easily support a flag meaning "do not touch or optimize this at all, it is inline assembly!" (similar to how GCC lets you mark "asm" blocks as volatile).
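For illustration, here is a minimal sketch of the kind of block involved: a small loop written in GCC-style extended inline assembly (AT&T syntax) that jumps back to a local numeric label and is marked volatile. The function name sum_to is hypothetical and not from my library; I am not claiming which ICC versions accept this form. Non-x86-64 targets fall back to plain C, mirroring the "#ifdef" approach.

```c
/* Hypothetical example: sum 1..n using an inline-asm loop with a jump label.
   GCC-style extended asm, AT&T syntax, x86-64 only; plain C elsewhere. */
unsigned long sum_to(unsigned long n)
{
    unsigned long acc = 0;
    if (n == 0)
        return 0;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "1:\n\t"            /* local numeric label marking the loop head */
        "addq %1, %0\n\t"   /* acc += n */
        "decq %1\n\t"       /* --n; sets ZF when n reaches 0 */
        "jnz 1b\n\t"        /* jump back to label 1 while n != 0 */
        : "+r"(acc), "+r"(n)  /* both operands are read and written */
        :
        : "cc");              /* the loop clobbers the condition flags */
#else
    /* Portable fallback, equivalent to the assembly loop above. */
    while (n != 0) {
        acc += n;
        --n;
    }
#endif
    return acc;
}
```

For example, sum_to(10) returns 55. The numeric label "1:" with the "1b" (backward) reference is used so the label stays local even if the block is emitted more than once.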
TIA!