I raised this issue in another forum (https://software.intel.com/en-us/forums/software-tuning-performance-opti...), but I would also like to ask the compiler folks here: why does the compiler generate VGATHER-based code whenever I use an unsigned variable as an offset in an array index?
For a simple loop such as
for (int i=0; i<N; i++) { target[i] = scalar * source[i+offset]; }
the compiler reports the access as "indirect" if the "offset" variable is unsigned, but generates reasonable vector code if the "offset" variable is signed. The VGATHER-based code is typically about 1.5x slower than the corresponding scalar code and 4x or more slower than the straightforward vector code (assuming both arrays are in the L1 data cache).
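For anyone who wants to reproduce this, here is a minimal sketch of the two variants side by side. The function names (scale_signed, scale_unsigned), the float element type, and the argument order are assumptions made for illustration; only the loop body comes from the case described above.

```c
/* Variant 1: signed offset.
 * The index i + offset is computed in plain int arithmetic. */
void scale_signed(float *target, const float *source,
                  float scalar, int n, int offset)
{
    for (int i = 0; i < n; i++) {
        target[i] = scalar * source[i + offset];
    }
}

/* Variant 2: unsigned offset.
 * By C's usual arithmetic conversions, i is converted to unsigned int,
 * so i + offset is computed in unsigned (modulo 2^32) arithmetic. */
void scale_unsigned(float *target, const float *source,
                    float scalar, int n, unsigned int offset)
{
    for (int i = 0; i < n; i++) {
        target[i] = scalar * source[i + offset];
    }
}
```

Both functions compute the same results for small non-negative offsets, so any difference in the generated code should come only from the type of "offset". Compiling this file with vectorization enabled and the compiler's vectorization report turned on should show which of the two loops is reported as "indirect".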
Is there something about the interpretation of unsigned variables that makes the gather function necessary, or is this an idiosyncrasy of the compilers? (I have seen it with both icc 15.0.3 and with a 2016 version.)