Channel: Intel® C++ Compiler

“FUNCTION WAS VECTORIZED”, but the loop at the function call site is not vectorized


This question is related to an existing question on Stack Overflow, with two differences: here AVX is the target ISA, and the function to be vectorized is more complex. When I use the __attribute__((vector(...))) declaration in the function definition:

__attribute__((vector(linear(a),linear(b))))
inline void foo(float* restrict a, float* restrict b) { 
   ...
   for(j=0; j<n; j++) {   
      // do something with a[j*STRIDE] and b[j*STRIDE]
   }
   for(j=n-1; j>=0; j--) {
      // do something with a[j*STRIDE] and b[j*STRIDE]
   }
}

the compiler reports the following for the function foo():

 foo.hpp(56): (col. 101) remark: FUNCTION WAS VECTORIZED
 foo.hpp(56): (col. 101) remark: FUNCTION WAS VECTORIZED

When I then call the function, using either array notation or a single for loop:

int main() {
  ...
  #pragma omp parallel for
  for(k=0; k<n; k++) {
    int base = k*256*256;

    FP* __restrict a = &h_a[base];
    FP* __restrict b = &h_b[base];

    __assume_aligned(a,32);
    __assume_aligned(b,32);

    foo(&a[0:256], &b[0:256]); // line 337
    // OR: for(i=0; i<n; i++) { foo(&a[i], &b[i]); }
  }
}

it refuses to vectorize:

 main.c(337): (col. 3) remark: loop was not vectorized: existence of vector dependence
 main.c(337): (col. 3) remark: loop was not vectorized: existence of vector dependence
 main.c(337): (col. 3) remark: loop was not vectorized: not inner loop

The used Intel compiler flags are:

 icc -O3 -xAVX -ip -restrict -parallel -fopenmp -vec-report2 -openmp-report2

The question: if the compiler could vectorize the function foo(), why can it not use the vectorized version at the call site (main.c:337)? The "remark" message suggests that the compiler analysed the function again instead of simply inserting the already compiled vector code.

Note: I also tried a for loop instead of array notation, with #pragma ivdep and with #pragma simd, but neither helped. The actual code is much larger than would conveniently fit in this post.
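
For reference, here is a minimal, self-contained sketch of the loop-based call pattern described in the note. It is not the actual code: it swaps the Intel-specific __attribute__((vector(...))) for the standard OpenMP 4.0 "declare simd" directive, and the names foo_sketch and call_site, the constants STRIDE, N and LANES, and the loop bodies are all hypothetical placeholders. It only shows the shape of the declaration and the call site; it makes no claim about whether a particular compiler version will use the vector variant there.

```c
#include <assert.h>
#include <stddef.h>

#define STRIDE 4  /* hypothetical: the real STRIDE is not given in the post */
#define N      8  /* hypothetical trip count of the inner loops */
#define LANES  4  /* hypothetical number of columns; must not exceed STRIDE */

/* Hypothetical elemental function: sweeps one strided column forward,
 * then backward. linear(a:1) and linear(b:1) state that consecutive
 * SIMD lanes receive pointers that advance by one element. */
#pragma omp declare simd linear(a:1) linear(b:1) notinbranch
static void foo_sketch(float *restrict a, float *restrict b)
{
    for (int j = 0; j < N; j++)
        b[j * STRIDE] += a[j * STRIDE];          /* placeholder forward sweep */
    for (int j = N - 1; j >= 0; j--)
        a[j * STRIDE] = 0.5f * b[j * STRIDE];    /* placeholder backward sweep */
}

/* Call site: each iteration handles one column, so iterations are
 * independent as long as LANES <= STRIDE and a and b do not overlap. */
static void call_site(float *a, float *b)
{
    assert(a != NULL && b != NULL && a != b);
    #pragma omp simd
    for (int i = 0; i < LANES; i++)
        foo_sketch(&a[i], &b[i]);
}
```

Here call_site expects two separate arrays of at least N*STRIDE floats each. A compiler that does not have OpenMP SIMD support enabled ignores the two pragmas and the code runs as ordinary scalar C, which makes it easy to compare vectorization reports with and without the directives.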

