Channel: Intel® C++ Compiler

"#pragma ivdep" not removing assumed vector dependence


Hi,

We have a question on the behavior of "#pragma ivdep", multi-versioning and assumed vector dependence. We have a workload (LU decomposition) that contains an assumed vector dependence. I did not want to post the whole code here, so I created a short reproducer that has the same behavior.

  const int n = 128;
  float* data = (float*) malloc(sizeof(float)*n*n);
  data[0:n*n] = 1.0f;

  for(int i = 0 ; i < n; i++) {
    for(int j = 0 ; j < n; j++) {
      //#pragma ivdep
      //#pragma vector always
      //#pragma simd
      for(int k = 0 ; k < n; k++) {
        data[i*n+k] += data[j*n+k];
      }
    }
  }

There is an assumed vector dependence here because 'n' could be smaller than the vector length; the optimization report recognizes this and implements multi-versioning. However, neither of the two versions it creates is vectorized. The following is the relevant snippet from the optimization report generated with -qopt-report=5:

      LOOP BEGIN at reproducer.cc(15,7)<Multiversioned v1>
         remark #25228: Loop multiversioned for Data Dependence
         remark #15344: loop was not vectorized: vector dependence prevents vectorization
         remark #15346: vector dependence: assumed FLOW dependence between data line 16 and data line 16
         remark #15346: vector dependence: assumed ANTI dependence between data line 16 and data line 16
         remark #25438: unrolled without remainder by 2
      LOOP END

      LOOP BEGIN at reproducer.cc(15,7)<Multiversioned v2>
         remark #15304: loop was not vectorized: non-vectorizable loop instance from multiversioning
         remark #25438: unrolled without remainder by 2
      LOOP END

Multi-version v1 reports that this is an "assumed" dependence, which is what we had expected. Adding "#pragma ivdep" does remove the multi-versioning, but the single remaining loop is still not vectorized, and the report still lists the same assumed dependences:

      LOOP BEGIN at reproducer.cc(15,7)
         remark #15344: loop was not vectorized: vector dependence prevents vectorization
         remark #15346: vector dependence: assumed FLOW dependence between data line 16 and data line 16
         remark #15346: vector dependence: assumed ANTI dependence between data line 16 and data line 16
         remark #25438: unrolled without remainder by 2
      LOOP END
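
To make clear what we mean by multi-versioning for data dependence, here is a rough hand-written analogue. It is only a sketch of the idea, not the code the compiler actually emits, and the names `rows_overlap` and `add_row` are hypothetical helpers that do not appear in our workload. The check selects a version at run time: if the two rows are the same row or are fully disjoint, iteration k only touches element k of each row, so there is no loop-carried dependence and that version should be safe to vectorize; only a partial overlap needs the strictly ordered scalar version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when [a, a+n) and [b, b+n) share at least one element.
inline bool rows_overlap(const float* a, const float* b, std::size_t n) {
    const std::uintptr_t pa = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t pb = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t bytes = n * sizeof(float);
    return pa < pb + bytes && pb < pa + bytes;
}

// Hypothetical analogue of a multi-versioned loop.
// Returns true when the dependence-free version was selected.
inline bool add_row(float* dst, const float* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    if (dst == src || !rows_overlap(dst, src, n)) {
        // Version 1: rows are identical or disjoint, so iteration k reads and
        // writes only element k. No loop-carried dependence exists here.
        for (std::size_t k = 0; k < n; ++k) {
            dst[k] += src[k];
        }
        return true;
    }
    // Version 2: rows partially overlap, so iterations must run in order.
    for (std::size_t k = 0; k < n; ++k) {
        dst[k] += src[k];
    }
    return false;
}
```

In the reproducer, rows i and j are either the same row (i == j) or do not overlap at all, so under this sketch the dependence-free version would be selected on every call.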

Finally, we were able to vectorize this workload by forcing vectorization with "#pragma simd" (and we indeed got the correct result, along with significant speedup). For some reason "#pragma vector always" refused to vectorize this loop.
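
For reference, the forced-vectorization variant looks roughly like the sketch below. Two things here are assumptions rather than what our workload contains: the wrapper `accumulate_rows` is a hypothetical name, and the sketch uses the OpenMP 4.0 spelling `#pragma omp simd` instead of the Intel-specific `#pragma simd`, which we understand to serve the same purpose (with the Intel compiler it should need -qopenmp or -qopenmp-simd to take effect; a compiler without OpenMP enabled simply ignores it). It also replaces the array-notation initialization with a std::vector so that it builds with other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the reproducer's loop nest.
// data holds an n x n matrix in row-major order.
inline void accumulate_rows(std::vector<float>& data, int n) {
    assert(n >= 0);
    assert(data.size() == static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float* dst = data.data() + static_cast<std::size_t>(i) * n;
            const float* src = data.data() + static_cast<std::size_t>(j) * n;
            // Asserts to the compiler that the iterations are independent.
            // That holds here because dst and src are either the same row or
            // two disjoint rows, so iteration k touches only element k.
            #pragma omp simd
            for (int k = 0; k < n; k++) {
                dst[k] += src[k];
            }
        }
    }
}
```

A small case is easy to check by hand: with n = 2 and every element initialized to 1.0f, row 0 ends up as 3.0f (1 + 1 from itself, + 1 from row 1) and row 1 ends up as 8.0f (1 + 3 from row 0, then doubled by adding itself).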

We are currently using the Intel C++ compiler v16.0.1 for Linux, and the problem also occurs with v16.0.0. With earlier compilers, multi-versioning of the same code produced one vectorized and one non-vectorized version, and "#pragma ivdep" removed the assumed vector dependence.

Is this a change in behavior with the 16 compiler? If so, is the proper remedy to replace "#pragma ivdep" with "#pragma simd", or is there a different pragma for ignoring this type of assumed vector dependence?

Thanks in advance!

Ryo

