Channel: Intel® C++ Compiler
Viewing all articles
Browse latest Browse all 1616

OpenMP and Vectorization Problem


Hi,

I am using a simple ikj (i-k-j) triple loop to compute a matrix multiplication, compiled with the Intel compiler icpc (ICC) 14.0.2 20140120.
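In the i-k-j order the innermost loop walks both C and B with unit stride while A[i*N+k] stays fixed, which is what makes that loop a vectorization candidate; the textbook i-j-k order instead strides through B by N. A minimal sketch contrasting the two, assuming row-major n x n float matrices (n is passed as a parameter rather than the global N, and the function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook i-j-k order: the inner loop reads B[k*n + j] with stride n,
// i.e. it walks down a column of the row-major matrix B.
void matmul_ijk(const float* A, const float* B, float* C, int n) {
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++) {
      float sum = 0.0f;
      for (int k = 0; k < n; k++)
        sum += A[i*n + k] * B[k*n + j];
      C[i*n + j] += sum;
    }
}

// i-k-j order, as in the post: A[i*n + k] is invariant in the inner loop,
// and C[i*n + j] and B[k*n + j] are both accessed with unit stride.
void matmul_ikj(const float* A, const float* B, float* C, int n) {
  for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++) {
      const float temp = A[i*n + k];
      for (int j = 0; j < n; j++)
        C[i*n + j] += temp * B[k*n + j];
    }
}

// Runs both orders on small-integer inputs (so every float result is exact
// regardless of summation order) and reports whether the results are identical.
bool orders_agree(int n) {
  assert(n > 0);
  const std::size_t sz = static_cast<std::size_t>(n) * n;
  std::vector<float> A(sz), B(sz), C1(sz, 0.0f), C2(sz, 0.0f);
  for (std::size_t i = 0; i < sz; i++) {
    A[i] = static_cast<float>(i % 7);
    B[i] = static_cast<float>(i % 5);
  }
  matmul_ijk(A.data(), B.data(), C1.data(), n);
  matmul_ikj(A.data(), B.data(), C2.data(), n);
  return C1 == C2;
}
```

Both orders compute the same product; only the memory access pattern of the innermost loop differs.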

In both of the following cases the number of threads is 1 (no parallel for is used yet).
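The fact that no parallel for is used matters: #pragma omp parallel on its own does not divide the loop among threads, it makes every thread in the team execute the whole block. With one thread that is just the serial loop nest; with more threads, each would redo all the work and the updates to C would race. A minimal sketch, assuming the team size is fixed with the standard num_threads(1) clause (setting OMP_NUM_THREADS=1 in the environment has the same effect here) and taking the matrix size as a parameter n; the function name is hypothetical:

```cpp
#include <cassert>

// Parallel region with the team size pinned to one thread via the standard
// num_threads clause. There is no worksharing construct, so every thread in
// the team would execute the whole loop nest; that is only correct because
// the team has exactly one thread. Without OpenMP enabled, the pragma is
// ignored and the loop simply runs serially.
void matmul_region_one_thread(const float* A, const float* B, float* C, int n) {
  assert(n >= 0);
  #pragma omp parallel num_threads(1)
  {
    for (int i = 0; i < n; i++)
      for (int k = 0; k < n; k++) {
        const float temp = A[i*n + k];
        for (int j = 0; j < n; j++)
          C[i*n + j] += temp * B[k*n + j];
      }
  }
}
```

Adding a worksharing for inside the region (or using #pragma omp parallel for) is what would split the i iterations across threads.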

1- If I use #pragma omp parallel, the compiled code appears to be vectorized; that is what -vec-report6 reports. However, the running time is the same as in the non-vectorized case:

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference B has aligned access

MATMUL.cc(71): (col. 4) remark: vectorization support: unroll factor set to 4

MATMUL.cc(71): (col. 4) remark: LOOP WAS VECTORIZED

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference B has unaligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: unaligned access used inside loop body

MATMUL.cc(71): (col. 4) remark: REMAINDER LOOP WAS VECTORIZED

 

2- On the other hand, if I simply remove the #pragma omp parallel, -vec-report6 prints the following:

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference B has aligned access

MATMUL.cc(71): (col. 4) remark: vectorization support: unroll factor set to 4

MATMUL.cc(71): (col. 4) remark: LOOP WAS VECTORIZED

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference B has unaligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: unaligned access used inside loop body

MATMUL.cc(71): (col. 4) remark: REMAINDER LOOP WAS VECTORIZED

MATMUL.cc(71): (col. 4) remark: loop skipped: multiversioned

The only difference from case 1 is the final remark, "loop skipped: multiversioned". I am not sure exactly what it means, but the running time is roughly 6x better, which suggests that the vectorized code is actually executed here. Adding #pragma omp simd does not change the results.
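As a hedged reading of "multiversioned": a compiler can emit several versions of the same loop and choose between them at run time, commonly based on a check such as whether the arrays overlap, so that the fast vectorized version runs only when it is safe. The sketch below hand-writes that idea for the inner j loop; it illustrates the general technique and is not the code ICC actually generates. All function names are hypothetical, and __restrict is a compiler extension (GCC, Clang, ICC, MSVC), not standard C++:

```cpp
#include <cassert>
#include <cstdint>

// Scalar fallback: always correct, even if c and b overlap.
static void axpy_row_scalar(float* c, const float* b, float temp, int n) {
  for (int j = 0; j < n; j++)
    c[j] += temp * b[j];
}

// "Fast" version: __restrict promises the compiler that c and b do not
// overlap, so it is free to vectorize the loop.
static void axpy_row_noalias(float* __restrict c, const float* __restrict b,
                             float temp, int n) {
  for (int j = 0; j < n; j++)
    c[j] += temp * b[j];
}

// Hand-written multiversioning: test at run time whether the two n-element
// ranges overlap, then dispatch to the matching version of the same loop.
// Comparing addresses through std::uintptr_t is implementation-defined but
// works on mainstream flat-memory platforms.
// Returns true if the no-alias (vectorizable) version was taken.
bool axpy_row_multiversioned(float* c, const float* b, float temp, int n) {
  assert(n >= 0);
  const std::uintptr_t c0 = reinterpret_cast<std::uintptr_t>(c);
  const std::uintptr_t b0 = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t bytes = static_cast<std::uintptr_t>(n) * sizeof(float);
  const bool disjoint = (c0 + bytes <= b0) || (b0 + bytes <= c0);
  if (disjoint)
    axpy_row_noalias(c, b, temp, n);
  else
    axpy_row_scalar(c, b, temp, n);
  return disjoint;
}
```

When c starts one element after b, each iteration reads the value written by the previous one, a loop-carried dependence that a naively vectorized loop would get wrong; that is why the scalar fallback is kept.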

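// N (the matrix dimension) is assumed to be defined elsewhere in MATMUL.cc (not shown).
void MatMul_Par(float* A, float* B, float* C) {
  //#pragma omp parallel shared(A,B,C)   // case 1: enabled; case 2: commented out, as here
  {
    for (int i = 0; i < N; i++) {
      for (int k = 0; k < N; k++) {
        float temp = A[i*N + k];          // A(i,k) is invariant in the j loop
        //#pragma omp simd
        for (int j = 0; j < N; j++) {     // unit-stride access to C and B
          C[i*N + j] += temp * B[k*N + j];
        }
      }
    }
  } // parallel
}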
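One experiment that is often suggested for this kind of symptom, offered here as an untested assumption rather than a confirmed fix for ICC 14.0.2, is to copy the shared pointers (and the trip count) into __restrict-qualified locals inside the parallel region, so the compiler can treat C as not aliasing A or B within the region body, and to let a worksharing for divide the rows. A sketch under those assumptions; the name MatMul_Par_restrict and the parameter n (replacing the global N) are hypothetical:

```cpp
#include <cassert>

// Variant of MatMul_Par: inside the parallel region, copy the shared pointers
// into local __restrict-qualified pointers (a compiler extension, not standard
// C++), promising that the three matrices do not overlap, and split the i loop
// across threads with a worksharing for.
void MatMul_Par_restrict(const float* A, const float* B, float* C, int n) {
  assert(n >= 0);
  #pragma omp parallel
  {
    const float* __restrict a = A;  // per-thread copies of the shared pointers
    const float* __restrict b = B;
    float* __restrict c = C;
    const int nn = n;               // per-thread copy of the trip count

    #pragma omp for
    for (int i = 0; i < nn; i++)
      for (int k = 0; k < nn; k++) {
        const float temp = a[i*nn + k];
        #pragma omp simd
        for (int j = 0; j < nn; j++)
          c[i*nn + j] += temp * b[k*nn + j];
      }
  }
}
```

Each row i of C is written by exactly one thread, so the threads never write to the same elements. Comparing the -vec-report6 output and timings of this variant with the original could help narrow down whether the parallel region itself is what blocks the fast path.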

PS: The problem does not occur with Intel Cilk Plus and similar alternatives, so it seems to be related to the OpenMP parallel pragma.

