OpenMP and Vectorization Problem

Hi,

I am using a simple ikj triple loop to compute a matrix multiplication, compiled with the Intel compiler icpc (ICC) 14.0.2 20140120.

Assume that in both of the following cases the number of threads is 1 (no parallel for is used yet).

1- If I use a #pragma omp parallel, the compiled code seems to be vectorized; that is what -vec-report6 tells me. However, the running time is the same as in the non-vectorized case:

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference B has aligned access

MATMUL.cc(71): (col. 4) remark: vectorization support: unroll factor set to 4

MATMUL.cc(71): (col. 4) remark: LOOP WAS VECTORIZED

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference C has aligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: reference B has unaligned access

MATMUL.cc(73): (col. 12) remark: vectorization support: unaligned access used inside loop body

MATMUL.cc(71): (col. 4) remark: REMAINDER LOOP WAS VECTORIZED

2- On the other hand, if I simply remove the #pragma omp parallel, -vec-report6 reports "loop skipped: multiversioned". I am not sure exactly what that means, but the running time is roughly 6x better, which suggests the loop is properly vectorized in this case. Adding #pragma omp simd does not change the results.

void MatMul_Par(float* A, float* B, float* C) {
    //#pragma omp parallel shared(A,B,C)
    {
        for (int i = 0; i < N; i++) {
            for (int k = 0; k < N; k++) {
                float temp = A[i*N+k];
                //#pragma omp simd
                for (int j = 0; j < N; j++) {
                    C[i*N+j] += temp * B[k*N+j];
                }
            }
        }
    } //parallel
}
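
A minimal self-contained sketch of the two cases, for anyone who wants to reproduce the comparison. The value N = 64, the function names, the test data, and the num_threads(1) clause are assumptions made for illustration and are not taken from the original code. The sketch only checks that both variants compute the same result; it does not measure or explain the timing difference.

```cpp
#include <vector>

// Hypothetical matrix dimension; the original post does not state N.
constexpr int N = 64;

// Case 2: plain ikj kernel with no parallel region.
void matmul_serial(const float* A, const float* B, float* C) {
    for (int i = 0; i < N; i++) {
        for (int k = 0; k < N; k++) {
            const float temp = A[i * N + k];
            for (int j = 0; j < N; j++) {
                C[i * N + j] += temp * B[k * N + j];
            }
        }
    }
}

// Case 1: the same kernel wrapped in a parallel region.
// num_threads(1) is an assumption that mirrors "the number of threads is 1".
// There is no work-sharing construct here, so with more than one thread
// every thread would run the whole loop nest and race on C.
void matmul_parallel_region(const float* A, const float* B, float* C) {
    #pragma omp parallel num_threads(1) shared(A, B, C)
    {
        for (int i = 0; i < N; i++) {
            for (int k = 0; k < N; k++) {
                const float temp = A[i * N + k];
                for (int j = 0; j < N; j++) {
                    C[i * N + j] += temp * B[k * N + j];
                }
            }
        }
    }
}

// Fills A and B with small integer values (exactly representable as float)
// and checks that both variants produce identical output.
bool results_match() {
    std::vector<float> A(N * N), B(N * N);
    std::vector<float> C1(N * N, 0.0f), C2(N * N, 0.0f);
    for (int idx = 0; idx < N * N; idx++) {
        A[idx] = static_cast<float>(idx % 7);
        B[idx] = static_cast<float>(idx % 5);
    }
    matmul_serial(A.data(), B.data(), C1.data());
    matmul_parallel_region(A.data(), B.data(), C2.data());
    for (int idx = 0; idx < N * N; idx++) {
        if (C1[idx] != C2[idx]) return false;
    }
    return true;
}
```

Timing each function separately (for example with omp_get_wtime) and comparing the vectorization reports for the two inner loops should show whether the slowdown is tied to the parallel region itself.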

PS: The problem does not exist when using Intel Cilk Plus, etc. It seems to be related to the parallel pragma in OpenMP.
