Hey everyone,
I'm looking for help with loop vectorization. I'm trying to vectorize and optimize a loop, but I don't understand what I'm doing wrong. The compiler cannot vectorize it because of assumed FLOW and ANTI dependences. I thought I could remove them by changing the code (see "My Attempt" below), but this does not work. Can someone explain why?
I cannot post the whole code because it is too large. Note that "i, j" are indices and "np, npa, cX, K, t, exprdt" are constants. Basically, I do some computation on an input vector s, and the output is the vector P (which is also an input, since its values are updated in place).
The original code:
#pragma omp parallel for num_threads(nThreads) schedule(auto)
for( int k(i*npa); k<(i+1)*npa; ++k )
{
    double CT = 0., tmp;
    double s_ = s[j*np+k];
    double st_ = s_ * t;
    CT += c1 + s_ * ( c2 + s_ * c3 );
    if(p>3) CT += t * c4 + st_ * ( c5 + s_ * c6 );
    CT *= exprdt;
    tmp = K > s_ ? K-s_ : 0.;
    if( tmp <= 1.0e-8 || tmp <= CT )
        P[j*np+k] = exprdt * P[(j+1)*np+k];
    else
        P[j*np+k] = tmp;
}
My Attempt:
#pragma omp parallel num_threads(nThreads)
{
    int iD = omp_get_thread_num();
    int gd = npa / nThreads;
    int dg = (iD==nThreads-1 ? npa%nThreads : 0);
    double *Aptr, *Bptr, *Cptr;
    Aptr = (double*)malloc((gd+dg)*sizeof(double));
    Bptr = (double*)malloc((gd+dg)*sizeof(double));
    Cptr = (double*)malloc((gd+dg)*sizeof(double));
    memcpy( Cptr, s + j*np + i*npa + iD*gd, (gd+dg)*sizeof(double));
    memcpy( Bptr, P + j*np + np + i*npa + iD*gd, (gd+dg)*sizeof(double));
    for( int l = 0; l<gd+dg; ++l ) // 671
    {
        double CT = 0., tmp;
        double s_ = Cptr[l]; // 675
        double st_ = s_ * t;
        CT += c1 + s_ * ( c2 + s_ * c3 );
        if(p>3) CT += t * c4 + st_ * ( c5 + s_ * c6 );
        CT *= exprdt;
        tmp = K > s_ ? K-s_ : 0.;
        if( tmp <= 1.0e-8 || tmp <= CT )
            Aptr[l] = exprdt * Bptr[l]; // 690
        else
            Aptr[l] = tmp; // 692
    }
    memcpy( P + j*np + i*npa + iD*gd, Aptr, (dg+gd)*sizeof(double));
    free(Aptr);
    free(Bptr);
    free(Cptr);
}
And here is the compiler's vectorization report:
(671): (col. 5) remark: loop was not vectorized: existence of vector dependence
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between line 690 and line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between line 675 and line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between line 690 and line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between line 692 and line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between line 690 and line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between line 690 and line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between line 690 and line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between line 690 and line 690
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between line 692 and line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between line 675 and line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between line 692 and line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between line 690 and line 692
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between line 675 and line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between line 692 and line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between line 675 and line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between line 690 and line 675
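For context on the report: the word "assumed" suggests the compiler has not found a real dependence but simply cannot prove that Aptr, Bptr and Cptr never overlap, so copying into malloc'ed buffers does not necessarily help. Below is a minimal sketch of the same loop body with restrict-qualified pointers, which is one common way to state that the arrays do not alias. The function name update_row and its parameter list are hypothetical (not from my real code), __restrict is a compiler extension rather than standard C++, and whether a given compiler actually vectorizes this is an assumption to check against its report. Hints such as "#pragma omp simd" or "#pragma ivdep" placed on the loop are another option I have not verified here.

```cpp
#include <cstddef>

// Hypothetical helper (not from the original code): same arithmetic as the
// loop at line 671, but the arrays are passed as restrict-qualified pointers.
// __restrict is a compiler extension (GCC, Clang, Intel, MSVC), not standard C++.
// It is a promise by the caller that the three arrays do not overlap.
void update_row(double* __restrict out,         // plays the role of Aptr
                const double* __restrict next,  // plays the role of Bptr
                const double* __restrict s,     // plays the role of Cptr
                std::size_t n, int p, double t, double K, double exprdt,
                double c1, double c2, double c3,
                double c4, double c5, double c6)
{
    for (std::size_t l = 0; l < n; ++l) {
        const double s_  = s[l];
        const double st_ = s_ * t;
        double CT = c1 + s_ * (c2 + s_ * c3);
        if (p > 3)                      // loop-invariant condition
            CT += t * c4 + st_ * (c5 + s_ * c6);
        CT *= exprdt;
        const double tmp = K > s_ ? K - s_ : 0.0;
        // One store per iteration: the if/else is written as a select.
        out[l] = (tmp <= 1.0e-8 || tmp <= CT) ? exprdt * next[l] : tmp;
    }
}
```

Each thread would call it on its own chunk, for example update_row(P + j*np + i*npa + iD*gd, P + (j+1)*np + i*npa + iD*gd, s + j*np + i*npa + iD*gd, gd+dg, ...), which is only valid if rows j and j+1 of P really do not overlap within that chunk.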