Channel: Intel® C++ Compiler

struggling to vectorize code


Hey everyone,

I'm looking for help with loop vectorization. I'm trying to vectorize and optimize a loop, but I don't understand what I'm doing wrong. The compiler cannot vectorize it because of assumed FLOW and ANTI dependences. I thought I could remove them by changing the code (see "My Attempt" below), but this is not working. Can someone explain why?

I cannot post the whole code because it is too large. Note that "i, j" are indices and "np, npa, cX, K, t, exprdt" are constants. Basically, I do some computation on an input vector s, and the output is the vector P (which is also an input, since its values are updated in place).

The original code:

			#pragma omp parallel for num_threads(nThreads) schedule(auto)
			for( int k(i*npa); k<(i+1)*npa; ++k )
			{
				double CT = 0., tmp;

				double s_ = s[j*np+k];
				double st_ = s_ * t;
				CT += c1 + s_ *( c2 + s_ * c3 );

				if(p>3)
					CT += t * c4 + st_ * ( c5 + s_ * c6 );
				CT *= exprdt;

				tmp = K > s_ ? K-s_: 0. ;

				if( tmp <= 1.0e-8 || tmp <= CT )
					P[j*np+k] = exprdt * P[(j+1)*np+k];
				else
					P[j*np+k] = tmp;
			}

My Attempt:

			#pragma omp parallel num_threads(nThreads)
			{
				int iD = omp_get_thread_num();
				int gd = npa / nThreads;
				int dg = (iD==nThreads-1? npa%nThreads:0);

				double *Aptr, *Bptr, *Cptr;

				Aptr = (double*)malloc((gd+dg)*sizeof(double));
				Bptr = (double*)malloc((gd+dg)*sizeof(double));
				Cptr = (double*)malloc((gd+dg)*sizeof(double));

				memcpy( Cptr, s+j*np + i*npa+iD*gd, (gd+dg)*sizeof(double));
				memcpy( Bptr, P + j *np+np + i * npa + iD*gd , (gd+dg) *sizeof(double));

				for( int l =0; l<gd+dg; ++l )               // 671
				{
					double CT = 0., tmp;

					double s_ =Cptr[l];                          // 675
					double st_ = s_ * t;

					CT += c1 + s_ *( c2 + s_ * c3 );

					if(p>3)
						CT += t * c4 + st_ * ( c5 + s_ * c6 );

					CT *= exprdt;

					tmp = K > s_ ? K-s_: 0. ;

					if( tmp <= 1.0e-8 || tmp <= CT )
						Aptr[l] = exprdt * Bptr[l];                 // 690
					else
						Aptr[l] = tmp;                              // 692
				}

				memcpy( P+j*np + i*npa + iD*gd, Aptr, (dg+gd) *sizeof(double));

				free(Aptr); free(Bptr); free(Cptr);
			}

And here's the compiler's vectorization report:

(671): (col. 5) remark: loop was not vectorized: existence of vector dependence
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 690
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 690
(690): (col. 7) remark: vector dependence: assumed ANTI dependence between  line 690 and  line 692
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 692
(692): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 692 and  line 675
(675): (col. 17) remark: vector dependence: assumed ANTI dependence between  line 675 and  line 690
(690): (col. 7) remark: vector dependence: assumed FLOW dependence between  line 690 and  line 675
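
One possible reading of this report, offered as a sketch rather than a confirmed diagnosis: every remark says "assumed", which usually means the compiler could not prove that the stores through Aptr never overlap the loads through Bptr and Cptr (or, if c1..c6, t, K and exprdt are globals or class members, the loads of those values). Copying the data into fresh malloc'ed buffers does not by itself give the compiler that proof. The version below moves the loop body into a function whose pointers are marked non-aliasing and whose constants are passed by value. The names update_block, Coeffs and use_high_order are hypothetical and not from the code above; __restrict is a compiler extension (accepted by the Intel, GCC, Clang and MSVC compilers), not standard C++; and #pragma omp simd needs OpenMP 4.0 support to have any effect.

```cpp
#include <cstddef>

// Hypothetical container for the constants c1..c6 from the post.
struct Coeffs {
    double c1, c2, c3, c4, c5, c6;
};

// Hypothetical helper: same arithmetic as the loop marked 671 above.
// The __restrict qualifiers promise the compiler that out, next and s
// never overlap, and the scalars are passed by value, so a store to
// out[l] cannot be assumed to change any of them.
void update_block(double* __restrict out,
                  const double* __restrict next,
                  const double* __restrict s,
                  std::size_t n,
                  bool use_high_order,   // stands in for (p > 3)
                  Coeffs c, double t, double K, double exprdt)
{
    // Asks the compiler to vectorize; ignored if OpenMP SIMD is not enabled.
    #pragma omp simd
    for (std::size_t l = 0; l < n; ++l) {
        const double s_  = s[l];
        const double st_ = s_ * t;

        double CT = c.c1 + s_ * (c.c2 + s_ * c.c3);
        if (use_high_order)
            CT += t * c.c4 + st_ * (c.c5 + s_ * c.c6);
        CT *= exprdt;

        const double tmp = K > s_ ? K - s_ : 0.0;

        // Single store per iteration, selected by the same condition
        // as the if/else at 690 and 692.
        out[l] = (tmp <= 1.0e-8 || tmp <= CT) ? exprdt * next[l] : tmp;
    }
}
```

If the no-overlap promise really holds for the original arrays, the temporary buffers and memcpy calls might not be needed at all: the function could be called directly with P + j*np + offset, P + (j+1)*np + offset and s + j*np + offset, since those three ranges do not overlap when np >= npa. With the Intel compiler, placing #pragma ivdep directly before the loop is another way to tell it to ignore assumed dependences; either way the promise is the programmer's responsibility, and the result is undefined if the ranges do overlap.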

