Channel: Intel® C++ Compiler

Vectorizing TBB parallel_for block


This article demonstrates how to write vector-friendly code inside a TBB parallel_for block. Consider the code snippet below:


	$ cat test1.cc
	#include <iostream>
	#include <tbb/tbb.h>
	#include <tbb/parallel_for.h>
	#include <cstdlib>
	using namespace std;
	using namespace tbb;

	long len = 0;
	float *a;
	float *b;
	float *c;

	// Functor that parallel_for invokes on each sub-range of [0, len).
	class Test {
	public:
	    void operator()( const blocked_range<size_t>& x ) const {
	        for (long i=x.begin(); i!=x.end(); ++i ) {
	            c[i] = (a[i] * b[i]) + b[i];
	        }
	    }
	};

	int main(int argc, char* argv[]) {
	    // The array length is taken from the first command-line argument.
	    if (argc < 2) {
	        cerr << "usage: " << argv[0] << " <length>" << endl;
	        return 1;
	    }
	    len = atol(argv[1]);
	    cout << len << endl;
	    a = new float[len];
	    b = new float[len];
	    c = new float[len];
	    // Split [0, len) into sub-ranges with a grain size of 100.
	    parallel_for(blocked_range<size_t>(0,len, 100), Test() );
	    return 0;
	}

	


The code above contains a parallel_for call that invokes the Test functor on each sub-range. When the program is compiled, the vectorization report states that the loop was not vectorized:


	$ icpc -S -O3 -vec-report2 test1.cc -o test1_O3_icc.s
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
	partitioner.h(158): (col. 9) remark: loop was not vectorized: existence of vector dependence

	

Looking at the loop closely, the compiler cannot determine whether it is a countable loop, because its bounds are function calls (x.begin() and x.end()). Reading both bounds into local variables before the loop, as shown below, removes this uncertainty:

From:


	class Test {
	public:
	    void operator()( const blocked_range<size_t>& x ) const {
	        for (long i=x.begin(); i!=x.end(); ++i ) {
	            c[i] = (a[i] * b[i]) + b[i];
	        }
	    }
	};

	

To:


	class Test {
	public:
	    void operator()( const blocked_range<size_t>& x ) const {
	        long j = x.begin();
	        long k = x.end();
	        for (long i=j; i!=k; ++i ) {
	            c[i] = (a[i] * b[i]) + b[i];
	        }
	    }
	};
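The same transformation can be tried without TBB installed. The sketch below is a minimal stand-in, not TBB code: `Range` is a hypothetical class that only mimics the `begin()`/`end()` interface of `blocked_range<size_t>`, `run_in_chunks` is an assumed serial driver that hands out consecutive sub-ranges, and `multiply_add` is a helper added purely for checking the result. Unlike the listing above, it keeps the bounds in `size_t` locals, which matches the return type of `begin()` and `end()`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<size_t>; only begin()/end() are modelled.
class Range {
public:
    Range(std::size_t b, std::size_t e) : begin_(b), end_(e) {}
    std::size_t begin() const { return begin_; }
    std::size_t end() const { return end_; }
private:
    std::size_t begin_;
    std::size_t end_;
};

// Loop body with both bounds read into locals before the loop starts.
struct HoistedBody {
    const float* a;
    const float* b;
    float* c;
    void operator()(const Range& x) const {
        const std::size_t first = x.begin();  // evaluated once
        const std::size_t last = x.end();     // evaluated once
        for (std::size_t i = first; i != last; ++i) {
            c[i] = (a[i] * b[i]) + b[i];
        }
    }
};

// Assumed serial driver: calls the body on consecutive chunks of at most `grain` elements.
template <typename Body>
void run_in_chunks(std::size_t n, std::size_t grain, const Body& body) {
    for (std::size_t start = 0; start < n; start += grain) {
        const std::size_t stop = (start + grain < n) ? start + grain : n;
        body(Range(start, stop));
    }
}

// Helper for checking: computes c[i] = a[i] * b[i] + b[i] in chunks of 100.
std::vector<float> multiply_add(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> c(a.size(), 0.0f);
    HoistedBody body{a.data(), b.data(), c.data()};
    run_in_chunks(a.size(), 100, body);
    return c;
}
```

Because the driver is serial, this only exercises the loop shape; whether a given compiler vectorizes it still has to be confirmed from its own vectorization report.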

	

The vectorization report for the modified Test functor is:


	$ icpc -S -O3 -vec-report2 test1.cc -o test1_O3_icc.s
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: existence of vector dependence
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: existence of vector dependence
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
	partitioner.h(158): (col. 9) remark: loop was not vectorized: existence of vector dependence

	

The loop is still not vectorized, but this time the reason is that the compiler assumes a vector dependence. The compiler has no way of knowing whether the arrays “a”, “b” and “c” are aliased (that is, whether they point to overlapping memory locations). Since the arrays in this case are disjoint in memory, declaring them as restrict pointers helps. The __restrict__ keyword explicitly informs the compiler that there is no aliasing. The code change is:

From:


	float *a;
	float *b;
	float *c;

	

To:

	float * __restrict__ a;
	float * __restrict__ b;
	float * __restrict__ c;
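To see why the compiler has to be cautious without this promise, the following sketch runs the same arithmetic on deliberately overlapping pointers. It is an illustration only: `kernel_scalar_order`, `kernel_vector_order` and `run_overlapping` are hypothetical names, and `kernel_vector_order` merely models a vector operation by performing every load before any store, which is a simplification of what real vector code does.

```cpp
#include <cstddef>
#include <vector>

// Scalar order: each iteration completes its store before the next iteration loads.
void kernel_scalar_order(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i) {
        c[i] = (a[i] * b[i]) + b[i];
    }
}

// Models a single vector operation over all n elements: all loads, then all stores.
void kernel_vector_order(const float* a, const float* b, float* c, std::size_t n) {
    std::vector<float> tmp(n);
    for (std::size_t i = 0; i != n; ++i) {
        tmp[i] = (a[i] * b[i]) + b[i];  // reads a and b, writes only the temporary
    }
    for (std::size_t i = 0; i != n; ++i) {
        c[i] = tmp[i];                  // writes c after every input has been read
    }
}

// Runs the chosen kernel with c overlapping a, shifted by one element.
std::vector<float> run_overlapping(bool vector_order) {
    std::vector<float> buf = {1.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    const std::vector<float> ones(4, 1.0f);
    const float* a = buf.data();  // reads buf[0..3]
    float* c = buf.data() + 1;    // writes buf[1..4]
    if (vector_order) {
        kernel_vector_order(a, ones.data(), c, 4);
    } else {
        kernel_scalar_order(a, ones.data(), c, 4);
    }
    return buf;
}
```

With these inputs the scalar order leaves the buffer as 1, 2, 3, 4, 5, while the load-everything-first order leaves 1, 2, 1, 1, 1. The two orders agree only when the arrays do not overlap, which is exactly what __restrict__ asserts.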

	

Compiling test1.cc with the __restrict__ pointers vectorizes the loop, as the report shows:


	$ icpc -S -O3 -vec-report2 test1.cc -o test1_O3_icc.s
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: LOOP WAS VECTORIZED
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: unsupported loop structure
	parallel_for.h(108): (col. 22) remark: LOOP WAS VECTORIZED
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
	parallel_for.h(108): (col. 22) remark: loop was not vectorized: nonstandard loop is not a vectorization candidate
	partitioner.h(158): (col. 9) remark: loop was not vectorized: existence of vector dependence
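A possible variation, not taken from the article, is to attach the no-aliasing promise to pointer parameters instead of to the globals, so that it is visible at the loop itself. The sketch below assumes a compiler that accepts the `__restrict__` spelling (other compilers may spell it differently), and `multiply_add_range` is a hypothetical helper name. Whether a particular compiler vectorizes it should be checked against its own report.

```cpp
#include <cstddef>

// Hypothetical helper: the caller promises that a, b and c do not overlap.
// The bounds arrive as plain values, so the loop is countable by construction.
void multiply_add_range(const float* __restrict__ a,
                        const float* __restrict__ b,
                        float* __restrict__ c,
                        std::size_t first, std::size_t last) {
    for (std::size_t i = first; i != last; ++i) {
        c[i] = (a[i] * b[i]) + b[i];
    }
}
```

Inside the functor's operator(), such a helper could be called as `multiply_add_range(a, b, c, x.begin(), x.end());`, which evaluates each bound exactly once at the call site.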

 

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • Unix*
  • C/C++
  • Intel® C++ Compiler
  • Intel® Threading Building Blocks
  • URL
  • Code sample
  • Compiler topics
  • Performance improvement
