< Overview >
In this article, we enable and use Intel(R) Integrated Performance Primitives (IPP), Intel(R) Threading Building Blocks (TBB) and the Intel(R) C++ Compiler (ICC) on Linux (Ubuntu 14.04 LTS 64-bit). We build and run one of the examples that come with IPP, then apply TBB and ICC to it to observe the performance improvement these Intel(R) System Studio features bring.
The Intel(R) System Studio (ISS) version used for this article is Intel(R) System Studio 2015 Update 2 Ultimate Edition for Linux Host. The tool suite contains the following components:
- Intel(R) Integrated Performance Primitives 8.2 Update 1
- Intel(R) Threading Building Blocks 4.3 Update 3
- Intel(R) C++ Compiler 15.0 Update 2
This example was tested on an i5 dual-core platform.
< Building the IPP example with TBB libraries and ICC >
STEP 1. Set up the environment variables for IPP, TBB and ICC
IPP, TBB and ICC need environment variables to work properly. Run the following 3 commands in a terminal to set them. Each command needs the right target architecture as its argument: 'ia32' for an IA-32 target or 'intel64' for an Intel(R) 64 target. For ICC, you also need to specify a platform type: 'linux' for a Linux target or 'android' for an Android target. Finally, do not forget the dot and the space ('. ') at the beginning of each command: they source the script into the current shell, so the variables it exports remain set after the script finishes.
- . /opt/intel/system_studio_2015.x.xxx/ipp/bin/ippvars.sh <arch type>
- . /opt/intel/system_studio_2015.x.xxx/tbb/bin/tbbvars.sh <arch type>
- . /opt/intel/system_studio_2015.x.xxx/bin/iccvars.sh -arch <arch type> -platform <platform type>
To verify that the commands above worked, run 'printenv' and check that 'IPPROOT' and 'TBBROOT' are listed and point to the IPP and TBB install directories, and that 'PATH' includes '/opt/intel/system_studio_2015.x.xxx/bin/<arch type>'. For later sessions, it is recommended to put these commands in a bash script so that several ISS features can be enabled at once.
STEP 2. Find the example
First, locate the IPP example and prepare to build it with additional ISS features such as TBB and ICC.
With the default settings, ISS 2015 is installed in
/opt/intel/system_studio_2015.x.xxx
and the IPP example archive file is located at
/opt/intel/system_studio_2015.x.xxx/ipp/examples
You will find 'ipp-examples.tgz' in that location. Extract the examples wherever you like and find the 'ipp_resize_mt' example folder; that is the example used here. After extracting, you can find additional documentation at '<Extracted Examples>/documentation/ipp-examples.html'.
STEP 3. Build the example
If you want to build the example without TBB and ICC, just run 'make' in '<Extracted Examples>/ipp_resize_mt' and save the binary for later comparison. Since the IPP environment has already been set up, the example should build without any problem.
Now we need to make some changes to build the example with TBB and ICC enabled. Go to the '<Extracted Examples>/ipp_resize_mt' example folder and open the Makefile with 'gedit Makefile'. The original Makefile looks like the following.
In case you need the original Makefile later, back it up first with 'cp Makefile ./Makefile.bak'.
We need to change the compiler 'gcc' to 'icc' and 'g++' to 'icpc'. However, 'CC' and 'CXX' usually already have values (GNU make even defines built-in defaults for both), so the assignments inside the 'ifndef CC'/'endif' and 'ifndef CXX'/'endif' blocks are skipped, and changing 'gcc' to 'icc' and 'g++' to 'icpc' there may have no effect. In that case, comment out the 'ifndef CC', 'endif', 'ifndef CXX' and 'endif' lines. As a result, we have the following.
Next, add the libraries TBB needs. In 'LIBS', add '-ltbb' and '-ltbbmalloc' next to the IPP libraries, so we have the following.
In addition, we need to edit one more Makefile, located at '<Extracted Examples>/common'. This folder holds files that several IPP examples share, and 'ipp_resize_mt' uses them too. Open that Makefile and change the compiler settings to ICC as we did above.
Go back to the 'ipp_resize_mt' directory and modify the example's source code to enable TBB. Open '<Extracted Examples>/ipp_resize_mt/ipp_resize_mt.cpp' and add '#define USE_TBB' so that the TBB code lines (those under '#ifdef USE_TBB') are compiled in. The define has to appear before the first '#ifdef USE_TBB' it should affect, for example near the top of the file. See the following as an example.
Now run 'make' in the 'ipp_resize_mt' folder to build the example.
< Simple Performance Comparison >
The IPP example reports its own performance as the average time it spends resizing one image.
The options and arguments that can be passed to the resize example are shown below.
Without TBB, the resize function runs on a single thread, so it cannot make full use of a multi-core CPU. The following is the result of running the resize example with the command './ipp_resize_mt -i ../../lena.bmp -r 960x540 -p 1 -T AVX2 -l 5000'. This command means 'resize ../../lena.bmp to 960x540 using linear interpolation and the AVX2 code path, 5000 times'.
As shown above, resizing a single image takes about 2.189 ms on average. Using this result as a baseline, we will test the same example with TBB using 2 cores. If TBB has been enabled successfully, the thread option also appears in the help page.
With TBB, the resize function runs on 2 threads simultaneously. The following is the result of running the resize example with the command './ipp_resize_mt -i ../../lena.bmp -r 960x540 -p 1 -T AVX2 -t 2 -l 5000'.
Running 2 threads at the same time kept both cores busy, and performance improved by about 76%.
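As a rough reading of that figure, assume "performance" means throughput (images resized per second), with T_1 the single-thread average and T_2 the 2-thread average. The 76% gain and the 2.189 ms baseline then imply the value below; this is a back-of-the-envelope derivation, not a separate measurement.

```latex
\text{gain} = \left(\frac{T_{1}}{T_{2}} - 1\right) \times 100\,\%,
\qquad
T_{2} \approx \frac{T_{1}}{1.76} = \frac{2.189\ \text{ms}}{1.76} \approx 1.24\ \text{ms}
```

With perfect scaling on 2 cores, T_2 = T_1 / 2 and the gain would be 100%, so a gain of about 76% indicates good but not perfect scaling.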
To verify that the example actually uses both cores simultaneously, we can investigate with VTune. The following picture shows the number of CPUs utilized during each execution (blue = resize example without TBB, yellow = resize example with TBB).
The yellow bar at 2.00 tells us that 2 CPUs were running simultaneously for about 4.4 s.
VTune results also show how threads worked on specific tasks. The results for the functions used for resizing are listed below.
Without TBB, only a single thread handles the resize function, and it carries a heavy load. In a situation like this, parallelizing the work is worth considering. The following are the results with TBB.
As expected, 2 threads were running simultaneously for about 4.4 s during the task, which increased the performance.
< Conclusion >
We saw how easily an IPP example can be built and tested with other features of ISS. Taking a close look at the IPP example is a good way to learn how to program with IPP and TBB. Here, TBB parallelizes the work across the dual-core processor and increases performance.
As for ICC, simply switching the compiler from GCC to ICC did not bring a big benefit in this example, because the IPP resize function is already optimized with SIMD instructions and the loops were parallelized by TBB. So there are few other tasks in this example that ICC could optimize. If the application had additional functions and loops that could be vectorized with SIMD instructions, or parallelized with OpenMP or Cilk using ICC, there would have been further opportunities to optimize it.