Fluid animate is one of a class of algorithms for calculating fluid flow. One implementation of it uses Intel® Cilk Plus technology. This sample uses the same serial implementation to demonstrate Intel Profile Guided Optimization (PGO), which requires no code changes. PGO in the Intel® C++ Compiler improves application performance by reorganizing code layout to reduce instruction-cache problems and branch mispredictions. With the runtime information collected from the application, the Intel C++ Compiler can be more selective and specific in how it optimizes the application. This may also help limit the growth in application size caused by aggressive inlining. For a more in-depth explanation, see the article "Guide to Profile-guided Optimization of Computational Fluid Dynamics with Intel® Compiler".
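
The collect-then-rebuild cycle described above can be sketched as a short sequence of commands. This is a minimal illustration for Linux or OS X, assuming the Intel C++ driver `icpc` is available. The source file name, the binary names, and the `./profdata` directory are hypothetical and are not taken from the sample's build scripts. The function only prints the commands, so it is safe to run on a machine without the compiler.

```shell
# Sketch of the three PGO phases with the Intel C++ compiler on Linux or OS X.
# Assumed, not taken from the sample: SRC, PROF_DIR and the binary names.
SRC=fluidanimate.cpp
PROF_DIR=./profdata

pgo_steps() {
    # Preparation: make sure the profile directory exists.
    echo "mkdir -p $PROF_DIR"
    # Phase 1: build an instrumented binary that records runtime behavior.
    echo "icpc -O2 -prof-gen -prof-dir $PROF_DIR $SRC -o app_instrumented"
    # Phase 2: run it on a representative workload to write the profile data.
    echo "./app_instrumented"
    # Phase 3: rebuild so the compiler can use the collected profile.
    echo "icpc -O2 -prof-use -prof-dir $PROF_DIR $SRC -o app_pgo"
}

# Print the commands for review; pipe the output into "sh -e" to run them.
pgo_steps
```

On Windows the corresponding options are /Qprof-gen, /Qprof-use, and /Qprof-dir.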
This code was originally written as part of the Princeton PARSEC benchmark suite by Richard O. Lee and later modified by Christian Bienia and Christian Fensch.
System Requirements
- Hardware:
  - Any Intel® processor, such as 2nd Generation Intel® Core™ i3, i5, or i7 processors or the Intel® Xeon® E3 or E5 processor family
- Software for Windows*:
  - Microsoft Visual Studio 2010*, 2012*, or 2013* Professional Edition or above
  - Intel® Parallel Studio XE 2015 Composer Edition for C++ Windows*
- Software for Linux*:
  - GNU* GCC 4.5 or newer
  - Intel® Parallel Studio XE 2015 Composer Edition for C++ Linux*
- Software for OS X*:
  - OS X 10.9 or above
  - Xcode* 5.0 or above
  - Intel® Parallel Studio XE 2015 Composer Edition for C++ OS X*, or Intel® Integrated Native Developer Experience 2015 Build Edition for OS X (Intel® INDE 2015 Build Edition for OS X)
Performance Gain with PGO
PGO speedup (scalar / SIMD) | Code size reduction | Compiler (Intel® 64) | Compiler options | System specifications |
---|---|---|---|---|
Scalar: 4% over non-PGO; SIMD: 5% over non-PGO | 17% reduction with PGO | Intel® Parallel Studio XE 2015 Composer Edition for C++ Windows* | /O2 /Qipo /fp:fast /Oi /MD /EHsc /Qprof-dir "Intel-Release\" | Windows Server 2008 R2 Enterprise; 4th Gen Intel® Core™ i5-4670T CPU @ 2.30GHz; 8GB memory |
Scalar: 4% over non-PGO; SIMD: 6% over non-PGO | 15% reduction with PGO | Intel® Parallel Studio XE 2015 Composer Edition for C++ OS X* | -O2 -ipo -fast -prof-dir ./intel-release | OS X 10.9.2; 2nd Gen Intel® Core™ i7-2635QM CPU @ 2.00GHz; 8GB memory |
Scalar: 4% over non-PGO; SIMD: 6% over non-PGO | 13% reduction with PGO | Intel® Parallel Studio XE 2015 Composer Edition for C++ Linux* | -O2 -ipo -fast -prof-dir ./intel-release | Ubuntu* 12.04; 2nd Gen Intel® Core™ i7-2600K CPU @ 3.40GHz; 8GB memory |
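
The table does not state how its percentages were computed. As a rough guide for reproducing comparable figures from your own runs, the sketch below assumes one common convention: speedup is the baseline time divided by the PGO time, minus one, and size reduction is one minus the ratio of the PGO size to the original size. The helper names and the sample inputs are hypothetical.

```shell
# speedup_pct BASELINE_SECONDS PGO_SECONDS
# Prints the speedup in percent, defined here as (baseline / pgo - 1) * 100.
speedup_pct() {
    awk -v base="$1" -v pgo="$2" 'BEGIN { printf "%.1f\n", (base / pgo - 1) * 100 }'
}

# size_reduction_pct ORIGINAL_BYTES PGO_BYTES
# Prints the code size reduction in percent, (1 - pgo / original) * 100.
size_reduction_pct() {
    awk -v orig="$1" -v pgo="$2" 'BEGIN { printf "%.1f\n", (1 - pgo / orig) * 100 }'
}

# Hypothetical measurements, chosen only to illustrate the arithmetic.
speedup_pct 10.4 10.0          # prints 4.0
size_reduction_pct 1000 830    # prints 17.0
```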
Build Instructions:
- For Microsoft Visual Studio 2010, 2012 or 2013 users:
  - Open Visual Studio 2010, 2012 or 2013 and load the solution .sln file.
  - Choose a configuration (for best performance, choose a release configuration):
    - Intel-debug and Intel-release: use the Intel C++ compiler
    - VSC-debug and VSC-release: use the Microsoft Visual C++* compiler
  - Run the application to get a baseline run time for later use. Use the "Intel-Release" configuration for performance testing.
    - Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
  - Build with Intel C++ compiler PGO:
    - In the Solution Explorer window, right-click the FluidAnimate project name and select "Intel Compiler XE 15.0 > Profile Guided Optimization". This brings up the "Profile Guided Optimization" dialog.
    - Without changing any default setting, click the [Run] button. This step does the following:
      - Builds FluidAnimate with the /Qprof-gen option
      - Runs the application to collect the profiling data
      - Rebuilds the application with the /Qprof-use option
    - At the end of this step, the PGO-optimized FluidAnimate binary is created at "PGOSample\Intel-Release\FluidAnimate.exe".
  - Run the PGO-optimized binary to get the performance number: press Ctrl+F5 to run the application.
- For Windows Command Line users:
  - For Intel C++ Compiler:
    - Open the appropriate Intel C++ compiler command prompt from the Start menu.
    - Build and run the application normally without PGO:
      build
      build run -o 0
    - Build with PGO:
      build pgo
      This command does the following:
      - Builds FluidAnimate with /Qprof-gen
      - Runs the application to collect profile data
      - Rebuilds FluidAnimate with /Qprof-use and generates the PGO-optimized binary "release\FuildAnimatePGO.exe"
  - [Optional] For Visual C++ Compiler (only linear/scalar will run):
    - Open the appropriate Microsoft Visual Studio command prompt from the Start menu and navigate to the project folder.
    - To compile: Build.bat [perf_num]
      [perf_num]: collect performance numbers (will run the example 5 times and average the time taken)
    - To run: Build.bat run
- For Linux* or OS X* users:
  - From a terminal window, navigate to the project folder.
  - Using the Intel® C++ compiler:
    - Set the environment:
      source <icc-install-dir>/bin/compilervars.sh ia32
      (or intel64 in place of ia32)
    - To compile with PGO, run the PGO build shell script:
      . ./BuildPGO.sh
      This step does the following:
      - Generates an instrumented binary
      - Runs the instrumented binary to collect profile data
      - Builds the application again using the profile data and generates the PGO-optimized binary intel-release/FluidAnimatePGO (run it with ./intel-release/FluidAnimatePGO)
  - [Optional] Using gcc (only linear/scalar will run):
    - To compile:
      make gcc [perf_num=1]
      [perf_num=1]: collect performance numbers (will run the example 5 times and average the time taken)
    - To run:
      make run
- For OS X* users using Intel® Integrated Native Developer Experience Build Edition for OS X* (Intel® INDE Build Edition for OS X*):
  - From a terminal window, navigate to the project folder.
  - Using the Intel® C++ compiler:
    - Set the environment:
      source <icc-install-dir>/bin/compilervars.sh ia32
      (or intel64 in place of ia32)
    - To compile with PGO, run the PGO build shell script:
      . ./BuildPGO_inde.sh
      This step does the following:
      - Generates an instrumented binary
      - Runs the instrumented binary to collect profile data
      - Builds the application again using the profile data and generates the PGO-optimized binary intel-release/FluidAnimatePGO (run it with ./intel-release/FluidAnimatePGO)
  - [Optional] Using gcc (only linear/scalar will run):
    - To compile:
      make gcc [perf_num=1]
      [perf_num=1]: collect performance numbers (will run the example 5 times and average the time taken)
    - To run:
      make run
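
The perf_num option has the sample itself run five times and average the time taken. To compare a baseline binary with a PGO binary from outside the program, a similar average can be approximated in the shell. The following is a minimal sketch and not part of the sample: `avg_seconds` is a hypothetical helper, and because it relies on `date +%s`, each measurement has only whole-second resolution.

```shell
# avg_seconds RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times (RUNS must be at least 1) and prints the mean
# wall-clock time in seconds. Hypothetical helper, whole-second resolution.
avg_seconds() {
    runs=$1
    shift
    total=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1      # discard the program's own output
        end=$(date +%s)
        total=$((total + end - start))
        i=$((i + 1))
    done
    awk -v t="$total" -v n="$runs" 'BEGIN { printf "%.2f\n", t / n }'
}

# Example (any arguments the binary may need are omitted here):
# avg_seconds 5 ./intel-release/FluidAnimatePGO
```

Running the same helper against the non-PGO build gives a baseline to compare against.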