Compiler Methodology for Intel® MIC Architecture
Getting Started with Intel® Composer XE 2013, New User Compiler Basics
Overview
Modern compilers can be invoked with hundreds of options. Of these, which are the essential options for the typical application programmer? This chapter links to presentations that help you wade through the sea of options and documentation and focus on the key compiler options that most application programmers need.
Topics
The following presentation, "Using the Intel Compiler", walks through the essential optimization options, inter-procedural optimization, vectorization, and auto-parallelization. It has a companion set of labs, "Quicklabs", that follow the presentation.
First, download the "quicklabs.tgz" tar file and extract it to a working directory on your host system. The tar file creates a directory 'quicklabs' containing the follow-along labs in C++ and Fortran, for Linux* or Windows*.
Next, download the presentation "Using the Intel Compiler" and step through it. You may wish to keep a browser window open to the Compiler User and Reference Guide, as described earlier in the parent chapter. This way, as each compiler option is described, you will have access to the full documentation for that option.
Download and untar the Quicklabs tar file
Download the presentation "Using the Intel Compiler"
After mastering the compiler essentials from the presentation above, consider the following presentations that will extend your mastery of the Intel compilers.
Floating Point Model: the -fp-model and related compiler options control the balance between accuracy, reproducibility, and performance. If no other -Ox level is specified explicitly, the Intel compiler defaults to -O2. This optimization level is very aggressive compared to other compilers you may have used. Symptoms of overly aggressive optimization include "incorrect" or inaccurate results, or simply results that do not match what the programmer expected. (Note that it is NOT reasonable to expect exactly the same results to the last significant digit when moving from one architecture to another, from one OS to another even with the same compiler, or from one compiler to another.) To learn how to control the balance between performance, accuracy, and reproducibility, read the presentation "Floating-point control in the Intel® compiler and libraries".
If you prefer, the White Paper that was the basis for the Floating Point Model presentation is available online: "Consistency of Floating-point Results with the Intel® Compilers".
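One reason results can differ is that floating-point addition is not associative, so an optimizer that reorders a sum (for example, when vectorizing a reduction) can change the rounded result. The following is a minimal sketch in C and is not taken from the presentation or the White Paper; the function names sum_forward and sum_backward and the input values are hypothetical, chosen only to make the effect visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add the elements in index order, x[0] first. */
double sum_forward(const double *x, int n)
{
    double s = 0.0;
    int i;
    assert(x != NULL || n == 0);
    for (i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hypothetical helper: add the same elements in reverse order, x[n-1] first.
 * Mathematically the sum is identical; in floating point it may not be. */
double sum_backward(const double *x, int n)
{
    double s = 0.0;
    int i;
    assert(x != NULL || n == 0);
    for (i = n; i > 0; --i)
        s += x[i - 1];
    return s;
}
```

For the input {1e20, -1e20, 1.0}, sum_forward returns 1.0, while sum_backward returns 0.0 because the 1.0 is lost when it is added to -1e20. Options such as -fp-model precise ask the compiler to avoid value-changing transformations of this kind, usually at some cost in performance; the presentation describes the exact behavior of each model.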
Take Aways
At this point, you should understand the basic optimization options -O0 through -O3. You should also understand the vectorization options -x<arch> and -ax<arch>, and know how to use -vec-report to generate a vectorization report and -no-vec to turn off vectorization. In addition, you should understand the concept of inter-procedural optimization within a source file with option -ip, and between source files with -ipo. You should also be familiar with the -parallel option. However, -parallel does NOT extract enough parallelism to make it useful for the Intel® Many Integrated Core Architecture (Intel® MIC Architecture). For Intel® MIC Architecture, you will need to use a more effective parallelization method such as OpenMP*, Intel® Threading Building Blocks (Intel® TBB), Intel® Cilk™ Plus, Intel® Math Kernel Library (Intel® MKL), or manual threading. These parallelization methods are explained in more detail in subsequent chapters.
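As a rough illustration of the options above, the sketch below shows a simple loop that is a typical candidate for auto-vectorization, together with an OpenMP* directive that parallelizes it explicitly rather than relying on -parallel. The function name saxpy, the file name saxpy.c, and the command lines in the comment are assumptions for illustration only: -xHost and report level 2 are example values, and -openmp is assumed to be the OpenMP* switch for this compiler version. Check the Compiler User and Reference Guide for the forms your version accepts.

```c
#include <stddef.h>

/*
 * Example command lines (assumed, adjust for your compiler version):
 *   icc -O2 -xHost -vec-report2 -c saxpy.c   vectorize and report on loops
 *   icc -O2 -no-vec -c saxpy.c               same build with vectorization off
 *   icc -O2 -openmp -c saxpy.c               honor the OpenMP directive below
 */

/* Hypothetical kernel: y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(long n, float a, const float *x, float *y)
{
    long i;

    /* Without OpenMP enabled at compile time this pragma is ignored
     * and the loop simply runs serially. */
    #pragma omp parallel for
    for (i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Each iteration is independent of the others, which is what allows the loop to be both vectorized within a thread and divided among threads. Comparing timings of the -no-vec build with the default build is one simple way to see what vectorization contributes.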
Next Steps
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon Phi™ coprocessor. The paths provided in this guide reflect the steps necessary to get the best possible application performance.
Go back to chapter "Getting Started with the Intel® Composer XE 2013"