Compiler Methodology for Intel® MIC Architecture
This article is part of the Intel® Modern Code Developer Community documentation, which helps developers improve application performance through a systematic, step-by-step optimization methodology. This article addresses parallelization.
This methodology enables you to determine whether your application is likely to gain performance on Intel® Many Integrated Core Architecture (Intel® MIC Architecture). The following links explain the programming environment and help you evaluate how well your application fits the Intel Xeon and MIC environment.
Because of the rich and varied programming environments provided by the Intel Xeon and Xeon Phi processors, the Intel compilers offer a wide variety of switches and options for controlling the executable code that they produce. This chapter provides the information you need to get the maximum benefit from the compilers.
The third level of parallelism associated with code modernization is vectorization with SIMD instructions. The Intel compilers recognize a broad array of vector constructs and can deliver significant performance boosts for both scalar and vector code. The following chapter provides detailed information on ways to maximize your vector performance.
Vectorization for C or C++ Users with Intel® Cilk™ Plus Array Notations and Elemental Functions
Explicit Vector Programming in Fortran New 05/2014!
Vectorization and Optimization Reports
How to correlate vec-report line-numbers with source line numbers.
Vectorization Diagnostics - remarks that describe vectorizer behavior
Getting the Most out of your Intel® Compiler 15.0 with the New Optimization Reports New 10/2014!
Intel Compiler 15.0 New Optimization Reports (PDF | Video) New 08/2014!
Outer Loop Vectorization via Intel® Cilk™ Plus Array Notations (for C/C++ Users)
Tradeoffs between array-notation long-vector and short-vector coding (for C/C++ Users)
Utilizing Full Vectors and Use of Option -qopt-assume-safe-padding
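As a minimal sketch of the kind of loop the vectorization articles listed above deal with, the following C function marks a loop for vectorization with the standard OpenMP 4.0 `#pragma omp simd` directive and uses `restrict`-qualified pointers to tell the compiler the arrays do not overlap. The function name `saxpy_simd` and its parameters are hypothetical and are not taken from any of the linked articles. A compiler without OpenMP enabled ignores the pragma, and the loop still computes the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only:
   computes y[i] = a * x[i] + y[i] for every i in [0, n). */
void saxpy_simd(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* restrict promises that x and y do not alias, and the simd directive
       asks the compiler to generate SIMD code for the loop. If OpenMP is
       not enabled, the pragma is ignored and the loop runs unchanged. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy_simd(4, 2.0f, x, y)` on two four-element arrays updates `y` in place. Whether the loop is actually vectorized depends on the compiler and its options, which is what the optimization reports listed above are meant to reveal.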
The final chapter in this section provides insight into some advanced optimization topics, including floating-point accuracy, data movement, thread scheduling, and many more. This is a good chapter for users who are still not seeing their desired performance or who are looking for the last level of performance enhancements.
The Floating Point Model - balancing performance with accuracy and reproducibility
Prefetching on Intel® MIC Architecture Updated 01/2015!
Selective Use of gatherhint/scatterhint Instructions Updated 02/2014!
Code Generation for future Intel® MIC Architecture-based Processors New 08/2014!
Intel® Xeon Phi™ Coprocessor code named “Knights Landing” - Application Readiness New 09/2014!
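One reason floating-point accuracy and reproducibility appear among the advanced topics above is that floating-point addition is not associative, so a reduction computed in a different order (for example, with several partial sums, as a vectorized loop might do) can give a different result. The sketch below illustrates this in plain C. The function names `sum_in_order` and `sum_two_lanes` are hypothetical, and the two-lane version only imitates the reordering. It is not how any particular compiler implements a reduction.

```c
#include <assert.h>
#include <stddef.h>

/* Sum strictly left to right, as a scalar loop would. */
float sum_in_order(const float *v, size_t n)
{
    assert(v != NULL);
    volatile float acc = 0.0f;  /* volatile keeps every step rounded to float */
    for (size_t i = 0; i < n; ++i) {
        acc = acc + v[i];
    }
    return acc;
}

/* Sum with two interleaved partial sums, imitating a 2-lane vector
   reduction (an illustrative assumption, not a compiler's actual code). */
float sum_two_lanes(const float *v, size_t n)
{
    assert(v != NULL);
    volatile float lane0 = 0.0f;
    volatile float lane1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 = lane0 + v[i];      /* even-indexed elements */
        lane1 = lane1 + v[i + 1];  /* odd-indexed elements */
    }
    if (i < n) {
        lane0 = lane0 + v[i];      /* leftover element when n is odd */
    }
    return lane0 + lane1;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}`, `sum_in_order` returns 1.0 because the first `1.0f` is lost when it is added to `1e8f`, while `sum_two_lanes` returns 2.0, which is the exact sum. Neither order is always more accurate. The point is only that the result depends on the order of operations.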
The Intel® MIC Architecture provides two principal programming models. The native model covers compiling applications to run directly on the coprocessor. The heterogeneous offload model covers running a main program on the host and offloading work to the coprocessor; it includes standard offload and the Cilk_Offload model. The following chapter gives you insight into how these models apply to your application.
Native and Offload Programming Models
Effective Use of Compiler Features for Offloading Updated 08/2014!
OpenMP 4.0 combined offload constructs New 08/2014!
Offload support for transferring arrays of pointers New 08/2014!
Offload support for non-contiguous array slices New 08/2014!
Using the Fortran 2008 BLOCK construct with the Intel® Xeon Phi™ coprocessor New 08/2014!
Techniques to Reduce Offload-related Memory Allocation Overheads (C++, Fortran)
Taking Advantage of Offload Pointer Association and alloc/into Keywords (C++,Fortran)
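To make the offload model concrete, here is a minimal sketch that uses a standard OpenMP 4.0 combined offload construct, the topic of one of the articles listed above. The function name `saxpy_offload` and its parameters are hypothetical. The `map` clauses state which array sections are copied to the device and which are copied back. If the compiler has no OpenMP offload support, or no device is available, the loop is expected to run on the host instead and produce the same values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only:
   computes y[i] = a * x[i] + y[i] inside an OpenMP target region. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* map(to: ...) copies x to the device before the region starts;
       map(tofrom: ...) copies y to the device and back to the host.
       Without OpenMP enabled, the pragma is ignored and the loop runs
       on the host. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same loop built natively for the coprocessor would need no `target` construct at all, which is the essential difference between the two models: in the native model the whole program runs on the coprocessor, while in the offload model only marked regions do and their data must be transferred explicitly.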