Compiler Methodology for Intel® MIC Architecture
Advanced Optimizations
Overview
This chapter details some of the advanced compiler optimizations for performance on Intel® MIC Architecture; most of these optimizations are also applicable to host applications. Topics include the floating-point model, prefetching, and the use of streaming stores. This chapter is intended for users who are still not seeing their desired performance or who are looking for the last level of performance enhancements.
Goals and Topics
Goals for this chapter are to explore a variety of advanced optimizations to determine which may be useful for your application:
The Floating Point Model - balancing performance with accuracy and reproducibility
Compiler Options for Low Precision Arithmetic Functions for MIC and Xeon
Scheduling for 1-4 Threads per Core Using Compiler Options and Pragmas
- You can find more details of the impact of prefetching and non-temporal stores in the following paper:
NEXT STEPS
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on Intel® Xeon Phi™ architecture. The paths provided in this guide reflect the steps necessary to get the best possible application performance.
The next chapter, The Native and Offload Programming Models, presents a variety of programming models and data considerations to help you get the most performance out of the Intel® Many Integrated Core Architecture (Intel® MIC Architecture).