Compiler Methodology for Intel® MIC Architecture
Large Page Considerations
Use THP, which is enabled by default in the MPSS operating system
MPSS versions later than 2.1.4982-15 support Transparent Huge Pages (THP), which automatically promotes 4KB pages to 2MB pages for stack- and heap-allocated data. This means that for static and dynamic data, the uOS automatically converts 4KB pages to 2MB pages when the data has a contiguous access pattern. You can find more details here: http://software.intel.com/en-us/blogs/2013/07/09/transparent-huge-pages-on-intel-xeon-phi-coprocessors
Transparent huge pages is a Linux kernel feature introduced in kernel version 2.6.38. The article at http://lwn.net/Articles/423584/ gives a general picture of how Linux allocates huge pages where they are useful without starving the application of available pages.
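To confirm that THP is actually active on a given system, a program can read the kernel's THP setting. The following is a minimal sketch, not part of the original guide: it assumes the standard mainline-Linux sysfs file /sys/kernel/mm/transparent_hugepage/enabled is also present on the coprocessor uOS, and the helper names thp_active_mode and thp_current_mode are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Standard mainline-Linux sysfs location; assumed (not confirmed by this
   guide) to be the same on the coprocessor uOS. */
#define THP_ENABLED_PATH "/sys/kernel/mm/transparent_hugepage/enabled"

/* Copy the bracketed (active) mode out of a line such as
   "[always] madvise never". Returns 0 on success, -1 if no brackets. */
static int thp_active_mode(const char *line, char *out, size_t out_len)
{
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (!lb || !rb || out_len == 0)
        return -1;
    size_t n = (size_t)(rb - lb - 1);
    if (n >= out_len)
        n = out_len - 1;          /* truncate rather than overflow */
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the current THP mode from sysfs. Returns 0 on success, -1 if the
   file is missing or unparsable (e.g. a kernel built without THP). */
static int thp_current_mode(char *out, size_t out_len)
{
    char line[128];
    FILE *f = fopen(THP_ENABLED_PATH, "r");
    if (!f)
        return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got ? thp_active_mode(line, out, out_len) : -1;
}
```

Calling thp_current_mode from main and printing the result shows which mode the kernel reports; a return value of -1 suggests that the THP interface is not available at that path.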
User programs can use mmap with special arguments to allocate data directly in 2MB pages
User programs can directly allocate dynamic data in 2MB pages using the mmap system call (with special arguments) instead of malloc/new. This may be useful if the data access pattern is such that the program can still benefit from allocating data in 2MB pages even though THP may not get triggered in the uOS. The following macros show how to get 2MB pages using mmap:
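#include <sys/mman.h>

/* Map size bytes in 2MB pages; evaluates to MAP_FAILED on error.
   fd is -1 because the mapping is anonymous (not file-backed). */
#define my_malloc(size) \
    mmap(NULL, (size), PROT_READ | PROT_WRITE, \
         MAP_PRIVATE | MAP_HUGETLB | MAP_ANONYMOUS, -1, 0)

/* Unmap a region obtained from my_malloc; size must match. */
#define my_free(addr, size) munmap((addr), (size))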
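A MAP_HUGETLB request fails when no 2MB pages are available, so callers may want a fallback. The sketch below is one possible wrapper, not part of the original guide: the names HUGE_PAGE_SIZE, round_up_2mb, and alloc_prefer_2mb are hypothetical, and it passes -1 as the file descriptor, which is the portable convention for anonymous mappings. It rounds the request up to a multiple of 2MB and retries with regular pages if the huge-page mapping is refused.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes MAP_ANONYMOUS and MAP_HUGETLB in glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)  /* 2MB page size */

/* Round a request up to a whole number of 2MB pages. */
static size_t round_up_2mb(size_t size)
{
    return (size + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Try 2MB pages first; fall back to regular pages if the request is
   refused. Returns NULL on failure. On success, *mapped_len receives
   the length that must later be passed to munmap. */
static void *alloc_prefer_2mb(size_t size, size_t *mapped_len)
{
    size_t len = round_up_2mb(size);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)   /* no huge pages available: use 4KB pages */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped_len = len;
    return p;
}
```

A caller would keep the returned length alongside the pointer and release the region with munmap(p, len). The fallback trades the large-page benefit for robustness, which may or may not be the right choice for a given application.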
Use library solutions such as libhugetlbfs
Another alternative is to use a library such as libhugetlbfs to automatically allocate all malloc-ed data and static data in 2MB pages. This approach also works for Fortran. See the tips in this article on using libhugetlbfs: http://software.intel.com/en-us/articles/optimizing-memory-bandwidth-on-stream-triad
Huge Pages in offload programs
In offload programs, automatic THP promotion applies to static data (defined on the MIC side) and to dynamic data that is allocated inside an offload region using a malloc or new call.
For data allocated by #pragma offload for pointer variables in in/out/nocopy clauses, THP does not apply. You can use the environment variable MIC_USE_2MB_BUFFERS (set on the host) to specify a threshold size beyond which allocation is done in 2MB pages. See this article for more details: http://software.intel.com/en-us/articles/effective-use-of-the-intel-compilers-offload-features
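As an illustration of the second case, the sketch below offloads a loop over a host-allocated array through a pointer variable in an inout clause. It is an assumed example, not taken from the original guide: the function name scale_offload is hypothetical, and the pragma follows the Intel compiler's offload syntax. Because the offload runtime allocates the coprocessor-side buffer for data, that buffer is the kind of allocation governed by MIC_USE_2MB_BUFFERS rather than by THP.

```c
#include <stddef.h>

/* Scale a host-allocated array on the coprocessor. The inout clause asks
   the offload runtime to allocate the coprocessor-side buffer for data,
   copy it in, and copy the result back. Compilers without offload support
   ignore the pragma, and the loop then simply runs on the host. */
static void scale_offload(float *data, int n, float factor)
{
    #pragma offload target(mic) inout(data : length(n))
    {
        for (int i = 0; i < n; i++)
            data[i] *= factor;
    }
}
```

To request 2MB pages for such buffers, set MIC_USE_2MB_BUFFERS in the host environment before launching the program; take the threshold value and its exact syntax from the linked article.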
NEXT STEPS
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on Intel® Xeon Phi™ coprocessors. The paths provided in this guide reflect the steps necessary to get the best possible application performance.