Compiler Methodology for Intel® MIC Architecture
Memory Allocation and First-Touch
Memory allocation is more expensive on the coprocessor than on an Intel Xeon processor, so it is prudent to reuse already-allocated memory wherever possible. For example, if a function is called repeatedly (say, inside a loop) and uses an array for temporary storage, allocate the array once, at the maximum size needed, on the first call and reuse it in later calls:
    static real *temp_array = 0;

    void foo(..) {
        ...
        if (temp_array == 0) {
            temp_array = my_malloc(MAX_SIZE);
        }
        ... // use of temp_array
    }
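The fragment above leaves the element type, the allocator, and the function body open. As a minimal compilable sketch of the same pattern, the following assumes `double` in place of `real`, the standard `malloc` in place of `my_malloc`, and a hypothetical function `sum_of_squares` with a hypothetical counter `alloc_count`, both introduced only to make the reuse visible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SIZE 1024              /* assumed upper bound on element count */

static double *temp_array = NULL;  /* scratch buffer, allocated on first use */
static int alloc_count = 0;        /* illustration only: counts real allocations */

/* Hypothetical worker: squares n inputs into the scratch buffer and sums them.
 * Returns -1.0 if the one-time allocation fails. */
static double sum_of_squares(const double *in, size_t n)
{
    assert(n <= MAX_SIZE);         /* the buffer is sized for the worst case */

    if (temp_array == NULL) {      /* true only on the first successful call */
        temp_array = malloc(MAX_SIZE * sizeof *temp_array);
        if (temp_array == NULL)
            return -1.0;
        alloc_count++;
    }

    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        temp_array[i] = in[i] * in[i];
        total += temp_array[i];
    }
    return total;
}
```

Calling `sum_of_squares` any number of times leaves `alloc_count` at 1. Note that a shared static buffer like this is not thread-safe; code that calls the function from several threads would need one buffer per thread or some other form of protection.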
Also, keep in mind that on Linux, physical memory is allocated at first touch, not at the point of the malloc call. So, if a loop traverses a previously malloc'ed (but untouched) array, the first pass over the array may take longer than later passes, because it also pays for the page faults that back the array with physical memory.
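One possible way to keep this first-touch cost out of a timed or performance-critical loop is to touch the array once right after allocating it. The sketch below is an illustration rather than a recommendation from this document: `pretouch` is a hypothetical helper name, and the code assumes a POSIX system where `sysconf(_SC_PAGESIZE)` reports the page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte in every page of buf so that the OS
 * backs the buffer with physical memory now rather than inside a later loop.
 * It overwrites data, so it is meant for freshly allocated buffers only.
 * Returns the number of pages touched. */
static size_t pretouch(void *buf, size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);   /* page size in bytes */
    assert(page > 0);

    if (buf == NULL)
        return 0;

    volatile char *p = buf;              /* volatile keeps the stores from being optimized away */
    size_t touched = 0;
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        p[off] = 0;                      /* first write to the page triggers its allocation */
        touched++;
    }
    return touched;
}
```

A caller would invoke `pretouch(array, n * sizeof *array)` once after `malloc` and before the loop of interest. When measuring performance, it is also reasonable simply to exclude the first pass over a newly allocated array from the timing.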