This topic only applies to IA-32 architecture targeting Intel® Graphics Technology. Intel® Graphics Technology is a preview feature.
The Intel® Graphics Technology Register File (GRF) is a register file with flexible addressing modes, allowing for both direct register naming and indirect access of GRF sub-regions.
During compile time, the compiler tries to allocate variables and arrays with automatic storage, within a function or block scope, on the GRF when possible.
The following conditions must be true to enable GRF allocation of a variable, including arrays:
The size of the variable is less than 3K and is known at compile time.
Pointers to the variable do not escape to functions that the compiler does not inline. You can still pass pointers to such local arrays to other functions and keep the GRF performance benefit, provided the compiler inlines those functions.
Pointers to different variables do not merge into a single pointer variable.
If any of these conditions is not met, the variable or array is allocated in the stack memory area.
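The conditions above can be illustrated with a minimal C sketch. The function names and the array size N are hypothetical, and any offload attributes or pragmas needed to target Intel® Graphics Technology are omitted. Whether a given array actually lands on the GRF is the compiler's decision, so the comments describe only what the conditions suggest.

```c
enum { N = 16 };  /* hypothetical size: 16 * 4 bytes = 64 bytes, known at compile time */

/* Candidate for GRF allocation: a small automatic array of fixed size
   whose address never leaves this function. */
int sum_of_squares(const int *in)
{
    int tmp[N];                 /* automatic storage, block scope */
    int total = 0;

    for (int i = 0; i < N; ++i)
        tmp[i] = in[i] * in[i];
    for (int i = 0; i < N; ++i)
        total += tmp[i];
    return total;
}

/* Likely not a candidate: pointers to two different local arrays are
   merged into a single pointer variable, which violates the third
   condition, so a and b would be expected to live on the stack. */
int pick_first(int use_a)
{
    int a[N] = {1};
    int b[N] = {2};
    int *p = use_a ? a : b;     /* pointer merge */
    return p[0];
}
```

In the first function every condition holds, so tmp is the kind of array the compiler may place on the GRF. In the second, selecting between a and b through one pointer is enough to fall back to stack allocation.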
GRF access is very efficient because of its low access latency and short instruction sequences. GRF-allocated arrays may be particularly useful for caching uniform data. But consider the following:
The most efficient code results from fully unrolling loops containing references to GRF arrays, so that all references to the arrays are known at compile time. The resulting code contains only named registers and no indirect access to the GRF. The target compiler might unroll some of the loops, but in most cases you need to apply #pragma unroll(N) to explicitly request unrolling of a loop.
Due to various hardware restrictions, the compiler may fail to vectorize indirect accesses to GRF. Often, both the simplest and the highest performance solution would be to ensure that loops referring to GRF arrays are unrolled to avoid indirect accesses.
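As a sketch of the unrolling advice, the following C function requests full unrolling of loops over a small local array. The function name dot_taps and the trip count of 8 are hypothetical, and the __INTEL_COMPILER guard is an assumption added only so that the sketch also builds with compilers that do not recognize this pragma. Offload attributes are omitted.

```c
#define TAPS 8  /* hypothetical trip count, known at compile time */

float dot_taps(const float *x, const float *coeff)
{
    float c[TAPS];      /* small local copy of uniform data */
    float acc = 0.0f;

    /* Request full unrolling so that each c[k] can become a named
       register rather than an indirect GRF access. */
#if defined(__INTEL_COMPILER)
#pragma unroll(8)
#endif
    for (int k = 0; k < TAPS; ++k)
        c[k] = coeff[k];

#if defined(__INTEL_COMPILER)
#pragma unroll(8)
#endif
    for (int k = 0; k < TAPS; ++k)
        acc += c[k] * x[k];

    return acc;
}
```

Because the unroll factor equals the trip count, every index into c is a compile-time constant after unrolling, which is the situation the text describes as most efficient.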
Performance penalties can occur because of unaligned vector accesses, so try to make all vector accesses to GRF arrays 32-byte-aligned. The compiler tries to follow this recommendation during vectorization, but you also need to consider this possibility, especially when operating on short arrays. For example, for an Array Notation section such as intArr[i : VL], it is recommended to ensure that i is divisible by 8 (4-byte elements).
Allocating too much memory for arrays and other automatic variables on GRF may lead to register pressure exceeding the GRF size, which is 4KB per thread, at some code points. Although the JIT compiler from the Intel® Graphics Technology driver supports spilling, spilling impacts performance adversely, and also might exceed the limit for the spill memory area imposed by the JIT compiler and the driver. If something is spilled, the JIT compiler emits a warning, and if the limit for the spill area is exceeded, a run-time error is generated. Look for these warnings and errors when running your application. Consider the GRF size when selecting local array sizes and vector lengths.
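The alignment and size considerations above reduce to simple arithmetic, sketched here in plain C. The helper names and the headroom parameter are hypothetical and are not part of any Intel API. The alignment check assumes the array base itself is 32-byte-aligned, and the budget check is only a rough estimate, since actual register pressure depends on the generated code.

```c
#include <stddef.h>

#define GRF_BYTES_PER_THREAD 4096u  /* 4KB per thread, as stated in the text */
#define GRF_ALIGN_BYTES      32u    /* recommended vector access alignment */

/* Returns nonzero if a section starting at element index i begins on a
   32-byte boundary, assuming the array base is 32-byte-aligned. */
int section_start_is_aligned(size_t i, size_t elem_size)
{
    return (i * elem_size) % GRF_ALIGN_BYTES == 0;
}

/* Rough estimate: do the local arrays, plus some headroom for other
   live values, fit in the per-thread GRF? The headroom is a guess the
   caller supplies, not a value taken from the compiler. */
int fits_grf_budget(size_t total_local_bytes, size_t headroom_bytes)
{
    return total_local_bytes + headroom_bytes <= GRF_BYTES_PER_THREAD;
}
```

For 4-byte elements, section_start_is_aligned(i, 4) holds exactly when i is divisible by 8, which matches the recommendation for intArr[i : VL]. A failed budget check suggests that spilling is likely, but the JIT compiler warnings remain the authoritative signal.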
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804