Overview
The latest Intel Compilers (released after the 13.0.039 Beta Update 1 release) do not generate low-precision sequences unless low-precision options are added explicitly to the compiler options. This article describes methods for improving application performance through the use of low-precision mathematical functions.
Topics
The Intel Compilers can generate low-precision code sequences for certain operations and intrinsics, such as divide and square root. Why would a user consider low precision? Speed: low-precision operations can be faster than their higher-precision equivalents. The Intel compilers provide a set of options to control mathematical precision.
Current compilers provide the -fimf* family of options; variations of the base -fimf option are shown below. The term "ulp" stands for "unit in the last place": in binary, an error of 4 ulps means that the last 2 bits of the mantissa may be incorrect. The general syntax is:
-fimf-domain-exclusion=<n1> -fimf-accuracy-bits=<n2> -fimf-precision=low -fimf-max-error=<n3_ulps>
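To make the ulp unit concrete, here is a minimal sketch in Python, used here only for illustration. It assumes Python 3.9 or later for math.ulp and math.nextafter, and the helper name ulp_error is hypothetical, not part of any Intel tool. It perturbs a double-precision value by four representable steps and measures the resulting error in ulps.

```python
import math

def ulp_error(approx: float, exact: float) -> float:
    """Return |approx - exact| expressed in ulps of `exact` (double precision)."""
    return abs(approx - exact) / math.ulp(exact)

# Step 4 representable doubles above 1.0 to simulate a 4-ulp error.
exact = 1.0
approx = exact
for _ in range(4):
    approx = math.nextafter(approx, math.inf)

error_in_ulps = ulp_error(approx, exact)
# An error of 2**k ulps leaves about k low-order mantissa bits unreliable.
uncertain_bits = math.log2(error_in_ulps)
print(error_in_ulps, uncertain_bits)  # 4.0 2.0
```

The same helper could be applied to a math function result and a higher-precision reference value to estimate how many ulps a given option setting costs in practice.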
Some combinations that make sense:
a) -fimf-precision=low -fimf-domain-exclusion=15 (gives lowest precision sequences available for both SP/DP)
b) -fimf-domain-exclusion=15 -fimf-accuracy-bits=22 (low precision compared to default for DP)
c) -fimf-domain-exclusion=15 -fimf-accuracy-bits=11 (even lower precision for DP, low precision compared to default for SP)
d) -fimf-precision=low
e) -fimf-max-error=2048 -fimf-domain-exclusion=15 (gives lower accuracy than default max-error of 4 ulps, but higher accuracy than a above)
f) -fp-model fast=2 (the compiler default is -fp-model fast=1; specifying fast=2 is equivalent to adding the option -fimf-domain-exclusion=15 to the default)
g) -fp-model precise -no-prec-div -no-prec-sqrt -fast-transcendentals -fimf-precision=high (to get vectorized, high precision versions of division, square root and transcendental functions from libsvml)
These options affect code generation for vector as well as scalar code.
For the full list of options and detailed descriptions, refer to "Floating-Point Options" in the Compiler User and Reference Guide (installed on your system along with the compiler, and available online). Some excerpts are given below.
Documentation (from Compiler User Guide) here:
The domain exclusion attribute (set with -fimf-domain-exclusion) is a bit vector derived from the classList specified in the command line switch. In general, each unique classList element corresponds to a power of two. The exclusion attribute is the bitwise OR of the associated powers of two. The following table provides the current mapping from classList mnemonics to numerical values:
Class excluded from the domain | Corresponding integer value |
extremes | 1 |
nans | 2 |
infinities | 4 |
denormals | 8 |
zeros | 16 |
none | 0 |
all | 31 |
common | 15 |
other combinations | bitwise OR of the used values |
Simply put, these conditions can be excluded by setting the appropriate bit. Exclude means that the code generated/used by the compiler does not have to handle that category of values as specified by the IEEE standard. Use these with caution: if your application generates these values and does not handle them correctly, abnormal results or application abnormalities may result. However, if your application is well behaved, code generated with these exclusions can be simplified since it need not check and handle these end cases.
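The bit-vector arithmetic described above can be sketched as follows. This is an illustrative Python helper, not part of the compiler: the names DOMAIN_CLASSES, exclusion_mask, and classes_in_mask are hypothetical, and the numeric values are copied from the table.

```python
from functools import reduce
from operator import or_

# Values copied from the classList table above.
DOMAIN_CLASSES = {
    "extremes": 1,
    "nans": 2,
    "infinities": 4,
    "denormals": 8,
    "zeros": 16,
}

def exclusion_mask(class_list):
    """Bitwise-OR the values of the named classes into one integer."""
    return reduce(or_, (DOMAIN_CLASSES[name] for name in class_list), 0)

def classes_in_mask(mask):
    """List the class names whose bit is set in `mask`."""
    return [name for name, bit in DOMAIN_CLASSES.items() if mask & bit]

common = exclusion_mask(["extremes", "nans", "infinities", "denormals"])
everything = exclusion_mask(DOMAIN_CLASSES)
print(common, everything)   # 15 31
print(classes_in_mask(15))  # ['extremes', 'nans', 'infinities', 'denormals']
```

Decoding 15 shows that the "common" setting leaves zeros inside the domain, while "all" (31) excludes them as well.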
fimf-accuracy-bits, Qimf-accuracy-bits
Defines the relative error for math library function results.
IDE Equivalent
None
Architectures
All
Syntax
Linux and OS X: | -fimf-accuracy-bits=bits[:funclist] |
Windows: | /Qimf-accuracy-bits:bits[:funclist] |
Arguments
bits | Is a positive, floating-point number indicating the number of correct bits the compiler should use. |
funclist | Is an optional list of one or more math library functions to which the attribute should be applied. If you specify more than one function, they must be separated with commas. |
Default
OFF | The compiler uses default heuristics when calling math library functions. |
Description
This option defines the relative error, measured by the number of correct bits, for math library function results.
The following formula is used to convert bits into ulps: ulps = 2^(p-1-bits), where p is the number of the target format mantissa bits (24, 53, and 64 for single, double, and long double, respectively).
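As a worked example, the conversion can be sketched in Python, reading the formula above as 2 raised to the power (p - 1 - bits). The function names and the MANTISSA_BITS table are hypothetical helpers for illustration; the mantissa widths are the ones quoted in the text.

```python
import math

# Mantissa widths p quoted in the text above.
MANTISSA_BITS = {"single": 24, "double": 53, "long double": 64}

def bits_to_ulps(bits: float, fmt: str) -> float:
    """ulps = 2^(p - 1 - bits)."""
    return 2.0 ** (MANTISSA_BITS[fmt] - 1 - bits)

def ulps_to_bits(ulps: float, fmt: str) -> float:
    """Inverse of the formula: bits = p - 1 - log2(ulps)."""
    return MANTISSA_BITS[fmt] - 1 - math.log2(ulps)

print(bits_to_ulps(11, "single"))    # 4096.0
print(bits_to_ulps(22, "double"))    # 1073741824.0
print(ulps_to_bits(2048, "single"))  # 12.0
print(ulps_to_bits(2048, "double"))  # 41.0
```

Under this reading, -fimf-max-error=2048 corresponds to 12 correct bits in single precision and 41 in double precision, which can be compared directly with the -fimf-accuracy-bits=11 and -fimf-accuracy-bits=22 settings listed earlier.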
This option can improve run-time performance, but it may decrease the accuracy of results.
If option -fimf-precision (Linux* OS and OS X*) or /Qimf-precision (Windows* OS), or option -fimf-max-error (Linux* OS and OS X*) or /Qimf-max-error (Windows* OS), or option -fimf-accuracy-bits (Linux OS and OS X*) or /Qimf-accuracy-bits (Windows OS) is specified, the default value for max-error is determined by that option. If one or more of these options are specified, the default value for max-error is determined by the last one specified on the command line.
If none of these options are specified, the default value for max-error is determined by the setting specified for option -[no-]fast-transcendentals (Linux OS and OS X) or /Qfast-transcendentals[-] (Windows OS). If that option also has not been specified, the default value is determined by the setting of option -fp-model (Linux OS and OS X) or /fp (Windows OS).
fimf-precision, Qimf-precision
Defines the accuracy for math library functions.
IDE Equivalent
None
Architectures
All
Syntax
Linux and OS X: | -fimf-precision[=value[:funclist]] |
Windows: | /Qimf-precision[:value[:funclist]] |
Arguments
value | Is one of the following values denoting the desired accuracy: high, medium, or low. The documentation defines each value in terms of max-error or accuracy-bits, where max-error means option -fimf-max-error (Linux* OS and OS X*) or /Qimf-max-error (Windows* OS), and accuracy-bits means option -fimf-accuracy-bits (Linux* OS and OS X*) or /Qimf-accuracy-bits (Windows* OS). |
funclist | Is an optional list of one or more math library functions to which the attribute should be applied. If you specify more than one function, they must be separated with commas. |
Default
OFF | The compiler uses default heuristics when calling math library functions. |
Description
This option defines the accuracy (precision) for math library functions.
This option can be used to improve run-time performance if reduced accuracy is sufficient for the application, or it can be used to increase the accuracy of math library functions.
In general, using a lower precision can improve run-time performance and using a higher precision may reduce run-time performance.
If option -fimf-precision (Linux* OS and OS X*) or /Qimf-precision (Windows* OS), or option -fimf-max-error (Linux* OS and OS X*) or /Qimf-max-error (Windows* OS), or option -fimf-accuracy-bits (Linux OS and OS X*) or /Qimf-accuracy-bits (Windows OS) is specified, the default value for max-error is determined by that option. If one or more of these options are specified, the default value for max-error is determined by the last one specified on the command line.
If none of these options are specified, the default value for max-error is determined by the setting specified for option -[no-]fast-transcendentals (Linux OS and OS X) or /Qfast-transcendentals[-] (Windows OS). If that option also has not been specified, the default value is determined by the setting of option -fp-model (Linux OS and OS X) or /fp (Windows OS).
Take Aways
The Intel compilers allow a user to select lower (or higher) precision for mathematical intrinsics. This allows the user to balance the tradeoffs between performance, accuracy, and reproducibility. Options discussed in this section are in the -fimf family of options:
-fimf-precision defines the accuracy (precision) for math library functions
-fimf-accuracy-bits defines the relative error for math library function results
-fimf-domain-exclusion sets up a bit mask to exclude classes of special values (extremes, NaNs, infinities, denormals, zeros) from the domain. Because they no longer have to check for these cases, math functions can run faster.
These options are part of a much larger discussion of numerics, balancing precision against performance and reproducibility. See the documentation of the -fp-model compiler option for broader control of accuracy. For an in-depth discussion of this topic, see the white paper "Consistency of Floating-Point Results using the Intel Compiler".
NEXT STEPS
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon Phi™ Coprocessor. The paths provided in this guide reflect the steps necessary to get the best possible application performance.