1. Download R from CRAN (the Comprehensive R Archive Network). CRAN is a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R. Please use the CRAN mirror nearest to you (http://cran.r-project.org/mirrors.html) to minimize network load.
2. Unpack the source tarball.
3. To build R with the Intel® C++ Compiler toolchain (icc, icpc, xiar, xild) instead of the default gcc toolchain, set up the compiler environment and export the following variables:
$ source /opt/intel/composerxe/bin/compilervars.sh intel64
$ export CC="icc"
$ export CXX="icpc"
$ export AR="xiar"
$ export LD="xild"
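Before configuring the build, it can help to confirm that the four tools exported above are actually reachable on the PATH after sourcing compilervars.sh. The following is only a minimal sketch; the helper name check_tools is hypothetical and is not part of the Intel or R tooling.

```shell
# Hypothetical helper (not part of the Intel or R tooling): report every
# command from the argument list that cannot be found on the PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The four tools exported above as CC, CXX, AR and LD.
check_tools icc icpc xiar xild || echo "Intel toolchain incomplete; re-source compilervars.sh"
```

If any tool is reported as missing, the exported CC, CXX, AR and LD values will not resolve and the build is likely to fail or fall back to other tools.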
4. To get more performance on your own hardware platform than the default optimizations provide, you can also set the following compiler flags:
$ export CFLAGS="-O3 -ipo -openmp -xHost"
$ export CXXFLAGS="-O3 -ipo -openmp -xHost"
5. To use the threaded version of Intel MKL in R on a Linux operating system, define the following variable, then echo it to confirm that it lists the intended MKL libraries:
$ MKL="-lmkl_gf_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread"
$ echo $MKL
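6. Configure the build environment for R, then build R and install it to the target directories. Run configure and make as your normal user: sudo typically resets the environment, so the variables exported in the earlier steps would not reach a configure run started under sudo. Only the install step needs root privileges.
$ ./configure --with-blas="$MKL" --with-lapack
$ make && sudo make install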
7. Change to the R directory, start the R executable, and type some basic R commands to verify that it works correctly.
$ file bin/R
$ ./bin/R
> y <- log(5)
> y
After R evaluates log(5) and assigns the result to the variable 'y', it prints the value to stdout; the output should be '[1] 1.609438'.
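The same check can be scripted instead of typed at the R prompt. The sketch below assumes that bin/Rscript was built alongside bin/R in the build directory; the check_output helper is a hypothetical name used only for illustration.

```shell
# Expected autoprint output for log(5), as stated in the step above.
expected='[1] 1.609438'

# Hypothetical helper: compare a captured output line with the expected one.
check_output() {
  if [ "$1" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH: got '$1'"
    return 1
  fi
}

# Assumption: bin/Rscript exists next to bin/R in the build directory.
if [ -x bin/Rscript ]; then
  check_output "$(bin/Rscript -e 'log(5)')"
else
  echo "bin/Rscript not found; run this from the R build directory"
fi
```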
8. Compile the wrapper code for R with the Intel toolchain. Here I will use the same R wrapper code (pow_wrp.c) as in the article "http://software.intel.com/en-us/articles/extending-r-with-intel-mkl".
$ export LD_LIBRARY_PATH=/opt/intel/composerxe/lib/intel64/:./lib:./:$LD_LIBRARY_PATH
$ icc -O2 -fPIC -I/home/qiaominq/R-3.0.1/include -c pow_wrp.c -o pow_wrp.o
$ icc -shared -liomp5 -L/opt/intel/composerxe/mkl/lib/intel64 -lmkl_rt -o pow_wrp.so pow_wrp.o -L./lib -lR
Here the '-fPIC' flag is needed on the Intel64 platform; without it, linking the object file into the final ELF shared object fails with the following error messages:
"ld: pow_wrp.o: relocation R_X86_64_PC32 against undefined symbol `Rf_coerceVector' can not be used when making a shared object; recompile with -fPIC"
"ld: final link failed: Bad value"
9. Measure the performance gain for the R program and the R runtime from using the complete Intel compiler toolchain (icc, icpc, xiar, xild) and the BLAS and LAPACK functions in the Intel® Math Kernel Library.
First, use our R test script to call the math functions and print the program's execution times in the R environment that we compiled above.
dyn.load("pow_wrp.so")
mkl_pow <- function(n, x, y) .Call("mkl_vdpow", n, x, y)
n <- 1000000
x <- runif(n, min=2, max=10)
y <- runif(n, min=-2, max=-1)
start <- proc.time()
z <- mkl_pow(n, x, y)
end1 <- proc.time() - start
end1
##
n <- 1000000
i <- n
start <- proc.time()
repeat {
  z[i] <- x[i]^y[i]
  i <- i - 1
  if (i == 0) break
}
end2 <- proc.time() - start
end2
Then we get the performance comparison below, which demonstrates a gain of about 25x from using MKL and an additional 5x reduction in CPU execution time from compiling and optimizing R and its runtime with the Intel compiler toolchain (icc, icpc, xiar, xild, libraries, etc.). The benchmark was run on a host with a 4-core Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz and 4GB of memory, running Red Hat Enterprise Linux Server release 6.3.
////// 1. Program running in the default R framework/environment
user system elapsed
3.842 0.020 2.913
////// 2. Program running in the default R framework/environment + extended R with MKL
user system elapsed
0.115 0.019 0.337
////// 3. Program running in the default R framework/environment + extended R with MKL + compiled and optimized R with Intel compiler
user system elapsed
0.019 0.008 0.015
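The ratios behind these timings can be recomputed directly from the three results above. The sketch below uses awk for the floating-point division; the speedup helper is a hypothetical name. The ratio depends on which column is compared (user or elapsed), and the text does not state which column its rounded 25x and 5x figures refer to.

```shell
# Hypothetical helper: print baseline/improved to one decimal place.
# $1 = baseline time in seconds, $2 = improved time in seconds.
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Timings copied from runs 1, 2 and 3 above.
echo "MKL vs default R, user time:             $(speedup 3.842 0.115)x"   # 33.4x
echo "MKL vs default R, elapsed time:          $(speedup 2.913 0.337)x"   # 8.6x
echo "Intel-compiled R vs MKL only, user time: $(speedup 0.115 0.019)x"   # 6.1x
echo "Intel-compiled R vs MKL only, elapsed:   $(speedup 0.337 0.015)x"   # 22.5x
```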
Optimization Notice |
---|
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 |