When I compile the following small program with -O2 or -O3 together with -parallel and run it with OMP_NUM_THREADS > 1, it crashes with signal 11 (SIGSEGV), either immediately (-O2) or when trying to print res (-O3). Running the parallel binary with OMP_NUM_THREADS=1 works. When the last printf statement is commented out, as in the listing below, the program runs with more than one thread if compiled with -O3 -parallel.
I first observed this with Compiler XE for applications running on Intel(R) 64, Version 13.1.2.183 Build 20130514. Older versions, starting from 10.1, show the same or similar problems. C++ Compiler for Intel(R) EM64T-based applications, Version 9.1 Build 20060925, is the last version that produces a working parallel binary. All compilers were run under Linux SLES11SP3.
#include <stdio.h>
#define ARDIM 2000
int main (int argc, char **argv) {
    double a[ARDIM][ARDIM], b[ARDIM][ARDIM], c[ARDIM][ARDIM];
    double di=0.0, dj=0.0, res=0.0;
    int i, j, k;
    for (i=0; i<ARDIM; i++) {
        di += 1.0e0;
        for (j=0; j<ARDIM; j++) {
            dj += 1.0e0;
            a[i][j] = di/dj;
            b[i][j] = dj/di;
            c[i][j] = 0.0;
        }
        dj = 0.0;
    }
    for (i=0; i<ARDIM; i++) {
        for (k=0; k<ARDIM; k++) {
            for (j=0; j<ARDIM; j++) {
                c[i][j] += a[i][k]*b[k][j];
            }
        }
    }
    for (i=0; i<ARDIM; i++) {
        for (j=0; j<ARDIM; j++) {
            res += c[i][j];
        }
    }
    printf("\n c[1][2] = %f\n", c[1][2]);
    /* printf("\n res = %f\n",res); */
}