I have been experimenting with the sample code available from https://software.intel.com/en-us/articles/benefits-of-intel-avx-for-smal.... I got Determinant4x4Matrices.cpp to compile by replacing <gmmintrin.h> with <immintrin.h>. The program runs fine when compiled with g++, but segfaults and dumps core when compiled with icpc. I see the same behaviour under Fedora 21 and SLES 11 SP3.
uname -a
Linux XXXXXX 4.1.13-100.fc21.x86_64 #1 SMP Tue Nov 10 13:13:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
g++ --version
g++ (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
g++ -mavx Determinant4x4Matrices.cpp
./a.out
Welcome to Determinat4x4Matrices
256-bit results matched for evaluation of a determinant
128-bit results matched for evaluation of a determinant
icpc -V
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 16.0.1.150 Build 20151021
icpc -xAVX Determinant4x4Matrices.cpp
./a.out
Welcome to Determinat4x4Matrices
256-bit results matched for evaluation of a determinant
128-bit results matched for evaluation of a determinant
Segmentation fault (core dumped)
GDB reports:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000003a40483a18 in _int_free (have_lock=0, p=<optimized out>, av=0x3a407b7cc0 <main_arena>) at malloc.c:3990
3990 unlink(av, nextchunk, bck, fwd);
(gdb) where
#0 0x0000003a40483a18 in _int_free (have_lock=0, p=<optimized out>, av=0x3a407b7cc0 <main_arena>) at malloc.c:3990
#1 __GI___libc_free (mem=<optimized out>) at malloc.c:2951
#2 0x00000000004013ea in DeAllocateBuffers () at Determinant4x4Matrices.cpp:116
#3 0x00000000004048ff in main (argc=1, argp=0x7ffd0b400868) at Determinant4x4Matrices.cpp:627