Hi,
I would like to understand why the Intel compiler fails to vectorize some basic loops when using std::vector. The code below tests 3 different "arrays": a C-style array, a homemade vector class, and std::vector. I am timing a loop that just does v[i] = i for every element of the array. Each run prints the elapsed time in microseconds, followed by the value of the last element. The results are the following:
fayard@speed:Desktop$ icpc -std=c++11 -Ofast vector-simd.cpp -o main
fayard@speed:Desktop$ ./main
std::vector:    362787
999999
HomeMade:       164704
999999
C-array:        166045
999999
fayard@speed:Desktop$ g++-4.9 -std=c++11 -Ofast vector-simd.cpp -o main
fayard@speed:Desktop$ ./main
std::vector:    186377
999999
HomeMade:       179809
999999
C-array:        176598
999999
A quick look at the assembly code (or at the output of -vec-report2) shows that the Intel compiler does not vectorize the loop with std::vector. As you can see, gcc 4.9 has no problem doing it. I would like to understand:
- Why is the loop vectorized with my own vector class but not with std::vector? I would like to find a change to my class that prevents vectorization (in order to understand what happens with std::vector), but I can't. I've tried moving the class to another file, and even putting the getter in the .cpp file instead of the .h file, but compiling with -ipo still triggers the vectorization. Can anyone give me a hint?
- Is the culprit the Intel compiler or the standard library? As far as I understand, icpc uses the libc++ library from clang (I am on OSX), and g++-4.9 uses the libstdc++ library. I have tried to make icpc use libstdc++ and it still does not vectorize, but in that case it picks up the libstdc++ that is installed on OSX, not the one that I've compiled with gcc 4.9. Is there a way to make icpc use the standard library that I've compiled?
- Can anyone find a piece of code where my homemade class does not vectorize and the C-array does?
Thanks for your help,
François
#include <iostream>
#include <vector>
#include <chrono>

class MyVector {
private:
    int n_elements;
    int* data;
public:
    MyVector(int in_n_elements) {
        n_elements = in_n_elements;
        data = new int[n_elements];
    }
    int& operator[](size_t i){
        return data[i];
    }
};

int main (int argc, char const *argv[]) {
    const int n_elements {1000000};
    const int n_iterations {1000};

    {
        std::vector<int> v(n_elements);
        std::chrono::steady_clock::time_point timeStart, timeEnd;
        timeStart = std::chrono::steady_clock::now();
        for(size_t i = 0; i < n_iterations; ++i) {
            for(size_t j = 0; j < n_elements; ++j) {
                v[j] = j;
            }
        }
        timeEnd = std::chrono::steady_clock::now();
        std::cout << "std::vector:\t"<< std::chrono::duration_cast<std::chrono::microseconds>(timeEnd - timeStart).count() << std::endl;
        std::cout << v[n_elements-1] << std::endl;
    }

    {
        MyVector v(n_elements);
        std::chrono::steady_clock::time_point timeStart, timeEnd;
        timeStart = std::chrono::steady_clock::now();
        for(size_t i = 0; i < n_iterations; ++i) {
            for(size_t j = 0; j < n_elements; ++j) {
                v[j] = j;
            }
        }
        timeEnd = std::chrono::steady_clock::now();
        std::cout << "HomeMade:\t"<< std::chrono::duration_cast<std::chrono::microseconds>(timeEnd - timeStart).count() << std::endl;
        std::cout << v[n_elements-1] << std::endl;
    }

    {
        int v[n_elements];
        std::chrono::steady_clock::time_point timeStart, timeEnd;
        timeStart = std::chrono::steady_clock::now();
        for(size_t i = 0; i < n_iterations; ++i) {
            for(size_t j = 0; j < n_elements; ++j) {
                v[j] = j;
            }
        }
        timeEnd = std::chrono::steady_clock::now();
        std::cout << "C-array:\t"<< std::chrono::duration_cast<std::chrono::microseconds>(timeEnd - timeStart).count() << std::endl;
        std::cout << v[n_elements-1] << std::endl;
    }

    return 0;
}