Hi,
I would like to understand why the Intel compiler fails to vectorize some basic loops when they use std::vector. The following code tests 3 different "arrays": a C-style array, a homebrewed vector class, and the std::vector class. I am timing a loop that just does v[i] = i for all the elements of the array. The results are the following (elapsed time in microseconds, followed by the value of the last element):
fayard@speed:Desktop$ icpc -std=c++11 -Ofast vector-simd.cpp -o main
fayard@speed:Desktop$ ./main
std::vector:    362787
999999
HomeMade:       164704
999999
C-array:        166045
999999
fayard@speed:Desktop$ g++-4.9 -std=c++11 -Ofast vector-simd.cpp -o main
fayard@speed:Desktop$ ./main
std::vector:    186377
999999
HomeMade:       179809
999999
C-array:        176598
999999
A quick look at the assembly code (or at the output of -vec-report2) shows that the Intel compiler does not vectorize the loop with std::vector. As you can see, gcc 4.9 has no problem doing it. I would like to understand:
- Why is my own vector class fine for vectorization while std::vector is not? I would like to find some change to my class that prevents vectorization (in order to understand what happens with std::vector), but I can't. I've tried moving it to another file, and even putting the getter in the .cpp file instead of the .h file, but compiling with -ipo still triggers the vectorization. Can anyone give me a hint?
- Is the culprit the Intel compiler or the standard library? As far as I understand, icpc uses the libc++ library from clang (I am on OSX), and g++-4.9 uses the libstdc++ library. I have tried to make icpc use libstdc++ and it still does not vectorize, but it picks up the library that is installed on OSX, not the one that I've compiled with gcc 4.9. Is there a way to make icpc use the standard library that I've compiled?
- Can anyone find code where my homemade class does not vectorize and the C-array does?
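One experiment that might help narrow this down is sketched below. It assumes, without any confirmation from the measurements above, that the difference comes from how the compiler treats the pointer stored inside std::vector. The function names fill_through_subscript and fill_through_pointer are hypothetical and are not part of the test program; the idea is only to compare the vectorization report for the two loops.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: same access pattern as the timed loop in the test
// program, going through std::vector::operator[] on every iteration.
void fill_through_subscript(std::vector<int>& v) {
    for (std::size_t j = 0; j < v.size(); ++j) {
        v[j] = static_cast<int>(j);
    }
}

// Hypothetical helper: the data pointer and the size are read once into
// locals, so the loop body only touches a plain int* and a loop-invariant
// bound, much like the C-array case.
void fill_through_pointer(std::vector<int>& v) {
    int* p = v.data();
    const std::size_t n = v.size();
    for (std::size_t j = 0; j < n; ++j) {
        p[j] = static_cast<int>(j);
    }
}
```

If only the second loop is reported as vectorized, that would suggest the issue lies in how the compiler analyzes accesses through the vector object rather than in the loop body itself. If neither is, the cause is probably elsewhere.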
Thanks for your help,
François
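#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Minimal homemade vector: owns a heap-allocated int array.
class MyVector {
private:
    int n_elements;
    int* data;
public:
    MyVector(int in_n_elements) {
        n_elements = in_n_elements;
        data = new int[n_elements];
    }
    // Release the array; copying is disabled to avoid a double delete.
    ~MyVector() { delete[] data; }
    MyVector(const MyVector&) = delete;
    MyVector& operator=(const MyVector&) = delete;
    int& operator[](size_t i){
        return data[i];
    }
};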
int main (int argc, char const *argv[])
{
    const int n_elements {1000000};
    const int n_iterations {1000};
    {
        std::vector<int> v(n_elements);
        std::chrono::steady_clock::time_point timeStart, timeEnd;
        timeStart = std::chrono::steady_clock::now();
        for(size_t i = 0; i < n_iterations; ++i)
        {
            for(size_t j = 0; j < n_elements; ++j)
            {
                v[j] = j;
            }
        }
        timeEnd = std::chrono::steady_clock::now();
        std::cout << "std::vector:\t" <<
            std::chrono::duration_cast<std::chrono::microseconds>(timeEnd -
            timeStart).count() << std::endl;
        std::cout << v[n_elements-1] << std::endl;
    }
    {
        MyVector v(n_elements);
        std::chrono::steady_clock::time_point timeStart, timeEnd;
        timeStart = std::chrono::steady_clock::now();
        for(size_t i = 0; i < n_iterations; ++i)
        {
            for(size_t j = 0; j < n_elements; ++j)
            {
                v[j] = j;
            }
        }
        timeEnd = std::chrono::steady_clock::now();
        std::cout << "HomeMade:\t" <<
            std::chrono::duration_cast<std::chrono::microseconds>(timeEnd -
            timeStart).count() << std::endl;
        std::cout << v[n_elements-1] << std::endl;
    }
    {
        int v[n_elements];
        std::chrono::steady_clock::time_point timeStart, timeEnd;
        timeStart = std::chrono::steady_clock::now();
        for(size_t i = 0; i < n_iterations; ++i)
        {
            for(size_t j = 0; j < n_elements; ++j)
            {
                v[j] = j;
            }
        }
        timeEnd = std::chrono::steady_clock::now();
        std::cout << "C-array:\t" <<
            std::chrono::duration_cast<std::chrono::microseconds>(timeEnd -
            timeStart).count() << std::endl;
        std::cout << v[n_elements-1] << std::endl;
    }
    return 0;
}