Hi,
I would like to build my own version of a std::vector that keeps its memory aligned. The following code works as expected:
```cpp
template <typename T>
class Vector {
 private:
  T* begin_;
  int size_;

 public:
  Vector(T* p, int n) : begin_{p}, size_{n} {}
  int size() const { return size_; }
  // __assume_aligned is an Intel compiler extension: it promises that begin_
  // is 32-byte aligned at this point.
  const T& operator[](int k) const {
    __assume_aligned(begin_, 32);
    return begin_[k];
  }
  T& operator[](int k) {
    __assume_aligned(begin_, 32);
    return begin_[k];
  }
};

double f(const Vector<double>& v) {
  double sum = 0.0;
  for (int i = 0; i < v.size(); ++i) {
    sum += v[i];
  }
  return sum;
}
```
When compiled with
icpc -c -std=c++11 -O3 -xHost -ansi-alias -opt-report=5 f.cpp -o f.o
on OS X with icpc 15.0.2, the optimization report shows no sign of loop peeling, which indicates that __assume_aligned works as expected.
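The Vector above does not own its memory: it trusts the caller to pass a pointer that really is 32-byte aligned. A minimal sketch of where such a pointer could come from, assuming a C++17 compiler, uses the aligned overloads of operator new and operator delete. The class AlignedBuffer is hypothetical and not part of the original code; copying is disabled to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical owning buffer: n value-initialized elements of T starting on a
// 32-byte boundary. Requires C++17 for std::align_val_t.
template <typename T>
class AlignedBuffer {
 public:
  static constexpr std::size_t kAlignment = 32;

  explicit AlignedBuffer(std::size_t n)
      : data_{static_cast<T*>(::operator new(
            n * sizeof(T), static_cast<std::align_val_t>(kAlignment)))},
        size_{n} {
    // Debug-only check that the aligned operator new honored the request.
    assert(reinterpret_cast<std::uintptr_t>(data_) % kAlignment == 0);
    for (std::size_t i = 0; i < n; ++i) {
      ::new (static_cast<void*>(data_ + i)) T{};  // construct in place
    }
  }

  ~AlignedBuffer() {
    for (std::size_t i = size_; i > 0; --i) data_[i - 1].~T();  // reverse order
    ::operator delete(data_, static_cast<std::align_val_t>(kAlignment));
  }

  AlignedBuffer(const AlignedBuffer&) = delete;
  AlignedBuffer& operator=(const AlignedBuffer&) = delete;

  T* data() { return data_; }
  const T* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  T* data_;
  std::size_t size_;
};
```

A Vector<double> could then be built on top of it as Vector<double> v(buf.data(), static_cast<int>(buf.size()));, which keeps the __assume_aligned promise true.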
Unfortunately, with the Vector design above, Vector<int> is not as efficient as Vector<double> in loops that store to the elements: since size_ and the elements are both of type int, a store to v[i] may alias size_, which prevents the compiler from hoisting v.size() out of the loop. The trip count is therefore unknown on entry to the loop, and the loop is not vectorized. The classic solution to this problem is to store a pointer T* end_ instead, so that the size of the vector is end_ - begin_. Unfortunately, the following code:
```cpp
template <typename T>
class Vector {
 private:
  T* begin_;
  T* end_;  // one past the last element; size() is end_ - begin_

 public:
  Vector(T* p, int n) : begin_{p}, end_{p + n} {}
  int size() const { return static_cast<int>(end_ - begin_); }
  const T& operator[](int k) const {
    __assume_aligned(begin_, 32);
    return begin_[k];
  }
  T& operator[](int k) {
    __assume_aligned(begin_, 32);
    return begin_[k];
  }
};

double f(const Vector<double>& v) {
  double sum = 0.0;
  for (int i = 0; i < v.size(); ++i) {
    sum += v[i];
  }
  return sum;
}
```
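is no longer vectorized. Removing the __assume_aligned calls fixes the problem, though. It seems that the compiler acts as if __assume_aligned(begin_, 32) might modify begin_, so it can no longer treat end_ - begin_ as loop-invariant. So far, I have found the following workaround, which keeps a separate copy of the begin pointer used only to compute the size: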
```cpp
template <typename T>
class Vector {
 private:
  T* begin_;       // used for element access, together with __assume_aligned
  T* begin_copy_;  // never passed to __assume_aligned; used only by size()
  T* end_;         // one past the last element

 public:
  Vector(T* p, int n) : begin_{p}, begin_copy_{p}, end_{p + n} {}
  int size() const { return static_cast<int>(end_ - begin_copy_); }
  const T& operator[](int k) const {
    __assume_aligned(begin_, 32);
    return begin_[k];
  }
  T& operator[](int k) {
    __assume_aligned(begin_, 32);
    return begin_[k];
  }
};

double f(const Vector<double>& v) {
  double sum = 0.0;
  for (int i = 0; i < v.size(); ++i) {
    sum += v[i];
  }
  return sum;
}
```
but it would be nice if such hacks could be avoided.
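For comparison, GCC and Clang offer __builtin_assume_aligned, which attaches the alignment promise to a returned pointer value instead of annotating the variable, so begin_ is only ever read. The sketch below rewrites the begin_/end_ design that way and also reads the pointer and the trip count into locals before the loop; a local size whose address is never taken also sidesteps the int-store aliasing issue for Vector<int>. It assumes GCC or Clang; whether icpc 15.0.2 accepts this builtin, and what it generates, is not verified here. The names AlignedView and data() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of the begin_/end_ design. The alignment promise is
// carried by the pointer returned from __builtin_assume_aligned (GCC/Clang),
// so the members begin_ and end_ are never modified.
template <typename T>
class AlignedView {
 public:
  AlignedView(T* p, int n) : begin_{p}, end_{p + n} {
    // The promise must be true; check it in debug builds.
    assert(reinterpret_cast<std::uintptr_t>(p) % 32 == 0);
  }
  int size() const { return static_cast<int>(end_ - begin_); }
  const T* data() const {
    return static_cast<const T*>(__builtin_assume_aligned(begin_, 32));
  }

 private:
  T* begin_;
  T* end_;
};

double f(const AlignedView<double>& v) {
  const double* p = v.data();  // aligned pointer, read once
  const int n = v.size();      // trip count, read once before the loop
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += p[i];
  return sum;
}
```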
Also, should I expect __assume_aligned to keep working in future compiler versions when it is placed inside an accessor such as operator[]?
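On the portability side, C++20 standardized a pointer-returning form of this hint as std::assume_aligned<N>(p) in <memory>. A minimal sketch of a wrapper that uses it when the standard library advertises it through the __cpp_lib_assume_aligned feature-test macro, and otherwise falls back to the GCC/Clang builtin, is shown below; the helper assume_aligned_32 and the function sum are hypothetical, and this says nothing about how icpc 15 treats __assume_aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>  // std::assume_aligned and its feature-test macro (C++20)

// Hypothetical helper: promise the compiler that p is 32-byte aligned.
// The caller must guarantee this; a false promise is undefined behavior.
template <typename T>
T* assume_aligned_32(T* p) {
  assert(reinterpret_cast<std::uintptr_t>(p) % 32 == 0);  // debug-only check
#if defined(__cpp_lib_assume_aligned)
  return std::assume_aligned<32>(p);
#else
  return static_cast<T*>(__builtin_assume_aligned(p, 32));
#endif
}

double sum(const double* p, int n) {
  const double* q = assume_aligned_32(p);  // hint applied once, outside the loop
  double s = 0.0;
  for (int i = 0; i < n; ++i) s += q[i];
  return s;
}
```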