The loop is simple:
void loop(int n, double* a, double const* b)
{
    #pragma ivdep
    for (int i = 0; i < n; ++i, ++a, ++b)
        *a *= *b;
}
I am using the Intel C++ compiler and currently rely on #pragma ivdep for optimization. Is there any way to make it perform better, such as combining multicore and vectorization, or using other techniques?