- Using Intel C++ Compiler 14.0 under Visual Studio 2013 SP2
- atan2f()
  - with AVX: 3.915 sec.
  - with SSE2: 0.800 sec.
- atanf() is not affected
  - with AVX: 0.475 sec.
  - with SSE2: 0.626 sec.
- atan2f()
  - atan2() is widely used when calculating with complex numbers (to get the phase).
  - Double precision seems to be affected too, but the numbers are not as clear as with single precision.
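To illustrate the complex-number use case mentioned above, here is a minimal sketch of a phase computation that ends up in atan2f(). The helper name `phases` and the use of std::complex<float> are assumptions made for illustration; they are not taken from the real project.

```cpp
#include <math.h>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original project): returns the phase
// (argument) of each complex sample, in radians.
std::vector<float> phases(const std::vector<std::complex<float>>& samples)
{
    std::vector<float> out(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // The phase of re + i*im is atan2(im, re); std::arg() computes the same value.
        out[i] = atan2f(samples[i].imag(), samples[i].real());
    }
    return out;
}
```

A loop like this over a large buffer is the kind of code where the atan2f() timing difference would show up.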
Simplified example code:
#include <math.h>

int main()
{
    const int iterations = 100000;
    const int size = 2048;
    float* a = new float[size];
    float* b = new float[size];
    for (int i = 0; i < size; ++i) {
        a[i] = 1.1f;
        b[i] = 2.2f;
    }

    // Loop 1: atan2f() (the slow case with AVX)
    for (int j = 0; j < iterations; ++j) {
        for (int i = 0; i < size; ++i) {
            a[i] = atan2f(a[i], b[i]);
        }
    }

    // Loop 2: atanf() (not affected)
    for (int j = 0; j < iterations; ++j) {
        for (int i = 0; i < size; ++i) {
            a[i] = atanf(b[i]);
        }
    }

    delete[] a;
    delete[] b;
    return 0;
}
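The example code does not show how the times were measured. One possible way to time such a loop is sketched below with std::chrono::steady_clock. The helpers `time_seconds` and `time_atan2f` are hypothetical names, and this is not necessarily how the numbers above were obtained.

```cpp
#include <math.h>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `body` once and returns the elapsed wall-clock
// time in seconds.
template <typename F>
double time_seconds(F body)
{
    const auto start = std::chrono::steady_clock::now();
    body();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: times the atan2f() loop from the example for the
// given iteration count and buffer size.
double time_atan2f(int iterations, int size)
{
    std::vector<float> a(size, 1.1f);
    std::vector<float> b(size, 2.2f);
    const double elapsed = time_seconds([&] {
        for (int j = 0; j < iterations; ++j) {
            for (int i = 0; i < size; ++i) {
                a[i] = atan2f(a[i], b[i]);
            }
        }
    });
    // Read one result through a volatile so the optimizer cannot drop the loop.
    volatile float sink = a[0];
    (void)sink;
    return elapsed;
}
```

Calling `time_atan2f(100000, 2048)` once in a build with /arch:AVX and once in a build without it should give comparable figures. The absolute values depend on the machine.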
Compiler options (simplified from a real-world project):
- using SSE:
/GS /Qopenmp /Qrestrict /Qansi-alias /W3 /Qdiag-disable:"4267" /Qdiag-disable:"4251" /Zc:wchar_t /Zi /O2 /Ob2 /Fd"Release\64\vc120.pdb" /fp:fast /Qstd=c++11 /Qipo /GF /GT /Zc:forScope /GR /Oi /MD /Fa"Release\64\" /EHsc /nologo /Fo"Release\64\" /Ot /Fp"Release\64\TestPlugin.pch"
- using AVX:
/Qopenmp /Qrestrict /Qansi-alias /W3 /Qdiag-disable:"4267" /Qdiag-disable:"4251" /Zc:wchar_t /Zi /O2 /Ob2 /Fd"Release\64\vc120.pdb" /fp:fast /Qstd=c++11 /Qipo /GF /GT /Zc:forScope /GR /arch:AVX /Oi /MD /Fa"Release\64\" /EHsc /nologo /Fo"Release\64\" /Ot /Fp"Release\64\TestPlugin.pch"