Using a thread_local non-POD type seems to cause all static but non-thread_local variables (which should only be initialized once) to be reconstructed on every access to the thread_local variable.
Observe:
// icpc version 15.0.0 (gcc version 4.9.0 compatibility)
// icpc -std=c++11 tls.cpp && ./a.out

// -- Expected output --
// MakePointer()
// MyInt constructor <pointer>

// -- Actual output --
// MakePointer()
// MyInt constructor <pointer>
// MakePointer() x 10 (= numReps)

#include <cstdio>

// non-POD type
struct MyInt {
    MyInt() { printf("MyInt constructor %p\n", this); }
    int v = 1;
};

// should be called exactly once
inline void *MakePointer() {
    printf("MakePointer()\n");
    return nullptr;
}

// normal static initialization, should happen ONCE
static void *pointer = MakePointer();

// thread_local non-POD variable, should be constructed as many times as there are threads
static thread_local MyInt v1;

const int numReps = 10;

int main(int argc, char const *argv[]) {
    for (int i = 0; i < numReps; ++i)
        ++v1.v; // each access seems to reinitialize all static variables?!?
    return 0;
}
We have a static variable 'pointer', which should be initialized exactly once (twice if you count zero-initialization of statics and globals), and a non-POD thread_local variable 'v1', which should also be constructed exactly once, since this example uses only a single thread.
Unfortunately, at least with icpc version 15.0.0 (gcc version 4.9.0 compatibility), every access to the thread_local variable 'v1' seems to cause reinitialization of 'pointer'. This is both a performance problem (how bad depends on the cost of initializing all statics in the translation unit) and a correctness problem (in my actual use case, a static variable shared between threads was re-initialized by later threads while earlier ones were still using it).
When I change the thread_local 'v1' to a POD-type (e.g. int), this phenomenon no longer seems to occur.