Channel: Intel® C++ Compiler

ICC vs. GCC: Strange scaling behavior of an OpenMP parallelized benchmark


I am currently preparing two benchmarks for a new 240-core E7-x890v2 server. On a 60-core test machine (four sockets with E7-4890v2, Hyper-Threading and Turbo Boost enabled; RHEL 7, Transparent Huge Pages enabled) I get the following timings for "Benchmark A" with ICC v14.0.2 and GCC v4.8.2:

 

==  60 cores  ===============================================================

----  GCC executable  -------------------------------------------------------

        Finished in 187.65 second(s) CPU time, 3.144 second(s) WALL time.

        Finished in 186.34 second(s) CPU time, 3.122 second(s) WALL time.

        Finished in 205.52 second(s) CPU time, 3.461 second(s) WALL time.

----  ICC executable  -------------------------------------------------------

        Finished in 819.70 second(s) CPU time, 13.649 second(s) WALL time.

        Finished in 779.00 second(s) CPU time, 12.974 second(s) WALL time.

        Finished in 822.83 second(s) CPU time, 13.703 second(s) WALL time.

==  32 cores  ===============================================================

----  GCC executable  -------------------------------------------------------

        Finished in 169.27 second(s) CPU time, 5.295 second(s) WALL time.

        Finished in 169.35 second(s) CPU time, 5.295 second(s) WALL time.

        Finished in 169.25 second(s) CPU time, 5.292 second(s) WALL time.

----  ICC executable  -------------------------------------------------------

        Finished in 369.26 second(s) CPU time, 11.529 second(s) WALL time.

        Finished in 410.28 second(s) CPU time, 12.809 second(s) WALL time.

        Finished in 343.93 second(s) CPU time, 10.739 second(s) WALL time.

==  16 cores  ===============================================================

----  GCC executable  -------------------------------------------------------

        Finished in 172.54 second(s) CPU time, 10.776 second(s) WALL time.

        Finished in 168.89 second(s) CPU time, 10.546 second(s) WALL time.

        Finished in 196.67 second(s) CPU time, 12.284 second(s) WALL time.

----  ICC executable  -------------------------------------------------------

        Finished in 216.70 second(s) CPU time, 13.529 second(s) WALL time.

        Finished in 264.84 second(s) CPU time, 16.540 second(s) WALL time.

        Finished in 214.90 second(s) CPU time, 13.419 second(s) WALL time.

==   8 cores  ===============================================================

----  GCC executable  -------------------------------------------------------

        Finished in 183.34 second(s) CPU time, 22.893 second(s) WALL time.

        Finished in 183.68 second(s) CPU time, 22.937 second(s) WALL time.

        Finished in 183.40 second(s) CPU time, 22.902 second(s) WALL time.

----  ICC executable  -------------------------------------------------------

        Finished in 177.59 second(s) CPU time, 22.176 second(s) WALL time.

        Finished in 179.41 second(s) CPU time, 22.402 second(s) WALL time.

        Finished in 179.39 second(s) CPU time, 22.401 second(s) WALL time.

==   4 cores  ===============================================================

----  GCC executable  -------------------------------------------------------

        Finished in 159.01 second(s) CPU time, 39.709 second(s) WALL time.

        Finished in 159.29 second(s) CPU time, 39.780 second(s) WALL time.

        Finished in 160.02 second(s) CPU time, 39.962 second(s) WALL time.

----  ICC executable  -------------------------------------------------------

        Finished in 171.26 second(s) CPU time, 42.769 second(s) WALL time.

        Finished in 169.51 second(s) CPU time, 42.333 second(s) WALL time.

        Finished in 170.76 second(s) CPU time, 42.642 second(s) WALL time.

==   2 cores  ===============================================================

----  GCC executable  -------------------------------------------------------

        Finished in 158.64 second(s) CPU time, 79.233 second(s) WALL time.

        Finished in 160.43 second(s) CPU time, 80.127 second(s) WALL time.

        Finished in 158.63 second(s) CPU time, 79.228 second(s) WALL time.

----  ICC executable  -------------------------------------------------------

        Finished in 168.97 second(s) CPU time, 84.384 second(s) WALL time.

        Finished in 168.77 second(s) CPU time, 84.287 second(s) WALL time.

        Finished in 168.78 second(s) CPU time, 84.291 second(s) WALL time.

==   1 core   ===============================================================

----  GCC executable  -------------------------------------------------------

        Finished in 158.25 second(s) CPU time, 158.079 second(s) WALL time.

        Finished in 160.93 second(s) CPU time, 160.756 second(s) WALL time.

        Finished in 158.13 second(s) CPU time, 157.961 second(s) WALL time.

----  ICC executable  -------------------------------------------------------

        Finished in 167.94 second(s) CPU time, 167.762 second(s) WALL time.

        Finished in 168.40 second(s) CPU time, 168.213 second(s) WALL time.

        Finished in 169.90 second(s) CPU time, 169.717 second(s) WALL time.

 

 

Both executables were compiled with optimization level '-O2'. On an Itanium 9560 server of the same size (HP-UX 11.31, aCC compiler), this benchmark application behaves more or less like the GCC executable on the Linux server.
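
To make the scaling easier to compare, here is a minimal sketch that turns the timings above into speedup, parallel efficiency, and CPU-time inflation relative to the 1-core run. Taking the best of the three runs per core count is my own assumption, and the names (`GCC_WALL`, `scaling`, `cpu_inflation`, and so on) are purely illustrative; none of this is part of the benchmark itself.

```python
# Best-of-three timings in seconds, keyed by core count, copied from the
# listing above. Picking the minimum of each triple is an assumption.
GCC_WALL = {1: 157.961, 2: 79.228, 4: 39.709, 8: 22.893,
            16: 10.546, 32: 5.292, 60: 3.122}
ICC_WALL = {1: 167.762, 2: 84.287, 4: 42.333, 8: 22.176,
            16: 13.419, 32: 10.739, 60: 12.974}
GCC_CPU = {1: 158.13, 2: 158.63, 4: 159.01, 8: 183.34,
           16: 168.89, 32: 169.25, 60: 186.34}
ICC_CPU = {1: 167.94, 2: 168.77, 4: 169.51, 8: 177.59,
           16: 214.90, 32: 343.93, 60: 779.00}


def scaling(wall):
    """Map core count -> (speedup, efficiency) relative to the 1-core run."""
    base = wall[1]
    return {n: (base / t, base / (t * n)) for n, t in sorted(wall.items())}


def cpu_inflation(cpu):
    """Map core count -> total CPU time divided by the 1-core CPU time.

    A value near 1.0 means the total work stayed constant; larger values
    mean extra CPU time was spent somewhere as cores were added.
    """
    base = cpu[1]
    return {n: t / base for n, t in sorted(cpu.items())}


def report(name, wall, cpu):
    """Print one row per core count for a single compiler."""
    print(f"{name}: cores  speedup  efficiency  cpu_inflation")
    infl = cpu_inflation(cpu)
    for n, (s, e) in scaling(wall).items():
        print(f"{n:10d}  {s:7.2f}  {e:10.2f}  {infl[n]:13.2f}")


report("GCC", GCC_WALL, GCC_CPU)
report("ICC", ICC_WALL, ICC_CPU)
```

The point of the CPU-time column is that the GCC executable's total CPU time stays roughly flat as cores are added, while the ICC executable's total CPU time grows several-fold at 32 and 60 cores, so the additional cores appear to be burning time rather than sharing a fixed amount of work.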

 

I would be very grateful for any thoughts, comments, explanations, suggestions… Please, do not hesitate to ask for more information.

 

Thank you for reading.

