Merge sort is a comparison-based sorting algorithm. This sample uses a top-down implementation, which recursively splits the list into two halves (called sublists) until each sublist contains a single element, and then merges the sorted sublists back together to produce a sorted list. The sample can run serially, or in parallel using the Intel® Cilk™ Plus keywords cilk_spawn and cilk_sync. For more details about the merge sort algorithm and the top-down implementation, see http://en.wikipedia.org/wiki/Merge_sort.
System Requirements:
- Hardware:
  - Any Intel processor with multiple cores
- Software:
  - Microsoft* Visual Studio 2010* or 2012* Standard Edition or above
  - Intel® C++ Composer XE 2013 SP1 for Windows (see the Release Notes for supported OSes)
  - GNU* GCC* 4.5 or newer
  - Intel® C++ Composer XE 2013 SP1 for Linux* (see the Release Notes for supported OSes)
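Code Change Highlights:
- cilk_spawn
  - serial version:

        // Sorts the inclusive range a[first..last]; tmp_a is scratch space for merge().
        void merge_sort(int a[], int tmp_a[], int first, int last)
        {
            if (first < last) {
                int middle = (first + last + 1) / 2;
                merge_sort(a, tmp_a, first, middle - 1);  // sort the left half
                merge_sort(a, tmp_a, middle, last);       // sort the right half
                merge(a, tmp_a, first, middle, last);     // merge the two sorted halves
            }
        }

  - cilk_spawn version:

        #include <cilk/cilk.h>  // defines the cilk_spawn and cilk_sync keywords

        void merge_sort(int a[], int tmp_a[], int first, int last)
        {
            if (first < last) {
                int middle = (first + last + 1) / 2;
                // The spawned call may run in parallel with the rest of this function.
                cilk_spawn merge_sort(a, tmp_a, first, middle - 1);
                merge_sort(a, tmp_a, middle, last);
                cilk_sync;  // wait for the spawned call before merging
                merge(a, tmp_a, first, middle, last);
            }
        }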
Performance Data:
Note: Modified Speedup is the performance speedup relative to the serial implementation.
Modified Speedup | Compiler (Intel® 64) | Compiler options | System specifications |
---|---|---|---|
cilk_spawn: 2.9x | Intel C++ Compiler 14.0 for Windows | /O2 /Qipo /MD | Microsoft Windows 7* Enterprise Edition, 4th Generation Intel® Core™ i5-4670T CPU @ 2.30GHz (4-core), 8GB memory |
cilk_spawn: 3.62x | Intel C++ Compiler 14.0 for Linux | -O2 -ipo | Red Hat Enterprise Linux Server 6.3, 2nd Generation Intel® Core™ i7-2600K CPU @ 3.40GHz (4-core), 6GB memory |
Build Instructions:
- For Microsoft Visual Studio* 2010 and 2012 users:
  - Open the solution .sln file.
  - [Optional] To collect performance numbers (runs the example 5 times and takes the average time):
    - Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
  - Choose a configuration (for best performance, choose a release configuration):
    - Intel-debug and Intel-release: use the Intel® C++ compiler
    - VSC-debug and VSC-release: use the Visual C++ compiler (only linear/scalar will run)
- For Windows* Command Line users:
  - Enable your particular compiler environment.
  - For the Intel® C++ Compiler:
    - Open the appropriate Intel C++ compiler command prompt.
    - Navigate to the project folder.
    - To compile: Build.bat [perf_num]
      - perf_num: collect performance numbers (runs the example 5 times and takes the average time)
    - To run: Build.bat run [help|0|1|2|3|4]
  - For the Visual C++ Compiler (only linear/scalar will run):
    - Open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt.
    - Navigate to the project folder.
    - To compile: Build.bat [perf_num]
      - perf_num: collect performance numbers (runs the example 5 times and takes the average time)
    - To run: Build.bat run
- For Linux* or OS X* users:
  - Set the icc environment:
    source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
  - Navigate to the project folder.
  - For the Intel® C++ compiler:
    - To compile: make [icpc] [perf_num=1]
      - perf_num=1: collect performance numbers (runs the example 5 times and takes the average time)
    - To run: make run [option=help|0|1|2|3|4]
  - For gcc (only linear/scalar will run):
    - To compile: make gcc [perf_num=1]
      - perf_num=1: collect performance numbers (runs the example 5 times and takes the average time)
    - To run: make run