Dijkstra's algorithm is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge costs, producing a shortest-path tree. The algorithm repeatedly searches for the unfinished vertex with the smallest distance, then accumulates shortest distances from the source vertex through that vertex. This example calculates the shortest path between each pair of vertices in a complete graph of 2000 vertices using Dijkstra's algorithm. It runs in four variants: serially, with Intel® Cilk™ Plus Array Notation (AN) for vectorization, with Intel Cilk Plus cilk_for for parallelization, and with both Array Notation and cilk_for.
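The two repeated steps described above (find the closest unfinished vertex, then accumulate distances through it) can be sketched in standard C++ as follows. This is a minimal illustration, not the sample's source: the names `Graph`, `kInfinite`, and `shortest_from` are hypothetical, and it uses `std::vector` where the sample uses fixed-size arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names for illustration only; the sample itself works on
// fixed-size arrays such as m_graph, vtemp, and vflag.
using Graph = std::vector<std::vector<unsigned>>;
constexpr unsigned kInfinite = std::numeric_limits<unsigned>::max();

// Returns the shortest distance from `source` to every vertex of a dense
// graph stored as an adjacency matrix of non-negative edge costs.
std::vector<unsigned> shortest_from(const Graph& graph, std::size_t source) {
    const std::size_t n = graph.size();
    assert(source < n);
    std::vector<unsigned> dist(graph[source]);  // tentative distances
    std::vector<bool> pending(n, true);         // true: not yet finished
    dist[source] = 0;
    pending[source] = false;

    for (std::size_t step = 1; step < n; ++step) {
        // Step 1: search for the pending vertex with the smallest distance.
        std::size_t minpos = n;
        unsigned minval = kInfinite;
        for (std::size_t k = 0; k < n; ++k) {
            if (pending[k] && dist[k] < minval) {
                minpos = k;
                minval = dist[k];
            }
        }
        if (minpos == n) break;  // remaining vertices are unreachable
        pending[minpos] = false;

        // Step 2: accumulate distances through minpos for pending vertices.
        for (std::size_t k = 0; k < n; ++k) {
            const unsigned edge = graph[minpos][k];
            if (pending[k] && edge != kInfinite && minval + edge < dist[k]) {
                dist[k] = minval + edge;
            }
        }
    }
    return dist;
}
```

Calling `shortest_from` once per source vertex gives the all-pairs result that the example computes. Those calls are independent of each other, which is what makes the outer loop a candidate for parallelization.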
System Requirements:
- Hardware:
  - Any Intel processor with Intel® Advanced Vector Extensions (Intel® AVX) support, such as 2nd Generation Intel Core™ i3, i5, or i7 processors and the Intel Xeon® E3 or E5 processor family, or newer
- Software, for Windows*:
  - Microsoft* Visual Studio 2010* or 2012* standard edition or above
  - Intel® C++ Composer XE 2013 SP1 for Windows* (visit the Release Notes for supported OSes)
- Software, for Linux*:
  - GNU* GCC* 4.5 or newer
  - Intel® C++ Composer XE 2013 SP1 for Linux* (visit the Release Notes for supported OSes)
Code Change Highlights:
- cilk_for
  - linear version:

        int i;
        // Temporary array storing intermediate path length result to each vertex
        unsigned int vtemp[VNUM];
        // Flag array:
        // "1" means the shortest path hasn't been finished
        // "0" means the shortest path has been finished
        unsigned char vflag[VNUM];

        // Main loop: calculate the shortest path to all other vertexes from vertex "i" in each iteration
        for (i = 0; i < VNUM; ++i) {
            ...
        }

  - cilk_for version:

        cilk_for (int i = 0; i < VNUM; ++i) {
            // Declare temporary arrays inside "cilk_for" loop body to make them private
            // Temporary array storing intermediate path length result to each vertex
            unsigned int vtemp[VNUM];
            // Flag array:
            // "1" means the shortest path hasn't been finished
            // "0" means the shortest path has been finished
            unsigned char vflag[VNUM];
            ...
        }
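The reason for moving the temporary arrays into the loop body can be illustrated without Cilk Plus. The sketch below is a simplified stand-in, not the sample's code: a plain serial loop takes the place of `cilk_for`, and the hypothetical function `nearest_neighbors` does a much smaller job than Dijkstra's algorithm. What it shares with the sample is the pattern: scratch storage is declared inside the body, and each iteration writes only to its own output slot, so no iteration depends on state left behind by another.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Graph = std::vector<std::vector<unsigned>>;  // hypothetical alias

// For every vertex i, returns the index of its cheapest neighbor.
std::vector<std::size_t> nearest_neighbors(const Graph& graph) {
    const std::size_t n = graph.size();
    assert(n > 1);
    std::vector<std::size_t> nearest(n);

    // Plain serial loop standing in for cilk_for (assumption: the loop
    // keyword is the only thing that would change in a parallel build).
    for (std::size_t i = 0; i < n; ++i) {
        // Scratch buffer declared inside the body: every iteration gets
        // its own copy, so concurrent iterations could not overwrite it.
        std::vector<unsigned> vtemp(graph[i]);
        vtemp[i] = std::numeric_limits<unsigned>::max();  // skip vertex i itself

        std::size_t minpos = 0;
        for (std::size_t k = 1; k < n; ++k) {
            if (vtemp[k] < vtemp[minpos]) minpos = k;
        }
        nearest[i] = minpos;  // each iteration writes only its own slot
    }
    return nearest;
}
```

If `vtemp` were declared once outside the loop, as in the linear version, parallel iterations would share it and race on its contents.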
- Array Notation
  - Searching for the vertex having the smallest distance

    Scalar version:

        minval = minpos = INFINITE;
        // Scan vtemp to find the index of the vertex having the shortest path and its length
        for (k = 0; k < VNUM; k++)
            if (vtemp[k] < minval) {
                minpos = k;
                minval = vtemp[k];
            }

    Array notation version:

        minpos = __sec_reduce_min_ind(vtemp[:]);
        minval = vtemp[minpos];

  - Accumulating shortest distance from the source vertex

    Scalar version:

        for (k = 0; k < VNUM; k++)
            if (vflag[k] && ((m_graph[minpos][k] + minval) < vtemp[k])) {
                vtemp[k] = (m_graph[minpos][k] + minval);
                m_pvertex[i][k] = minpos;
            }

    Array notation version:

        if (vflag[:] && ((m_graph[minpos][:] + minval) < vtemp[:])) {
            vtemp[:] = (m_graph[minpos][:] + minval);
            m_pvertex[i][:] = minpos;
        }
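For readers without a Cilk Plus compiler, the two array notation statements can be read as the standard C++ below. This is a rough equivalent under stated assumptions, not the sample's code: `min_index` and `relax_through` are hypothetical helper names, `std::vector` replaces the fixed-size arrays, and `std::min_element` returns the first minimum, which may not match the tie-breaking of `__sec_reduce_min_ind`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Counterpart of `minpos = __sec_reduce_min_ind(vtemp[:])`:
// the index of the smallest element of vtemp.
std::size_t min_index(const std::vector<unsigned>& vtemp) {
    assert(!vtemp.empty());
    const auto it = std::min_element(vtemp.begin(), vtemp.end());
    return static_cast<std::size_t>(std::distance(vtemp.begin(), it));
}

// Counterpart of the masked update `if (vflag[:] && ...) { ... }`:
// the condition is evaluated per element, and only the elements where it
// holds are assigned.
void relax_through(unsigned minpos, unsigned minval,
                   const std::vector<unsigned>& row,         // m_graph[minpos][:]
                   const std::vector<unsigned char>& vflag,  // 1: unfinished
                   std::vector<unsigned>& vtemp,             // tentative distances
                   std::vector<unsigned>& pred) {            // m_pvertex[i][:]
    assert(row.size() == vtemp.size());
    assert(vflag.size() == vtemp.size());
    assert(pred.size() == vtemp.size());
    for (std::size_t k = 0; k < vtemp.size(); ++k) {
        const unsigned candidate = row[k] + minval;  // path length via minpos
        if (vflag[k] && candidate < vtemp[k]) {
            vtemp[k] = candidate;
            pred[k] = minpos;
        }
    }
}
```

Written this way, both loops have independent iterations over contiguous data, which is the property the array notation form makes explicit to the vectorizer.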
- cilk_for + Array Notation
  - linear version:

        int i;
        // Temporary array storing intermediate path length result to each vertex
        unsigned int vtemp[VNUM];
        // Flag array:
        // "1" means the shortest path hasn't been finished
        // "0" means the shortest path has been finished
        unsigned char vflag[VNUM];

        // Main loop: calculate the shortest path to all other vertexes from vertex "i" in each iteration
        for (i = 0; i < VNUM; ++i) {
            ...
            minval = minpos = INFINITE;
            // Scan vtemp to find the index of the vertex having the shortest path and its length
            for (k = 0; k < VNUM; k++)
                if (vtemp[k] < minval) {
                    minpos = k;
                    minval = vtemp[k];
                }
            ...
            for (k = 0; k < VNUM; k++)
                if (vflag[k] && ((m_graph[minpos][k] + minval) < vtemp[k])) {
                    vtemp[k] = (m_graph[minpos][k] + minval);
                    m_pvertex[i][k] = minpos;
                }
            ...
        }

  - cilk_for + Array Notation version:

        cilk_for (int i = 0; i < VNUM; ++i) {
            // Declare temporary arrays inside "cilk_for" loop body to make them private
            // Temporary array storing intermediate path length result to each vertex
            unsigned int vtemp[VNUM];
            // Flag array:
            // "1" means the shortest path hasn't been finished
            // "0" means the shortest path has been finished
            unsigned char vflag[VNUM];
            ...
            vtemp[:] = (m_graph[minpos][:] + minval);
            m_pvertex[i][:] = minpos;
            ...
            if (vflag[:] && ((m_graph[minpos][:] + minval) < vtemp[:])) {
                vtemp[:] = (m_graph[minpos][:] + minval);
                m_pvertex[i][:] = minpos;
            }
            ...
        }

This simple change creates code that ran about 5x faster than the serial version on our machine.
Performance Data:
Note: Modified Speedup shows performance speedup with respect to serial implementation.
Modified Speedup | Compiler (Intel® 64) | Compiler options | System specifications |
---|---|---|---|
AN: 1.2x, cilk_for: 2.9x, Both: 3.5x | Intel C++ Compiler 14.0 for Windows | /O2 /QxAVX /Qipo | Microsoft Windows 7* Enterprise Edition; 4th Generation Intel Core™ i5-4670T CPU @ 2.30GHz (4-core); 8GB memory |
AN: 1.8x, cilk_for: 3.0x, Both: 5.0x | Intel C++ Compiler 14.0 for Windows | /O2 /QxCORE-AVX2 /Qipo | Microsoft Windows 7* Enterprise Edition; 4th Generation Intel Core™ i5-4670T CPU @ 2.30GHz (4-core); 8GB memory |
AN: 1.4x, cilk_for: 3.5x, Both: 5.2x | Intel C++ Compiler 14.0 for Linux | -O2 -xAVX -ipo | Ubuntu* 12.04; 3rd Generation Intel Core™ i7-2600K CPU @ 3.40GHz (4-core); > 8GB memory |
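A speedup figure of this kind is the ratio of the serial run time to the modified version's run time. The sketch below shows one way such a number could be measured, averaging over 5 runs as the sample's performance mode is described as doing. The helpers `average_ms` and `speedup` are hypothetical names, and the sample's own timing code may differ.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock time in milliseconds over `runs` executions of `work`.
double average_ms(const std::function<void()>& work, int runs = 5) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        total += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total / runs;
}

// Speedup of a modified version with respect to the serial implementation.
double speedup(double serial_ms, double modified_ms) {
    assert(modified_ms > 0.0);
    return serial_ms / modified_ms;
}
```

For instance, a serial average of 10 ms against a modified average of 2 ms would be reported as a 5x speedup.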
Build Instructions:
- For Microsoft Visual Studio* 2010 and 2012 users:
  - Open the solution .sln file
  - [Optional] To collect performance numbers (will run example 5 times and take average time):
    - Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
  - Choose a configuration (for best performance, choose a release configuration):
    - Intel-debug and Intel-release: uses Intel® C++ compiler
    - VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)
- For Windows* Command Line users:
  - Enable your particular compiler environment
  - for Intel® C++ Compiler:
    - open the appropriate Intel C++ compiler command prompt
    - navigate to project folder
    - to compile: Build.bat [perf_num]
      - perf_num: collect performance numbers (will run example 5 times and take average time)
    - to run: Build.bat run [help|0|1|2|3|4]
  - for Visual C++ Compiler (only linear/scalar will run):
    - open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt
    - navigate to project folder
    - to compile: Build.bat [perf_num]
      - perf_num: collect performance numbers (will run example 5 times and take average time)
    - to run: Build.bat run
- For Linux* or OS X* users:
  - set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
  - navigate to project folder
  - for Intel® C++ compiler:
    - to compile: make [icpc] [perf_num=1]
      - perf_num=1: collect performance numbers (will run example 5 times and take average time)
    - to run: make run [option=help|0|1|2|3|4]
  - for gcc (only linear/scalar will run):
    - to compile: make gcc [perf_num=1]
      - perf_num=1: collect performance numbers (will run example 5 times and take average time)
    - to run: make run