Channel: Intel® C++ Compiler

Graph Algorithms: Shortest Path


Dijkstra's algorithm is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge costs, producing a shortest-path tree. The algorithm repeatedly searches for the unfinished vertex with the smallest tentative distance, then uses that vertex to update (accumulate) the shortest distances from the source vertex. This example calculates the shortest path between every pair of vertices in a complete graph with 2000 vertices by running Dijkstra's algorithm once from each vertex. It runs serially, with Intel® Cilk™ Plus Array Notation (AN) for vectorization, with Intel Cilk Plus cilk_for for parallelization, and with both vectorization and cilk_for.
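The two steps described above can be sketched in standard C++ as follows. This is a minimal serial sketch under stated assumptions, not the sample's code. It assumes a dense adjacency matrix stored as std::vector and a hypothetical sentinel kInf for missing edges. The sample's own INFINITE, VNUM and m_graph are not reproduced. all_pairs repeats the single-source search from every vertex, which costs O(V^3) for V vertices.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sentinel for "no edge / no path yet"; edge weights are assumed to be
// smaller than kInf so that best + weight cannot overflow.
constexpr unsigned kInf = std::numeric_limits<unsigned>::max() / 2;

using Matrix = std::vector<std::vector<unsigned>>;

// Single-source Dijkstra on a dense adjacency matrix (g[u][v] = weight of edge u->v).
std::vector<unsigned> dijkstra(const Matrix& g, std::size_t src) {
    const std::size_t n = g.size();
    assert(src < n);
    std::vector<unsigned> dist(n, kInf);
    std::vector<unsigned char> open(n, 1);  // 1 = not finished, 0 = finished
    dist[src] = 0;
    for (std::size_t iter = 0; iter < n; ++iter) {
        // Step 1: search for the unfinished vertex with the smallest tentative distance.
        std::size_t u = n;
        unsigned best = kInf;
        for (std::size_t k = 0; k < n; ++k)
            if (open[k] && dist[k] < best) {
                best = dist[k];
                u = k;
            }
        if (u == n) break;  // every remaining vertex is unreachable
        open[u] = 0;
        // Step 2: accumulate distances through u for the unfinished vertices.
        for (std::size_t k = 0; k < n; ++k)
            if (open[k] && g[u][k] != kInf && best + g[u][k] < dist[k])
                dist[k] = best + g[u][k];
    }
    return dist;
}

// All-pairs shortest paths: one independent Dijkstra run per source vertex.
Matrix all_pairs(const Matrix& g) {
    Matrix result;
    result.reserve(g.size());
    for (std::size_t s = 0; s < g.size(); ++s) result.push_back(dijkstra(g, s));
    return result;
}
```

Take the three-vertex graph {{0, 4, 1}, {4, 0, 2}, {1, 2, 0}} as an example. all_pairs reports a distance of 3 from vertex 0 to vertex 1, via vertex 2, rather than the direct edge of weight 4.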

 
  • System Requirements
  • Hardware:
    • Any Intel processor with Intel® Advanced Vector Extensions (Intel® AVX) support, such as 2nd Generation Intel Core™ i3, i5, or i7 processors and the Intel® Xeon® E3 or E5 processor family, or newer
  • Software:
    For Microsoft* Windows*:
    • Microsoft* Visual Studio 2010* or 2012* Standard Edition or above
    • Intel® C++ Composer XE 2013 SP1 for Windows* (see the Release Notes for supported OSes)
    For Linux*:
    • GNU* GCC* 4.5 or newer
    • Intel® C++ Composer XE 2013 SP1 for Linux* (see the Release Notes for supported OSes)

Code Change Highlights:

  • cilk_for
    Linear version:
        int i;
        // Temporary array storing intermediate path-length results to each vertex
        unsigned int vtemp[VNUM];
        // Flag array:
        // "1" means the shortest path hasn't been finished
        // "0" means the shortest path has been finished
        unsigned char vflag[VNUM];

        // Main loop: each iteration calculates the shortest paths from vertex "i" to all other vertices
        for (i = 0; i < VNUM; ++i) {
            ...
        }
    cilk_for version:
        cilk_for (int i = 0; i < VNUM; ++i) {
            // Declare temporary arrays inside the "cilk_for" loop body to make them private
            // Temporary array storing intermediate path-length results to each vertex
            unsigned int vtemp[VNUM];
            // Flag array:
            // "1" means the shortest path hasn't been finished
            // "0" means the shortest path has been finished
            unsigned char vflag[VNUM];
            ...
        }
    Because cilk_for iterations may run in parallel, vtemp and vflag must not be shared as they are in the linear version. Declaring them inside the loop body gives each iteration its own copy and avoids a data race.
  • Array Notation
    • Searching for the vertex with the smallest distance
    Scalar version:
        minval = minpos = INFINITE;
        // Scan vtemp to find the index and length of the shortest tentative path
        for (k = 0; k < VNUM; k++)
            if (vtemp[k] < minval) {
                minpos = k;
                minval = vtemp[k];
            }
    Array notation version:
        minpos = __sec_reduce_min_ind(vtemp[:]);
        minval = vtemp[minpos];
    __sec_reduce_min_ind returns the index of the smallest element in the array section.
    • Accumulating the shortest distance from the source vertex
    Scalar version:
        for (k = 0; k < VNUM; k++)
            if (vflag[k] && ((m_graph[minpos][k] + minval) < vtemp[k])) {
                vtemp[k] = (m_graph[minpos][k] + minval);
                m_pvertex[i][k] = minpos;
            }
    Array notation version:
        if (vflag[:] && ((m_graph[minpos][:] + minval) < vtemp[:])) {
            vtemp[:] = (m_graph[minpos][:] + minval);
            m_pvertex[i][:] = minpos;
        }
    An if statement over array sections is evaluated element by element, so only elements whose condition is true are assigned. The scalar minpos is broadcast to every selected element of m_pvertex[i].
  • cilk_for + Array Notation
    Linear version:
        int i;
        // Temporary array storing intermediate path-length results to each vertex
        unsigned int vtemp[VNUM];
        // Flag array:
        // "1" means the shortest path hasn't been finished
        // "0" means the shortest path has been finished
        unsigned char vflag[VNUM];

        // Main loop: each iteration calculates the shortest paths from vertex "i" to all other vertices
        for (i = 0; i < VNUM; ++i) {
            ...
            minval = minpos = INFINITE;
            // Scan vtemp to find the index and length of the shortest tentative path
            for (k = 0; k < VNUM; k++)
                if (vtemp[k] < minval) {
                    minpos = k;
                    minval = vtemp[k];
                }
            ...
            for (k = 0; k < VNUM; k++)
                if (vflag[k] && ((m_graph[minpos][k] + minval) < vtemp[k])) {
                    vtemp[k] = (m_graph[minpos][k] + minval);
                    m_pvertex[i][k] = minpos;
                }
            ...
        }
    cilk_for + Array Notation version:
        cilk_for (int i = 0; i < VNUM; ++i) {
            // Declare temporary arrays inside the "cilk_for" loop body to make them private
            // Temporary array storing intermediate path-length results to each vertex
            unsigned int vtemp[VNUM];
            // Flag array:
            // "1" means the shortest path hasn't been finished
            // "0" means the shortest path has been finished
            unsigned char vflag[VNUM];
            ...
            minpos = __sec_reduce_min_ind(vtemp[:]);
            minval = vtemp[minpos];
            ...
            if (vflag[:] && ((m_graph[minpos][:] + minval) < vtemp[:])) {
                vtemp[:] = (m_graph[minpos][:] + minval);
                m_pvertex[i][:] = minpos;
            }
            ...
        }
    Together, these changes produced code that ran about 5x faster than the serial version on our test machines.
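For readers without a Cilk Plus compiler, the cilk_for + Array Notation pattern can be approximated in portable C++. The sketch below swaps in OpenMP's parallel for in place of cilk_for and std::min_element in place of __sec_reduce_min_ind. These are substitutes, not the sample's method. The sketch further assumes that a finished vertex is marked by setting its vtemp entry to a hypothetical sentinel kInf, so a plain min-index reduction over the whole array never selects it again. The sample's actual bookkeeping is elided above. Without an OpenMP flag (for example -fopenmp with GCC or Clang), the pragma is ignored and the loop runs serially with the same results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sentinel for "no edge / finished / unreachable"; weights assumed < kInf.
constexpr unsigned kInf = std::numeric_limits<unsigned>::max() / 2;

using Matrix = std::vector<std::vector<unsigned>>;

// dist[s][v]: shortest distance from s to v; pred[s][v]: predecessor of v on that path.
void all_pairs_parallel(const Matrix& g, Matrix& dist, Matrix& pred) {
    const std::size_t n = g.size();
    assert(std::all_of(g.begin(), g.end(),
                       [n](const std::vector<unsigned>& r) { return r.size() == n; }));
    const int count = static_cast<int>(n);
    dist.assign(n, std::vector<unsigned>(n, kInf));
    pred.assign(n, std::vector<unsigned>(n, 0));

    // Each source vertex is independent, so the outer loop can run in parallel.
#pragma omp parallel for
    for (int s = 0; s < count; ++s) {
        // Declared inside the loop body, so every iteration gets private copies.
        std::vector<unsigned> vtemp(n, kInf);    // tentative distances; kInf once finished
        std::vector<unsigned char> vflag(n, 1);  // 1 = not finished, 0 = finished
        std::vector<unsigned>& drow = dist[s];   // each iteration writes only its own row
        std::vector<unsigned>& prow = pred[s];
        vtemp[s] = 0;
        prow[s] = static_cast<unsigned>(s);

        for (std::size_t iter = 0; iter < n; ++iter) {
            // Counterpart of minpos = __sec_reduce_min_ind(vtemp[:]).
            const auto it = std::min_element(vtemp.begin(), vtemp.end());
            const unsigned minval = *it;
            if (minval >= kInf) break;  // only unreachable vertices remain
            const std::size_t minpos = static_cast<std::size_t>(it - vtemp.begin());
            drow[minpos] = minval;  // minpos is now finished
            vflag[minpos] = 0;
            vtemp[minpos] = kInf;   // exclude it from later minimum searches

            // Counterpart of the array-notation if statement: a simple counted loop
            // with no early exit, which compilers may be able to auto-vectorize.
            const std::vector<unsigned>& row = g[minpos];
            for (std::size_t k = 0; k < n; ++k) {
                const unsigned cand = row[k] + minval;
                if (vflag[k] && row[k] < kInf && cand < vtemp[k]) {
                    vtemp[k] = cand;
                    prow[k] = static_cast<unsigned>(minpos);
                }
            }
        }
    }
}
```

Race freedom relies on the outer vectors dist and pred being sized before the parallel loop. After that, each iteration writes only its own row.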

Performance Data:

Note: Modified Speedup is the performance speedup relative to the serial implementation.

  • Intel C++ Compiler 14.0 for Windows (Intel® 64), options: /O2 /QxAVX /Qipo
    System: Microsoft Windows 7* Enterprise Edition, 4th Generation Intel Core™ i5-4670T CPU @ 2.30GHz (4-core), 8GB memory
    Modified Speedup: AN 1.2x, cilk_for 2.9x, Both 3.5x
  • Intel C++ Compiler 14.0 for Windows (Intel® 64), options: /O2 /QxCORE-AVX2 /Qipo
    System: Microsoft Windows 7* Enterprise Edition, 4th Generation Intel Core™ i5-4670T CPU @ 2.30GHz (4-core), 8GB memory
    Modified Speedup: AN 1.8x, cilk_for 3.0x, Both 5.0x
  • Intel C++ Compiler 14.0 for Linux (Intel® 64), options: -O2 -xAVX -ipo
    System: Ubuntu* 12.04, 2nd Generation Intel Core™ i7-2600K CPU @ 3.40GHz (4-core), > 8GB memory
    Modified Speedup: AN 1.4x, cilk_for 3.5x, Both 5.2x
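The speedups above are ratios of run times. The sketch below shows that arithmetic in C++. It assumes that each version's time is the average of 5 runs, as the sample's PERF_NUM build option does. average_seconds and speedup are hypothetical helpers for illustration, not functions from the sample.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time in seconds over `runs` executions of `work`.
double average_seconds(const std::function<void()>& work, int runs = 5) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r) {
        const Clock::time_point start = Clock::now();
        work();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        total += elapsed.count();
    }
    return total / runs;
}

// Hypothetical helper: speedup of a modified version relative to the serial baseline.
double speedup(double serial_seconds, double modified_seconds) {
    return serial_seconds / modified_seconds;
}
```

Suppose, hypothetically, that the serial run averages 10.0 s and a modified run averages 2.0 s. Then speedup(10.0, 2.0) returns 5.0, which the table would report as a 5.0x speedup.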

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
    • Open the solution (.sln) file
    [Optional] To collect performance numbers (runs the example 5 times and reports the average time):
    • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    Choose a configuration (for best performance, choose a release configuration):
    • Intel-debug and Intel-release: use the Intel® C++ Compiler
    • VSC-debug and VSC-release: use the Visual C++ compiler (only the linear/scalar version will run)
  • For Windows* Command Line users:
    Enable your particular compiler environment.
    For the Intel® C++ Compiler:
    • open the appropriate Intel C++ compiler command prompt
    • navigate to the project folder
    • to compile: Build.bat [perf_num]
      • perf_num: collect performance numbers (runs the example 5 times and reports the average time)
    • to run: Build.bat run [help|0|1|2|3|4]
    For the Visual C++ compiler (only the linear/scalar version will run):
    • open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt
    • navigate to the project folder
    • to compile: Build.bat [perf_num]
      • perf_num: collect performance numbers (runs the example 5 times and reports the average time)
    • to run: Build.bat run
  • For Linux* or OS X* users:
    • set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    • navigate to the project folder
    For the Intel® C++ Compiler:
    • to compile: make [icpc] [perf_num=1]
      • perf_num=1: collect performance numbers (runs the example 5 times and reports the average time)
    • to run: make run [option=help|0|1|2|3|4]
    For gcc (only the linear/scalar version will run):
    • to compile: make gcc [perf_num=1]
      • perf_num=1: collect performance numbers (runs the example 5 times and reports the average time)
    • to run: make run