Lets you specify a scheduling algorithm for loop iterations.
Arguments
n | Is the chunk size, that is, the number of iterations in each chunk. This setting can only be specified for static, dynamic, and guided. For more information, see the descriptions of each keyword below. |
Default
static-balanced | Iterations are divided into even-sized chunks and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number. |
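
As a rough illustration of the default behavior, the sketch below splits an iteration range into one contiguous chunk per thread, with chunk sizes differing by at most one. The function name `balanced_chunks` is hypothetical, and giving the leftover iterations to the lowest-numbered threads is an assumption; the actual run-time library may distribute the remainder differently.

```python
def balanced_chunks(num_iters, num_threads):
    """Split range(num_iters) into num_threads contiguous, near-equal chunks.

    Returns a list of (thread_id, start, end) tuples, where end is exclusive.
    Assumption: the first (num_iters % num_threads) threads get one extra iteration.
    """
    base, extra = divmod(num_iters, num_threads)
    chunks = []
    start = 0
    for tid in range(num_threads):
        size = base + (1 if tid < extra else 0)
        chunks.append((tid, start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    for tid, lo, hi in balanced_chunks(10, 4):
        print(f"thread {tid}: iterations [{lo}, {hi})")
```

With 10 iterations and 4 threads, this sketch produces chunks of 3, 3, 2, and 2 iterations.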
Description
This option lets you specify a scheduling algorithm for loop iterations, that is, how the iterations are divided among the threads of the team.
This option is only useful when specified with option [Q]parallel.
This option is a performance-tuning control and can provide better performance during auto-parallelization. It has no effect when used with option -qopenmp (Linux* OS and OS X*) or /Qopenmp (Windows* OS).
Option | Description |
---|---|
[Q]par-schedule-auto | Lets the compiler or run-time system determine the scheduling algorithm. Any possible mapping may occur for iterations to threads in the team. |
[Q]par-schedule-static | Divides iterations into contiguous pieces (chunks) of size n. The chunks are assigned to threads in the team in a round-robin fashion in the order of the thread number. Note that the last chunk to be assigned may have a smaller number of iterations. If no n is specified, the iteration space is divided into chunks that are approximately equal in size, and each thread is assigned at most one chunk. |
[Q]par-schedule-static-balanced | Divides iterations into even-sized chunks. The chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number. |
[Q]par-schedule-static-steal | Divides iterations into even-sized chunks, but when a thread completes its chunk, it can steal parts of chunks assigned to neighboring threads. Each thread keeps track of L and U, which represent the lower and upper bounds of its chunks respectively. Iterations are executed starting from the lower bound, and simultaneously, L is updated to represent the new lower bound. |
[Q]par-schedule-dynamic | Assigns iterations to threads in chunks as the threads request them. Each thread executes its chunk of iterations, then requests another chunk, until no chunks remain to be assigned. Each chunk contains n iterations, except for the last chunk to be assigned, which may have fewer iterations. If no n is specified, the default is 1. |
[Q]par-schedule-guided | Can be used to specify a minimum number of iterations per chunk. Assigns iterations to threads in chunks as the threads request them. Each thread executes its chunk of iterations, then requests another chunk, until no chunks remain to be assigned. When n is 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads, decreasing to 1. When n is k (greater than 1), the size of each chunk is determined in the same way, with the restriction that no chunk contains fewer than k iterations (except for the last chunk to be assigned, which may have fewer than k iterations). If no n is specified, the default is 1. |
[Q]par-schedule-guided-analytical | Divides iterations by using either an exponential distribution or a dynamic distribution. The method depends on the run-time implementation. Loop bounds are calculated with faster synchronization and chunks are dynamically dispatched at run time by threads in the team. |
[Q]par-schedule-runtime | Defers the scheduling decision until run time. The scheduling algorithm and chunk size are then taken from the setting of environment variable OMP_SCHEDULE. |
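
The chunking rules described in the table for the static and guided keywords can be sketched in a few lines. This is only a model of the descriptions above, not the run-time library's implementation: the function names `static_chunks` and `guided_chunk_sizes` are hypothetical, and the guided sketch assumes a proportionality factor of exactly 1 (each chunk is the ceiling of the unassigned iterations divided by the number of threads).

```python
def static_chunks(num_iters, num_threads, n):
    """Model of static scheduling with chunk size n.

    Chunks of n contiguous iterations are handed out round-robin in
    thread-number order; the last chunk may be smaller.
    Returns {thread_id: [(start, end), ...]} with end exclusive.
    """
    assignment = {tid: [] for tid in range(num_threads)}
    for idx, start in enumerate(range(0, num_iters, n)):
        end = min(start + n, num_iters)
        assignment[idx % num_threads].append((start, end))
    return assignment


def guided_chunk_sizes(num_iters, num_threads, k=1):
    """Model of guided scheduling with minimum chunk size k.

    Assumption: each chunk is ceil(remaining / num_threads), never below k,
    except the final chunk, which takes whatever is left.
    Returns the chunk sizes in the order they would be requested.
    """
    sizes = []
    remaining = num_iters
    while remaining > 0:
        proportional = -(-remaining // num_threads)  # ceiling division
        size = min(max(k, proportional), remaining)
        sizes.append(size)
        remaining -= size
    return sizes


if __name__ == "__main__":
    print(static_chunks(10, 2, 3))
    print(guided_chunk_sizes(20, 4))
    print(guided_chunk_sizes(20, 4, k=3))
```

In this model, guided chunk sizes shrink as the unassigned work shrinks, while static chunk sizes stay fixed at n. Which threads receive which guided chunks depends on request order at run time, so the sketch reports sizes only.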
Note
This option may behave differently on Intel® microprocessors than on non-Intel microprocessors.