In this step, you will analyze the results to determine whether you want to implement the report recommendations.
The report applies to the following loop in the source code file scalar_dep.cpp:
for (i=0; i<n; i++) {
    if (A[i] > 0) { b = A[i]; A[i] = 1 / A[i]; }
    if (A[i] > 1) { A[i] += b; }
}
The report recommends the following:
Use the /Qparallel option to improve auto-parallelization.
Unconditionally assign the variable b to allow the compiler to vectorize the loop.
You can add the /Qparallel option and then run another analysis to produce another report.
Note
You will implement the second recommendation in the next step of this tutorial.
In Solution Explorer, right-click on the GAP-c project and select Properties. The Property Pages dialog box appears.
In the Configuration Properties > C/C++ > Optimization [Intel C++] > Parallelization drop-down list, select Yes (/Qparallel) and click OK.
In Solution Explorer, open the Source Files folder and right-click on scalar_dep.cpp.
Select Intel Compiler > Guided Auto Parallelism > Run Analysis on file "scalar_dep.cpp".
The Analysis with Multi-file optimization dialog box appears. Click Run Analysis.
Click the button on any message box that appears.
Find the following in the compiler output.
scalar_dep.cpp(79) remark #30521: (par) Assign a value to the variable(s) "b" at the beginning of the body of the loop in line 79. This will allow the loop to be parallelized. [VERIFY] Make sure that, in the original program, the variable(s) "b" read in any iteration of the loop has been defined earlier in the same iteration.
scalar_dep.cpp(79) remark #30525: (PAR) Insert a "#pragma loop count min(256)" statement right before the loop at line 79 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 256 iterations.
Note
Your remark and line numbers in the output may be different.
The first remark indicates that the compiler cannot parallelize the loop because the variable b is conditionally assigned. The second remark indicates that the number of loop iterations must be at least the specified number for the compiler to parallelize the loop.
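To make the two remarks concrete, the following is a minimal sketch of the loop before and after the suggested change. It is not the tutorial's source file: the function names scale_conditional and scale_unconditional, the use of std::vector, and the double element type are assumptions made for illustration. The pragma from the second remark is shown only as a comment, because it is specific to the Intel compiler and is valid only if the loop really runs at least 256 iterations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the original loop: b is assigned only when
// A[i] > 0, so the compiler must assume its value may carry over from an
// earlier iteration.
void scale_conditional(std::vector<double>& A) {
    double b = 0.0;  // initialized only to keep this sketch well-defined
    for (std::size_t i = 0; i < A.size(); i++) {
        if (A[i] > 0) { b = A[i]; A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
}

// Hypothetical modified loop: b is assigned at the top of every iteration,
// so each iteration reads only a value it defined itself.
void scale_unconditional(std::vector<double>& A) {
    // With the Intel compiler, the pragma from remark #30525 would go here,
    // provided the loop has at least 256 iterations:
    //     #pragma loop count min(256)
    for (std::size_t i = 0; i < A.size(); i++) {
        double b = A[i];  // unconditional assignment
        if (A[i] > 0) { A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
}
```

In this loop the second branch can be taken only if the first one was, since an element that is not greater than 0 is left unchanged and cannot then be greater than 1. That is the [VERIFY] condition of the first remark, so both functions should leave A with the same contents. The sketch makes b local to the loop body; if the real program reads b after the loop, the change would need a closer look.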
In the next step, you will implement the recommendations.