Loop distribution is another fundamental +O3 transformation necessary for more advanced transformations.
These advanced transformations require that all calculations in
a nested loop be performed inside the innermost loop. To facilitate
this, loop distribution transforms complicated nested loops into
several simple loops that contain all computations inside the body
of the innermost loop.
Loop distribution takes place at +O3 and above and is enabled by default. Specifying +Onoloop_transform disables loop distribution, as well as loop interchange,
loop blocking, loop fusion, loop unroll, and loop unroll and jam.
Loop distribution is disabled for specific loops by specifying
the no_distribute directive or pragma immediately before the loop.
The form of this directive and pragma is shown in Table 5-6 “Form of no_distribute directive and pragma”.
Table 5-6 Form of no_distribute directive and pragma
| Language | Form |
|---|---|
| Fortran | C$DIR NO_DISTRIBUTE |
| C | #pragma _CNX no_distribute |
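As a minimal sketch of where the C form goes, the following routine places the pragma immediately before the outer loop of a nest like the one used later in this section. The function name `update_no_distribute` and the flat row-by-row array layout are assumptions made for illustration; only the pragma spelling comes from Table 5-6. A compiler that does not recognize `_CNX` pragmas will typically ignore the line, possibly with a warning.

```c
#include <assert.h>

/* Hypothetical routine for illustration only.
   a and b are n-by-m arrays stored row by row; c has n elements. */
void update_no_distribute(int n, int m, double *a, const double *b, double *c)
{
    assert(n >= 0 && m >= 0);

    /* The pragma sits immediately before the loop it applies to,
       asking the compiler not to distribute this I loop. */
#pragma _CNX no_distribute
    for (int i = 0; i < n; i++) {
        c[i] = 0.0;
        for (int j = 0; j < m; j++)
            a[i * m + j] += b[i * m + j] * c[i];
    }
}
```

The pragma affects only which transformations the compiler may apply; the values computed by the loop nest are the same with or without it.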
Example 5-12 Loop distribution
This example begins with the following Fortran code:
      DO I = 1, N
        C(I) = 0
        DO J = 1, M
          A(I,J) = A(I,J) + B(I,J) * C(I)
        ENDDO
      ENDDO
Loop distribution creates two copies of the I loop, separating the nested J loop from the assignments to array C. In this way, all assignments are moved to innermost
loops. Interchange is then performed on the I and J loops of the second nest, so that the innermost loop runs over I, the first subscript of A and B.
The distribution and interchange are shown in the following
transformed code:
      DO I = 1, N
        C(I) = 0
      ENDDO
      DO J = 1, M
        DO I = 1, N
          A(I,J) = A(I,J) + B(I,J) * C(I)
        ENDDO
      ENDDO
Distribution can improve efficiency by reducing the number
of memory references per loop iteration and the amount of cache
thrashing. It also creates more opportunities for interchange.
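The transformation in Example 5-12 can also be written out by hand to confirm that it preserves results. The sketch below is a C rendering of the two Fortran forms, not compiler output: the names `original_nest`, `distributed_nest`, and `versions_agree`, the array sizes, and the test data are all assumptions chosen for illustration. Note that C stores arrays row by row, so the interchanged order shown here mirrors the Fortran example structurally rather than being the faster order in C.

```c
#include <assert.h>
#include <string.h>

enum { N = 4, M = 3 };  /* illustrative sizes, not from the source */

/* Original form: the assignment to c sits outside the innermost loop. */
void original_nest(double a[N][M], double b[N][M], double c[N])
{
    for (int i = 0; i < N; i++) {
        c[i] = 0.0;
        for (int j = 0; j < M; j++)
            a[i][j] = a[i][j] + b[i][j] * c[i];
    }
}

/* Distributed form: the I loop is split in two, then the second
   nest is interchanged so that J is outer and I is inner. */
void distributed_nest(double a[N][M], double b[N][M], double c[N])
{
    for (int i = 0; i < N; i++)
        c[i] = 0.0;

    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = a[i][j] + b[i][j] * c[i];
}

/* Runs both forms on identical inputs; returns 1 if a matches exactly. */
int versions_agree(void)
{
    double a1[N][M], a2[N][M], b[N][M], c1[N], c2[N];

    for (int i = 0; i < N; i++) {
        c1[i] = c2[i] = 7.0;  /* arbitrary value, overwritten by both forms */
        for (int j = 0; j < M; j++) {
            a1[i][j] = a2[i][j] = i + 0.5 * j;
            b[i][j] = 1.0 + i * j;
        }
    }

    original_nest(a1, b, c1);
    distributed_nest(a2, b, c2);

    for (int i = 0; i < N; i++)
        assert(c1[i] == 0.0 && c2[i] == 0.0);

    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Distribution is legal here because every element of c is assigned in the first loop before the second nest reads it, so each update to a sees the same value of c in both forms.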