The compiler may interchange (or reorder) nested loops for the following reasons:
- To facilitate other transformations
- To relocate the loop that is the most profitable to parallelize so that it is outermost
- To optimize inner-loop memory accesses
Loop interchange takes place at +O3 and above and is enabled by default. Specifying +Onoloop_transform disables loop interchange, as well as loop distribution,
loop blocking, loop fusion, loop unroll, and loop unroll and jam.
Example 5-16 Loop interchange
This example begins with the Fortran matrix addition algorithm
below:
    DO I = 1, N
      DO J = 1, M
        A(I, J) = B(I, J) + C(I, J)
      ENDDO
    ENDDO
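The loop accesses the arrays A, B, and C row by row. Because Fortran stores arrays in column-major order, consecutive iterations of the inner J loop touch elements that lie a full column apart in memory, which is very inefficient.
Interchanging the I and J loops, as shown in the following example, yields
column-by-column access.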
    DO J = 1, M
      DO I = 1, N
        A(I, J) = B(I, J) + C(I, J)
      ENDDO
    ENDDO
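To make the difference between the two Fortran nests concrete, the following C sketch models a column-major layout and computes how far apart consecutive inner-loop accesses fall. The function names (`col_major_offset`, `stride_j_inner`, `stride_i_inner`) are hypothetical and exist only for this illustration; indices are 0-based, and the array is assumed to have exactly n rows with no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in elements, of the 0-based element (i, j) in an array with
   n rows stored column-major (the layout Fortran uses). */
size_t col_major_offset(size_t i, size_t j, size_t n)
{
    assert(i < n);          /* the row index must lie inside a column */
    return i + j * n;
}

/* Distance between the elements touched by two consecutive iterations
   of the inner loop when J is innermost (the original nest). */
size_t stride_j_inner(size_t n)
{
    return col_major_offset(0, 1, n) - col_major_offset(0, 0, n);
}

/* The same distance when I is innermost (the interchanged nest).
   Requires n >= 2 so that row index 1 exists. */
size_t stride_i_inner(size_t n)
{
    return col_major_offset(1, 0, n) - col_major_offset(0, 0, n);
}
```

Under these assumptions, the original nest moves n elements between consecutive inner iterations, while the interchanged nest moves one element, which is why the interchanged form makes better use of each cache line.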
Unlike Fortran, C and C++ store arrays in row-major order.
An analogous example in C and C++, then, starts from the opposite nest ordering,
as shown below.
    for (j = 0; j < m; j++)
      for (i = 0; i < n; i++)
        a[i][j] = b[i][j] + c[i][j];
Interchange facilitates row-by-row access. The interchanged
loop is shown below.
    for (i = 0; i < n; i++)
      for (j = 0; j < m; j++)
        a[i][j] = b[i][j] + c[i][j];
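Interchange is legal for this nest because each iteration writes a distinct element of a and reads only b and c, so no iteration depends on another. The C sketch below is one way to check this by running both orderings on the same input and comparing the results. The fixed sizes N and M, the fill values, and the function names are assumptions chosen for illustration, not part of the compiler's behavior.

```c
#include <assert.h>
#include <string.h>

#define N 4   /* number of rows (hypothetical size) */
#define M 3   /* number of columns (hypothetical size) */

/* Original nest: j outermost, so the inner loop walks down a column
   and jumps M elements between accesses in row-major C. */
void add_j_outer(double a[N][M], double b[N][M], double c[N][M])
{
    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = b[i][j] + c[i][j];
}

/* Interchanged nest: i outermost, so the inner loop walks along a row
   and touches adjacent elements. */
void add_i_outer(double a[N][M], double b[N][M], double c[N][M])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            a[i][j] = b[i][j] + c[i][j];
}

/* Returns 1 if both orderings produce bit-identical output. */
int interchange_preserves_result(void)
{
    double b[N][M], c[N][M], a1[N][M], a2[N][M];

    /* Fill the inputs with arbitrary but distinct values. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++) {
            b[i][j] = (double)(i * M + j);
            c[i][j] = 0.5 * (double)(i - j);
        }

    add_j_outer(a1, b, c);
    add_i_outer(a2, b, c);

    /* Sanity check that the sum was actually computed. */
    assert(a1[0][0] == b[0][0] + c[0][0]);

    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Calling interchange_preserves_result() from a test harness should return 1 for any sizes, since only the order of independent element updates differs between the two functions.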