The compiler may interchange (or reorder) nested loops for the following
reasons:
    To facilitate other transformations
    To relocate the loop that is the most profitable to parallelize so that it is outermost
    To optimize inner-loop memory accesses
Loop interchange takes place at +O3
and above and is enabled by default. Specifying +Onoloop_transform
disables loop interchange, as well as loop distribution, loop blocking,
loop fusion, loop unroll, and loop unroll and jam.
Loop interchange
This example begins with the Fortran matrix addition algorithm
below:
      DO I = 1, N
        DO J = 1, M
          A(I, J) = B(I, J) + C(I, J)
        ENDDO
      ENDDO
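The loop accesses the arrays A, B, and C row by row. Because Fortran
stores arrays in column-major order, consecutive iterations of the
inner J loop touch elements that are a full column apart in memory,
which makes poor use of the cache. Interchanging the I and J loops,
as shown in the following example, gives column-by-column access, so
the inner loop steps through contiguous memory.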
      DO J = 1, M
        DO I = 1, N
          A(I, J) = B(I, J) + C(I, J)
        ENDDO
      ENDDO
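The effect of the interchange on memory stride can be sketched with a little address arithmetic. The following C fragment is an illustration only, not part of the compiler: the function names are hypothetical, and it assumes an array declared A(n, m) with 1-based indices and plain column-major storage with no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Zero-based linear offset of the Fortran element A(i, j) in an array
   declared A(n, m), assuming column-major storage and 1-based indices.
   (Hypothetical helper, for illustration only.) */
size_t colmajor_offset(size_t i, size_t j, size_t n)
{
    assert(i >= 1 && i <= n && j >= 1);
    return (i - 1) + (j - 1) * n;
}

/* Distance, in elements, between consecutive inner-loop accesses
   when J is the innermost loop (the original nest). */
size_t stride_inner_j(size_t n)
{
    return colmajor_offset(1, 2, n) - colmajor_offset(1, 1, n);
}

/* Distance, in elements, between consecutive inner-loop accesses
   when I is the innermost loop (the interchanged nest). */
size_t stride_inner_i(size_t n)
{
    assert(n >= 2);
    return colmajor_offset(2, 1, n) - colmajor_offset(1, 1, n);
}
```

Under these assumptions, the original nest moves n elements between consecutive inner-loop accesses, while the interchanged nest moves one element at a time.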
Unlike Fortran, C and C++ store arrays in row-major order. An
analogous example in C and C++, then, starts from the opposite nest
ordering, as shown below.
    for (j = 0; j < m; j++)
        for (i = 0; i < n; i++)
            a[i][j] = b[i][j] + c[i][j];
Here, interchange facilitates row-by-row access, which is the
contiguous direction in C and C++. The interchanged loop nest is
shown below.
    for (i = 0; i < n; i++)
        for (j = 0; j < m; j++)
            a[i][j] = b[i][j] + c[i][j];
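Interchange is safe for this nest because each iteration writes a distinct element of a and reads only b and c, so no iteration depends on another. The following self-contained C sketch checks that both orderings compute the same result. The array sizes N and M, the function names, and the test data are assumptions chosen for illustration; they do not come from the compiler or its documentation.

```c
#include <assert.h>
#include <string.h>

#define N 4   /* number of rows (hypothetical small size) */
#define M 3   /* number of columns (hypothetical small size) */

/* Original nest: j outermost, so the inner loop walks down a column
   and consecutive accesses are M elements apart in row-major storage. */
void add_j_outer(double a[N][M], double b[N][M], double c[N][M])
{
    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = b[i][j] + c[i][j];
}

/* Interchanged nest: i outermost, so the inner loop walks along a row
   and consecutive accesses are adjacent in memory. */
void add_i_outer(double a[N][M], double b[N][M], double c[N][M])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            a[i][j] = b[i][j] + c[i][j];
}

/* Returns 1 if both loop orderings produce identical arrays. */
int interchange_preserves_result(void)
{
    double b[N][M], c[N][M], a1[N][M], a2[N][M];

    assert(N > 0 && M > 0);

    /* Fill the inputs with arbitrary, distinct values. */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            b[i][j] = (double)(i * M + j);
            c[i][j] = 10.0 * (double)(i + j);
        }
    }

    add_j_outer(a1, b, c);
    add_i_outer(a2, b, c);

    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Only the order in which elements are visited differs between the two functions; any performance difference comes from the memory access pattern, not from the arithmetic.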