Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers > Chapter 5 Loop and cross-module optimization features

Loop interchange


The compiler may interchange (or reorder) nested loops for the following reasons:

  • To facilitate other transformations

  • To relocate the loop that is the most profitable to parallelize so that it is outermost

  • To optimize inner-loop memory accesses

Loop interchange takes place at +O3 and above and is enabled by default. Specifying +Onoloop_transform disables loop interchange, along with loop distribution, loop blocking, loop fusion, loop unrolling, and loop unroll and jam.

Loop interchange

This example begins with the Fortran matrix addition algorithm below:

      DO I = 1, N
        DO J = 1, M
          A(I, J) = B(I, J) + C(I, J)
        ENDDO
      ENDDO

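This loop nest accesses the arrays A, B, and C row by row. Because Fortran stores arrays in column-major order, consecutive iterations of the inner J loop touch elements that are a full column apart in memory (N elements, assuming the leading dimension is N), which makes poor use of the cache. Interchanging the I and J loops, as shown in the following example, gives column-by-column access, so the inner loop walks through contiguous memory.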

      DO J = 1, M
        DO I = 1, N
          A(I, J) = B(I, J) + C(I, J)
        ENDDO
      ENDDO
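
The effect of the interchange on memory access can be made concrete by computing element offsets. The following is a minimal sketch in C, not taken from the compiler itself: the function names are hypothetical, and it assumes a dense N-by-M array whose leading dimension is N, laid out in column-major order as Fortran does.

```c
#include <stddef.h>

/* Offset, in elements, of the 1-based element (i, j) of an array with
   n rows stored in column-major order (the Fortran layout).
   Hypothetical helper, for illustration only. */
size_t colmajor_offset(size_t i, size_t j, size_t n)
{
    return (i - 1) + (j - 1) * n;
}

/* Distance between consecutive inner-loop accesses when the inner loop
   steps J, as in the original nest. */
size_t stride_inner_j(size_t n)
{
    return colmajor_offset(1, 2, n) - colmajor_offset(1, 1, n);
}

/* Distance between consecutive inner-loop accesses when the inner loop
   steps I, as in the interchanged nest. */
size_t stride_inner_i(size_t n)
{
    return colmajor_offset(2, 1, n) - colmajor_offset(1, 1, n);
}
```

Under these assumptions, with N = 1000 the original nest moves 1000 elements between consecutive inner-loop accesses, while the interchanged nest moves one element.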

Unlike Fortran, C and C++ store arrays in row-major order. An analogous example in C and C++ therefore starts from the opposite nest ordering, as shown below.

for (j = 0; j < m; j++)
    for (i = 0; i < n; i++)
        a[i][j] = b[i][j] + c[i][j];

Interchanging the loops gives row-by-row access, so the inner loop walks through contiguous memory. The interchanged loop is shown below.

for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
        a[i][j] = b[i][j] + c[i][j];
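
Interchange is safe for this nest because each iteration writes a distinct element of a and reads nothing that another iteration writes, so the order of iterations does not affect the result. The sketch below illustrates this by running both orderings on the same input and comparing the outputs. The array sizes and function names are hypothetical, chosen only for the illustration. It says nothing about how the compiler itself decides that an interchange is legal.

```c
#include <string.h>

#define NROWS 4   /* hypothetical row count */
#define MCOLS 3   /* hypothetical column count */

/* Original nest: the inner loop steps i, so consecutive accesses are
   MCOLS elements apart in row-major storage. */
void add_inner_i(double a[NROWS][MCOLS], double b[NROWS][MCOLS],
                 double c[NROWS][MCOLS])
{
    for (int j = 0; j < MCOLS; j++)
        for (int i = 0; i < NROWS; i++)
            a[i][j] = b[i][j] + c[i][j];
}

/* Interchanged nest: the inner loop steps j, so consecutive accesses
   are adjacent in memory. */
void add_inner_j(double a[NROWS][MCOLS], double b[NROWS][MCOLS],
                 double c[NROWS][MCOLS])
{
    for (int i = 0; i < NROWS; i++)
        for (int j = 0; j < MCOLS; j++)
            a[i][j] = b[i][j] + c[i][j];
}

/* Returns 1 when both orderings produce identical results, else 0. */
int interchange_preserves_result(void)
{
    double b[NROWS][MCOLS], c[NROWS][MCOLS];
    double a1[NROWS][MCOLS], a2[NROWS][MCOLS];

    /* Fill the inputs with small, exactly representable values. */
    for (int i = 0; i < NROWS; i++) {
        for (int j = 0; j < MCOLS; j++) {
            b[i][j] = (double)(i * MCOLS + j);
            c[i][j] = 2.0 * (double)(i * MCOLS + j);
        }
    }

    add_inner_i(a1, b, c);
    add_inner_j(a2, b, c);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Both functions compute the same sums. Only the order in which memory is visited differs, and that order is what the interchange changes.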