| United States-English |
|
|
|
![]() |
Parallel Programming Guide for HP-UX Systems > Chapter 5 Loop
and cross-module optimization featuresData localization |
|
Data localization occurs as a result of various loop transformations that occur at optimization levels +O2 or +O3. Because optimizations are cumulative, specifying +O3 or +O4 takes advantage of the transformations that happen at +O2. Table 5-1 Loop transformations affecting data localization
Data localization keeps frequently used data in the processor data cache, eliminating the need for more costly memory accesses. Loops that manipulate arrays are the main candidates for localization optimizations. Most of these loops are eligible for the various transformations that the compiler performs at +O3. These transformations are explained in detail in this section. Some loop transformations cause loops to be fully or partially replicated. Because unlimited loop replication can significantly increase compile times, loop replication is limited by default. You can increase this limit (and possibly increase your program’s compile time and code size) by specifying the +Onosize and +Onolimit compiler options.
Any of the following conditions can inhibit or prevent data localization: The following sections discuss these conditions and their effects on data localization. A loop-carried dependence (LCD) exists when one iteration of a loop assigns a value to an address that is referenced or assigned on another iteration. In some cases, LCDs can inhibit loop interchange, thereby inhibiting localization. Typically, these cases involve array indexes that are offset in opposite directions. To ignore LCDs, use the no_loop_dependence directive or pragma. The form of this directive and pragma is shown in Table 5-2 “Form of no_loop_dependence directive and pragma”.
Table 5-2 Form of no_loop_dependence directive and pragma
where
Example 5-3 Loop-carried dependences The Fortran loop below contains an LCD that inhibits interchange:
C and C++ loops can contain similar constructs, but to simplify illustration, only the Fortran example is discussed here. As written, this loop uses A(I-1,J-1) and A(I-1,J+1) to compute A(I,J). Table 5-3 “Computation sequence of A(I,J): original loop” shows the sequence in which values of A are computed for this loop. Table 5-3 Computation sequence of A(I,J): original loop
As shown in Table 5-3 “Computation sequence of A(I,J): original loop”, the original loop computes the elements of the current row of A using the elements of the previous row of A. For all rows except the first (which is never written), the values contained in the previous row must be written before the current row is computed. This dependence must be honored for the loop to yield its intended results. If a row element of A is computed before the previous row elements are computed, the result is incorrect. Interchanging the I and J loops yields the following code:
After interchange, the loop computes values of A in the sequence shown in Table 5-4 “Computation sequence of A(I,J): interchanged loop”. Table 5-4 Computation sequence of A(I,J): interchanged loop
Here, the elements of the current column of A are computed using the elements of the previous column and the next column of A. The problem here is that columns of A are being computed using elements from the next column, which have not been written yet. This computation violates the dependence illustrated in Table 5-3 “Computation sequence of A(I,J): original loop”. The element-to-element dependences in both the original and interchanged loop are illustrated in Figure 5-1 “LCDs in original and interchanged loops”. The arrows in Figure 5-1 “LCDs in original and interchanged loops” represent dependences from one element to another, pointing at elements that depend on the elements at the arrows’ bases. Shaded elements indicate a typical row or column computed in the inner loop:
This figure helps to illustrate the sequence in which the array elements are cycled through by the respective loops: the original loop cycles across all the columns in a row, then moves on to the next row. The interchanged loop cycles down all the rows in a column first, then moves on to the next column. Example 5-4 Avoid loop interchange Interchange is inhibited only by loops that contain dependences that change when the loop is interchanged. Most LCDs do not fall into this category and thus do not inhibit loop interchange. Occasionally, the compiler encounters an apparent LCD. If it cannot determine whether the LCD actually inhibits interchange, it conservatively avoids interchanging the loop. The following Fortran example illustrates this situation:
In these example, if IADD and JADD are either both positive or both negative, the loop contains no interchange-inhibiting dependence. However, if one and only one of the variables is negative, interchange is inhibited. The compiler has no way of knowing the runtime values of IADD and JADD, so it avoids interchanging the loop. If you are positive that the IADD and JADD are both negative or both positive, you can tell the compiler that the loop is free of dependences using the no_loop_dependence directive or pragma, described in this chapter Table 5-2 “Form of no_loop_dependence directive and pragma”. The previous Fortran loop is interchanged when the NO_LOOP_DEPENDENCE directive is specified for A on the J loop as shown in the following code:
If IADD and JADD acquire opposite-signed values at runtime, these loops may result in incorrect answers. In some cases, loop fusion is also inhibited by simpler dependences than those that inhibit interchange. Consider the following Fortran example:
While it might appear that loop fusion would benefit the preceding example, it would actually yield the following incorrect code:
This loop produces different answers than the original loops, because the reference to A(ITEMP+1) in the fused loop accesses a value that has not been assigned yet, while the analogous reference to A(J+1) in the original J loop accesses a value that was assigned in the original I loop. An alias is an alternate name for an object. Aliasing occurs in a program when two or more names are attached to the same memory location. Aliasing is typically caused in Fortran by use of the EQUIVALENCE statement. The use of pointers normally causes the problem in C and C++. Passing identical actual arguments into different dummy arguments in a Fortran subprogram can also cause aliasing, as can passing the same address into different pointer arguments in a C or C++ function. Example 5-5 Aliasing Aliasing interferes with data localization because it can mask LCDs where arrays A and B have been equivalenced. This is shown in the following Fortran example:
This loop has the same problem as the loop used to demonstrate LCDs in the previous section; because A and B refer to the same array, the loop contains an LCD on A, which prevents interchange and thus interferes with localization. The C and C++ equivalent of this loop follows. Keep in mind that C and C++ store arrays in row-major order, which requires different subscripting to access the same elements.
Fortran’s EQUIVALENCE statement is imitated in C and C++; through the use of pointers, arrays are effectively equivalenced, as shown. Passing the same address into different dummy procedure arguments can yield the same result. Fortran passes arguments by reference while C and C++ pass them by value. However, pass-by-reference is simulated in C and C++ by passing the argument’s address into a pointer in the receiving procedure or in C++ by using references. Example 5-6 Aliasing The following Fortran code exhibits the same aliasing problem as the previous example, but the alias is created by passing the same actual argument into different dummy arguments.
The following (legal ANSI C) code shows the same argument-passing problem in C:
When the Fortran compiler encounters a computed or assigned GOTO statement in an otherwise interchangeable loop, it cannot always determine whether the branch destination is within the loop. Because an out-of-loop destination would be a loop exit, these statements often prevent loop interchange and therefore data localization. The order in which values are read into or written from a loop may change if the loop is interchanged. For this reason, I/O statements inhibit interchange and, consequently, data localization.
Given a data stream consisting of alternating zeros and ones (0,1,0,1,0,1...), the contents for A(I,J) for both the original loop and the interchanged loop are shown in Figure 5-2 “Values read into array A”. Loops that contain multiple entries or exits inhibit data localization because they cannot safely be interchanged. Extra loop entries are usually created when a loop contains a branch destination. Extra exits are more common, however. These are often created in C and C++ using the break statement, and in Fortran using the GOTO statement. As noted before, the order of computation changes if the loops are interchanged.
Interchanging this loop would change the order in which the values of a are computed. The original loop computes a column-by-column, whereas the interchanged loop would compute it row-by-row. This means that the interchanged loop may hit the break statement and exit after computing a different set of elements than the original loop computes. Interchange therefore may cause the results of the loop to differ and must be avoided. Like loops with multiple exits, RETURN and STOP statements in Fortran inhibit localization because they inhibit interchange. If a loop containing a RETURN or STOP is interchanged, its order of computation may change, giving wrong answers. Similar to Fortran’s RETURN and STOP statements (discussed in the previous section), return and exit statements in C and C++ inhibit localization because they inhibit interchange. In C++, throw statements, like loops containing multiple exits, inhibit localization because they inhibit interchange. HP compilers are unaware of the side effects of most procedures, and therefore cannot determine whether or not they might interfere with loop interchange. Consequently, the compilers do not perform loop interchange in an embedded procedure call. These side effects may include data dependences involving loop arrays, aliasing (as described in the section “Aliasing”), and processor data cache that use conflicts with the loop’s cache. This renders useless any data localization optimizations performed on the loop.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||