Parallel Programming Guide for HP-UX Systems > Chapter 8 Optimization Report

Loop Report


The Loop Report lists the optimizations that are performed on loops and calls. If appropriate, the report gives reasons why a possible optimization was not performed. Loop nests are reported in the order in which they are encountered and separated by a blank line.

Below is a sample optimization report.

            Optimization Report

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
3         1     sub1      *Inlined call     (2-4)
8         2     iloopi:1  Serial                      Fused
11        3     jloopi:2  Serial                      Fused
14        4     kloopi:3  Serial                      Fused
                          *Fused            (5)       (2 3 4) -> (5)
8         5     iloopi:1  PARALLEL

Footnoted   User
Var Name    Var Name
-----------------------------------------------------------------------------
iloopi:1    iloopindex
jloopi:2    jloopindex
kloopi:3    kloopindex

            Optimization for sub1

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
8         1     iloopi:1  Serial                      Fused
11        2     jloopi:2  Serial                      Fused
14        3     kloopi:3  Serial                      Fused
                          *Fused            (4)       (1 2 3) -> (4)
8         4     iloopi:1  PARALLEL

Footnoted   User
Var Name    Var Name
-----------------------------------------------------------------------------
iloopi:1    iloopindex
jloopi:2    jloopindex
kloopi:3    kloopindex

A description of each column of the Loop Report is shown in Table 8-2 “Loop Report column definitions”.

Table 8-2 Loop Report column definitions

Column    Description

Line Num.

Specifies the source line of the beginning of the loop or of the loop from which it was derived. For cloned calls and inlined calls, the Line Num. column specifies the source line at which the call statement appears.

Id Num.

Specifies a unique ID number for every optimized loop and for every optimized call. This ID number can then be referenced by other parts of the report. Both loops appearing in the original program source and loops created by the compiler are given loop ID numbers. Loops created by the compiler are also shown in the New Id Nums column as described later. No distinction between compiler-generated loops and loops that existed in the original source is made in the Id Num. column. Loops are assigned unique, sequential numbers as they are encountered.

Var Name

Specifies the name of the iteration variable controlling the loop or the called procedure if the line represents a call. If the variable is compiler-generated, its name is listed as *VAR*. If it consists of a truncated variable name followed by a colon and a number, the number is a reference to the variable name footnote table, which appears after the Loop Report and Analysis Table in the Optimization Report.

Reordering Transformation

Indicates which reordering transformations were performed. Reordering transformations are performed on loops, calls, and loop nests, and typically involve reordering and/or duplicating sections of code to facilitate more efficient execution. This column has one of the values shown in Table 8-3 “Reordering transformation values in the Loop Report”.

New Id Nums

Specifies the ID numbers for loops or calls created by the compiler. These ID numbers are also listed in the Id Num. column and are referenced in other parts of the report. However, the loops and calls they represent were not present in the original source code. In the case of loop fusion, the number in this column indicates the new loop created by merging all the fused loops. New ID numbers are also created for cloned calls, inlined calls, loop blocking, loop distribution, loop interchange, loop unroll and jam, dynamic selection, and test promotion.

Optimizing / Special Transformation

Indicates which, if any, optimizing transformations were performed. An optimizing transformation reduces the number of operations executed, or replaces operations with simpler operations. A special transformation allows the compiler to optimize code under special circumstances. When appropriate, this column has one of the values shown in Table 8-4 “Optimizing/special transformations values in the Loop Report” .

 

The following values apply to the Reordering Transformation column described in Table 8-2 “Loop Report column definitions”.

Table 8-3 Reordering transformation values in the Loop Report

Value    Description

Block

Loop blocking was performed. The new loop order is indicated under the Optimizing/Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”.

Cloned call

A call to a subroutine was cloned.

Dist

Loop distribution was performed.

DynSel

Dynamic selection was performed. The numbers in the New Id Nums column correspond to the loops created. For parallel loops, these generally include a PARALLEL and a Serial version.

Fused

The loops were fused into another loop and no longer exist. The original loops and the new loop are indicated under the Optimizing/Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”.

Inlined call

A call to a subroutine was inlined.

Interchange

Loop interchange was performed. The new loop order is indicated under the Optimizing/Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”.

None

No reordering transformation was performed on the call.

PARALLEL

The loop runs in thread-parallel mode.

Peel

The first or last iteration of the loop was peeled in order to fuse the loop with an adjacent loop.

Promote

Test promotion was performed.

Serial

No reordering transformation was performed on the loop.

Unroll and Jam

The loop was unrolled and the nested loops were jammed (fused).

VECTOR

The loop was fully or partially replaced with more efficient calls to one or more vector routines.

*

Appears at the left of loop-producing transformations (distribution, dynamic selection, blocking, fusion, interchange, call cloning, call inlining, peeling, promotion, unroll and jam).
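The Peel entry can be illustrated with a small model. The following Python sketch is not compiler output. It assumes two hypothetical adjacent loops whose trip counts differ by one, and the function names `separate` and `peeled_and_fused` are made up for illustration. Peeling the last iteration of the second loop leaves two loops with equal trip counts, which can then be fused.

```python
def separate(n):
    """Two adjacent loops with different trip counts (n-1 and n)."""
    d = [0.0] * n
    e = [0.0] * n
    for i in range(n - 1):          # first loop: n-1 iterations
        d[i] = float(i + 1)
    for j in range(n):              # second loop: n iterations
        e[j] = d[j] + 1.0
    return d, e


def peeled_and_fused(n):
    """Same computation after peeling the last iteration and fusing."""
    d = [0.0] * n
    e = [0.0] * n
    for i in range(n - 1):          # fused loop over the common n-1 iterations
        d[i] = float(i + 1)
        e[i] = d[i] + 1.0           # reads only the element written just above
    e[n - 1] = d[n - 1] + 1.0       # peeled last iteration of the second loop
    return d, e
```

Both functions return the same arrays for any `n >= 1`, which is the property that makes the transformation safe in this simple case.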

 

The following values apply to the Optimizing/special transformations column described in Table 8-2 “Loop Report column definitions”.

Table 8-4 Optimizing/special transformations values in the Loop Report

Value    Explanation

Fused

The loop was fused into another loop and no longer exists.

Reduction

The compiler recognized a reduction in the loop.

Removed

The compiler removed the loop.

Unrolled

The loop was completely unrolled.

(OrigOrder) -> (InterchangedOrder)

This information appears when Interchange is reported under Reordering Transformation. OrigOrder indicates the order of loops in the original nest. InterchangedOrder indicates the new order that occurs due to interchange. OrigOrder and InterchangedOrder consist of user iteration variables presented in outermost to innermost order.

(OrigLoops) -> (NewLoop)

This information appears when Fused is reported under Reordering Transformation. OrigLoops indicates the original loops that were fused by the compiler to form the loop indicated by NewLoop. OrigLoops and NewLoop refer to loops based on the values from the Id Num. and New Id Nums columns in the Loop Report.

(OrigLoopNest) -> (BlockedLoopNest)

This information appears when Block is reported under Reordering Transformation. OrigLoopNest indicates the order of the original loop nest containing a loop that was blocked. BlockedLoopNest indicates the order of loops after blocking. OrigLoopNest and BlockedLoopNest refer to user iteration variables presented in outermost to innermost order.
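For readers who post-process reports with a script, the column layout and values described in Tables 8-2 through 8-4 can be read back with a short parser. The Python sketch below is only an illustration, not an HP tool. It assumes whitespace-separated rows as in the sample report, and the names `ROW`, `ARROW`, and `parse_row` are hypothetical.

```python
import re

# Reordering Transformation values listed in Table 8-3.
REORDERINGS = [
    "Unroll And Jam", "Inlined call", "Cloned call", "Interchange",
    "PARALLEL", "Promote", "DynSel", "VECTOR", "Serial", "Block",
    "Fused", "Dist", "Peel", "None",
]

# Optional "line id var" prefix, optional asterisk, reordering value,
# optional "(new ids)", then any remaining text is the Optimizing / Special column.
ROW = re.compile(
    r"^\s*(?:(?P<line>\d+)\s+(?P<id>\d+)\s+(?P<var>\S+)\s+)?"
    r"(?P<star>\*)?(?P<reorder>" + "|".join(REORDERINGS) + r")"
    r"(?:\s+\((?P<new>\d+(?:-\d+)?)\))?"
    r"(?:\s+(?P<special>.+?))?\s*$",
    re.IGNORECASE,
)

# "(a b c) -> (d)" notation described in Table 8-4.
ARROW = re.compile(r"^\(([^)]*)\)\s*->\s*\(([^)]*)\)$")


def parse_row(text):
    """Return a dict for one Loop Report row, or None if the text is not a row."""
    m = ROW.match(text)
    if m is None:
        return None
    new = m.group("new")
    if new is None:
        new_ids = []
    elif "-" in new:                      # a range such as (2-4)
        lo, hi = map(int, new.split("-"))
        new_ids = list(range(lo, hi + 1))
    else:                                 # a single ID such as (5)
        new_ids = [int(new)]
    special = m.group("special")
    arrow = ARROW.match(special) if special else None
    return {
        "line": int(m.group("line")) if m.group("line") else None,
        "id": int(m.group("id")) if m.group("id") else None,
        "var": m.group("var"),
        "creates_new": m.group("star") is not None,
        "reordering": m.group("reorder"),
        "new_ids": new_ids,
        "special": special,
        "mapping": (arrow.group(1).split(), arrow.group(2).split()) if arrow else None,
    }
```

Header lines, separator lines, and Analysis Table rows do not begin with a known reordering value after the optional prefix, so `parse_row` returns `None` for them.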

 

Supplemental tables

The tables described in this section may be included in the Optimization Report to provide information supplemental to the Loop Report.

Analysis Table

If necessary, an Analysis Table is included in the Optimization Report to further elaborate on optimizations reported in the Loop Report.

A description of each column in the Analysis Table is shown in Table 8-5 “Analysis Table column definitions”.

Table 8-5 Analysis Table column definitions

Column    Description
Line Num.

Specifies the source line of the beginning of the loop or call.

Id Num.

References the ID number assigned to the loop or call in the Loop Report.

Var Name

Specifies the name of the iteration variable controlling the loop, or the called procedure if the line represents a call. *VAR* may also appear in this column, as discussed in the Var Name description in the section “Loop Report”.

Analysis

Indicates why a transformation or optimization was not performed, or additional information on what was done.

 

Privatization Table

This table reports any user variables contained in a parallelized loop that are privatized by the compiler. Because the Privatization Table refers to loops, the Loop Report is automatically provided with it.

A description of each column in the Privatization Table is shown in Table 8-6 “Privatization Table column definitions”.

Table 8-6 Privatization Table column definitions

Column    Definition
Line Num.

Specifies the source line of the beginning of the loop.

Id Num.

References the ID number assigned to the loop in the Loop Report.

Var Name

Specifies the name of the iteration variable controlling the loop. *VAR* may also appear in this column, as discussed in the Var Name description in the section “Loop Report”.

Priv Var

Specifies the name of the privatized user variable. Compiler-generated variables that are privatized are not reported here.

Privatization Information for Parallel Loops

Provides more detail on the variable privatizations performed.

 

Variable Name Footnote Table

Variable names that are too long to fit in the Var Name columns of the other tables are truncated and followed by a colon and a footnote number. These footnotes are explained in the Variable Name Footnote Table.

A description of each column in the Variable Name Footnote Table is shown in Table 8-7 “Variable Name Footnote Table column definitions”.

Table 8-7 Variable Name Footnote Table column definitions

Column    Definition
Footnoted Var Name

Specifies the truncated variable name and its footnote number.

User Var Name

Specifies the full name of the variable as identified in the source code.
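Resolving a footnoted name back to the full variable name is a simple lookup. The Python sketch below assumes footnote rows have the two-column form shown in the sample report (a truncated name ending in a colon and a number, then the full name). The helper names `read_footnotes` and `full_name` are hypothetical.

```python
import re

# A footnote row: "<truncated>:<number>   <full user variable name>"
FOOTNOTE_ROW = re.compile(r"^\s*(\S+:\d+)\s+(\S+)\s*$")


def read_footnotes(lines):
    """Build a {footnoted name: user name} map from Variable Name Footnote Table text."""
    table = {}
    for text in lines:
        m = FOOTNOTE_ROW.match(text)
        if m:                       # header and separator lines do not match
            table[m.group(1)] = m.group(2)
    return table


def full_name(var, table):
    """Return the user variable name, or the name unchanged if it has no footnote."""
    return table.get(var, var)
```

For example, after reading the footnote table of the sample report, `full_name("jloopi:2", table)` gives `jloopindex`, while a name without a footnote entry, such as `sub1`, is returned as is.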

 

Example 8-1 Optimization Report

The following Fortran program is the basis for the Optimization Report shown in this example. Line numbers are provided for ease of reference.

1      PROGRAM EXAMPLE99
2      REAL A(100), B(100), C(100)
3      CALL SUB1(A,B,C)
4      END
5
6      SUBROUTINE SUB1(A,B,C)
7      REAL A(100), B(100), C(100)
8      DO ILOOPINDEX=1,100
9        A(ILOOPINDEX) = ILOOPINDEX
10     ENDDO
11     DO JLOOPINDEX=1,100
12       B(JLOOPINDEX) = A(JLOOPINDEX)**2
13     ENDDO
14     DO KLOOPINDEX=1, 100
15       C(KLOOPINDEX) = A(KLOOPINDEX) + B(KLOOPINDEX)
16     ENDDO
17     PRINT *, A(1), B(50), C(100)
18     END

The following Optimization Report is generated by compiling the program EXAMPLE99 with the command-line options +O3 +Oparallel +Oreport=all +Oinline=sub1:

% f90 +O3 +Oparallel +Oreport=all +Oinline=sub1 EXAMPLE99.f

            Optimization for EXAMPLE99

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
3         1     sub1      *Inlined call     (2-4)
8         2     iloopi:1  Serial                      Fused
11        3     jloopi:2  Serial                      Fused
14        4     kloopi:3  Serial                      Fused
                          *Fused            (5)       (2 3 4) -> (5)
8         5     iloopi:1  PARALLEL

Footnoted   User
Var Name    Var Name
-----------------------------------------------------------------------------
iloopi:1    iloopindex
jloopi:2    jloopindex
kloopi:3    kloopindex

            Optimization for sub1

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
8         1     iloopi:1  Serial                      Fused
11        2     jloopi:2  Serial                      Fused
14        3     kloopi:3  Serial                      Fused
                          *Fused            (4)       (1 2 3) -> (4)
8         4     iloopi:1  PARALLEL

Footnoted   User
Var Name    Var Name
-----------------------------------------------------------------------------
iloopi:1    iloopindex
jloopi:2    jloopindex
kloopi:3    kloopindex

The Optimization Report for EXAMPLE99 provides the following information:

  • Call to sub1 is inlined
    The first line of the Loop Report shows that the call to sub1 was inlined, as shown below:

3 1 sub1 *Inlined call (2-4)

  • Three new loops produced
    The inlining produced three new loops in EXAMPLE99: Loop #2, Loop #3, and Loop #4. Internally, the EXAMPLE99 module that originally looked like:

    1      PROGRAM EXAMPLE99
    2      REAL A(100), B(100), C(100)
    3      CALL SUB1(A,B,C)
    4      END

    now looks like this:

          PROGRAM EXAMPLE99
          REAL A(100), B(100), C(100)
          DO ILOOPINDEX=1,100                      !Loop #2
            A(ILOOPINDEX) = ILOOPINDEX
          ENDDO
          DO JLOOPINDEX=1,100                      !Loop #3
            B(JLOOPINDEX) = A(JLOOPINDEX)**2
          ENDDO
          DO KLOOPINDEX=1, 100                     !Loop #4
            C(KLOOPINDEX) = A(KLOOPINDEX) + B(KLOOPINDEX)
          ENDDO
          PRINT *, A(1), B(50), C(100)
          END
  • New loops are fused
    The following lines indicate that the new loops have been fused. The last of these lines indicates that the three loops were fused into one new loop, Loop #5.

    8         2     iloopi:1  Serial                      Fused
    11        3     jloopi:2  Serial                      Fused
    14        4     kloopi:3  Serial                      Fused
                              *Fused            (5)       (2 3 4) -> (5)

    After fusing, the code internally appears as the following:

          PROGRAM EXAMPLE99
          REAL A(100), B(100), C(100)
          DO ILOOPINDEX=1,100                      !Loop #5
            A(ILOOPINDEX) = ILOOPINDEX
            B(ILOOPINDEX) = A(ILOOPINDEX)**2
            C(ILOOPINDEX) = A(ILOOPINDEX) + B(ILOOPINDEX)
          ENDDO
          PRINT *, A(1), B(50), C(100)
          END
  • New loop is parallelized
    In the following Loop Report line:

8 5 iloopi:1 PARALLEL

Loop #5 uses iloopi:1 as the iteration variable, referencing the Variable Name Footnote Table; iloopi:1 corresponds to iloopindex. The same line in the report also indicates that the newly-created Loop #5 was parallelized.

  • Variable Name Footnote Table lists iteration variables
    According to the Variable Name Footnote Table (duplicated below), the original variable iloopindex is abbreviated by the compiler as iloopi:1 so that it fits into the Var Name columns of other reports.

    jloopindex and kloopindex are abbreviated as jloopi:2 and kloopi:3, respectively. These names are used throughout the report to refer to these iteration variables.

    Footnoted   User
    Var Name    Var Name
    -----------------------
    iloopi:1    iloopindex
    jloopi:2    jloopindex
    kloopi:3    kloopindex
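The fusion of Loop #2, Loop #3, and Loop #4 into Loop #5 can be checked with a small model. The Python sketch below is not compiler output. It mirrors the three loops of SUB1 and the fused loop shown above, using hypothetical function names `three_loops` and `fused_loop`. Fusion is safe in this model because each fused iteration reads only the elements of A and B written earlier in the same iteration.

```python
N = 100  # array extent used in EXAMPLE99


def three_loops(n=N):
    """Model of Loop #2, Loop #3, and Loop #4 run one after another."""
    a = [0.0] * n
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):
        a[i] = float(i + 1)          # A(I) = I, with 1-based I
    for j in range(n):
        b[j] = a[j] ** 2
    for k in range(n):
        c[k] = a[k] + b[k]
    return a, b, c


def fused_loop(n=N):
    """Model of Loop #5: one loop carrying all three statements."""
    a = [0.0] * n
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):
        a[i] = float(i + 1)
        b[i] = a[i] ** 2
        c[i] = a[i] + b[i]
    return a, b, c
```

Both versions produce identical arrays, so the values printed by the PRINT statement, A(1), B(50), and C(100), are unchanged by the fusion.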

Example 8-2 Optimization Report

The following Fortran code provides an example of other transformations the compiler performs. Line numbers are provided for ease of reference.

1      PROGRAM EXAMPLE100
2
3      INTEGER IA1(100), IA2(100), IA3(100)
4      INTEGER I1, I2
5
6      DO I = 1, 100
7        IA1(I) = I
8        IA2(I) = I * 2
9        IA3(I) = I * 3
10     ENDDO
11
12     I1 = 0
13     I2 = 100
14     CALL SUB1 (IA1, IA2, IA3, I1, I2)
15     END
16
17     SUBROUTINE SUB1(A, B, C, S, N)
18     INTEGER A(N), B(N), C(N), S, I, J
19     DO J = 1, N
20       DO I = 1, N
21         IF (I .EQ. 1) THEN
22           S = S + A(I)
23         ELSE IF (I .EQ. N) THEN
24           S = S + B(I)
25         ELSE
26           S = S + C(I)
27         ENDIF
28       ENDDO
29     ENDDO
30     END

The following Optimization Report is generated by compiling the program EXAMPLE100 for parallelization:

% f90 +O3 +Oparallel +Oreport=all example100.f

            Optimization for SUB1

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
19        1     j         *Interchange      (2)       (j i) -> (i j)
20        2     i         *DynSel           (3-4)
20        3     i         PARALLEL                    Reduction
19        5     j         *Promote          (6-7)
19        6     j         Serial
19        7     j         Serial

20        4     i         Serial
19        8     j         *Promote          (9-10)
19        9     j         Serial
19        10    j         *Promote          (11-12)
19        11    j         Serial
19        12    j         Serial

Line      Id    Var       Analysis
Num.      Num.  Name
-----------------------------------------------------------------------------
19        5     j         Test on line 21 promoted out of loop
19        8     j         Test on line 21 promoted out of loop
19        10    j         Test on line 23 promoted out of loop

            Optimization for clone 1 of SUB1 (6_e70_cl_sub1)

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
19        1     j         *Interchange      (2)       (j i) -> (i j)
20        2     i         PARALLEL                    Reduction
19        3     j         *Promote          (4-5)
19        4     j         Serial
19        5     j         *Promote          (6-7)
19        6     j         Serial
19        7     j         Serial

Line      Id    Var       Analysis
Num.      Num.  Name
-----------------------------------------------------------------------------
19        3     j         Test on line 21 promoted out of loop
19        5     j         Test on line 23 promoted out of loop

            Optimization for example100

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
6         1     i         Serial

14        2     sub1      *Cloned call      (3)
14        3     sub1      None

Line      Id    Var       Analysis
Num.      Num.  Name
-----------------------------------------------------------------------------
14        2     sub1      Call target changed to clone 1 of SUB1 (6_e70_cl_sub1)

The Optimization Report for EXAMPLE100 shows Optimization Reports for the subroutine and its clone, followed by the report for the main program. It includes the following information:

  • Original subroutine contents
    Originally, the subroutine appeared as shown below:

    17     SUBROUTINE SUB1(A, B, C, S, N)
    18     INTEGER A(N), B(N), C(N), S, I, J
    19     DO J = 1, N
    20       DO I = 1, N
    21         IF (I .EQ. 1) THEN
    22           S = S + A(I)
    23         ELSE IF (I .EQ. N) THEN
    24           S = S + B(I)
    25         ELSE
    26           S = S + C(I)
    27         ENDIF
    28       ENDDO
    29     ENDDO
    30     END


  • Loop interchange performed first
    The compiler first performs loop interchange (listed as Interchange in the report) to maximize cache performance:

19        1  j        *Interchange       (2)       (j i) -> (i j)
  • The subroutine then becomes the following:

    17     SUBROUTINE SUB1(A, B, C, S, N)
    18     INTEGER A(N), B(N), C(N), S, I, J
    19     DO I = 1, N                              ! Loop #2
    20       DO J = 1, N                            ! Loop #1
    21         IF (I .EQ. 1) THEN
    22           S = S + A(I)
    23         ELSE IF (I .EQ. N) THEN
    24           S = S + B(I)
    25         ELSE
    26           S = S + C(I)
    27         ENDIF
    28       ENDDO
    29     ENDDO
    30     END
  • The program is optimized for parallelization
    The compiler would like to parallelize the outermost loop in the nest, which is now the I loop. However, because the value of N is not known at compile time, the compiler does not know how many times the I loop will execute, and therefore whether parallel execution will pay off. To ensure that the loop is executed as efficiently as possible at runtime, the compiler replaces the I loop nest with two new copies of the I loop nest, one to be run in parallel, the other to be run serially.

  • Dynamic selection is executed
    An IF is then inserted to select the more efficient version of the loop to execute at runtime. This method of making one copy for parallel execution and one copy for serial execution is known as dynamic selection, which is enabled by default when +O3 +Oparallel is specified (see “Dynamic selection” for more information). This optimization is reported in the Loop Report in the line:

   20        2  i        *DynSel            (3-4)
  • Loop #2 creates two loops
    According to the report, Loop #2 was used to create the new loops, Loop #3 and Loop #4. Internally, the code now is represented as follows:

          SUBROUTINE SUB1(A, B, C, S, N)
          INTEGER A(N), B(N), C(N), S, I, J

          IF (N .GT. some_threshold) THEN
            DO (parallel) I = 1, N                 ! Loop #3
              DO J = 1, N                          ! Loop #5
                IF (I .EQ. 1) THEN
                  S = S + A(I)
                ELSE IF (I .EQ. N) THEN
                  S = S + B(I)
                ELSE
                  S = S + C(I)
                ENDIF
              ENDDO
            ENDDO
          ELSE
            DO I = 1, N                            ! Loop #4
              DO J = 1, N                          ! Loop #8
                IF (I .EQ. 1) THEN
                  S = S + A(I)
                ELSE IF (I .EQ. N) THEN
                  S = S + B(I)
                ELSE
                  S = S + C(I)
                ENDIF
              ENDDO
            ENDDO
          ENDIF
          END
  • Loop #3 contains reductions
    Loop #3 (which was parallelized) also contained one or more reductions. The Reordering Transformation column indicates that the IF statements were promoted out of Loop #5, Loop #8, and Loop #10.

  • Analysis Table lists new loops
    The line numbers of the promoted IF statements are listed. The first test in Loop #5 was promoted, creating two new loops, Loop #6 and Loop #7. Similarly, Loop #8 has a test promoted, creating Loop #9 and Loop #10. The test remaining in Loop #10 is then promoted, thereby creating two additional loops, Loop #11 and Loop #12. A promoted test is an IF statement that is hoisted out of a loop. See the section “Test promotion” for more information. The Analysis Table contents are shown below:

    19        5     j         Test on line 21 promoted out of loop
    19        8     j         Test on line 21 promoted out of loop
    19        10    j         Test on line 23 promoted out of loop
  • DO loop is not reordered
    The following DO loop does not undergo any reordering transformation:

    6      DO I = 1, 100
    7        IA1(I) = I
    8        IA2(I) = I * 2
    9        IA3(I) = I * 3
    10     ENDDO

    This fact is reported by the line:

    6         1     i         Serial
  • sub1 is cloned
    The call to the subroutine sub1 is cloned. As indicated by the asterisk (*), the compiler produced a new call. The new call is given the ID (3) listed in the New Id Nums column. The new call is then listed, with None indicating that no reordering transformation was performed on the call to the new subroutine.

    14        2     sub1      *Cloned call      (3)
    14        3     sub1      None
  • Cloned call is transformed
    A line is then added to the Analysis Table to elaborate on the Cloned call transformation. This line shows that the clone was called in place of the original subroutine.

    14        2     sub1      Call target changed to clone 1 of SUB1 (6_e70_cl_sub1)
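The combined effect of loop interchange and test promotion on SUB1 can be modeled in a few lines. The Python sketch below is an illustration under stated assumptions, not the compiler's internal form. The function names `sub1_original` and `sub1_promoted` are hypothetical, and the parallel copy created by dynamic selection is not modeled. Because the tests depend only on I, they are invariant in the J loop once I is the outer loop, so they can be hoisted out of it.

```python
def sub1_original(a, b, c, s, n):
    """SUB1 as written: J outer, I inner, tests evaluated on every iteration."""
    for j in range(1, n + 1):
        for i in range(1, n + 1):
            if i == 1:
                s += a[i - 1]
            elif i == n:
                s += b[i - 1]
            else:
                s += c[i - 1]
    return s


def sub1_promoted(a, b, c, s, n):
    """After interchange (I outer) with both tests promoted out of the J loop."""
    for i in range(1, n + 1):
        if i == 1:                          # test on line 21, hoisted
            for j in range(1, n + 1):
                s += a[i - 1]
        elif i == n:                        # test on line 23, hoisted
            for j in range(1, n + 1):
                s += b[i - 1]
        else:
            for j in range(1, n + 1):
                s += c[i - 1]
    return s


# Inputs as initialized by EXAMPLE100: IA1(I) = I, IA2(I) = 2*I, IA3(I) = 3*I.
IA1 = [i for i in range(1, 101)]
IA2 = [2 * i for i in range(1, 101)]
IA3 = [3 * i for i in range(1, 101)]
```

With integer operands the order of additions does not affect the sum, so both functions return the same value of S for these inputs. Each J loop in the promoted version now contains a single statement and no test.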

Example 8-3 Optimization Report

The following Fortran code shows loop blocking, loop peeling, loop distribution, and loop unroll and jam. Line numbers are listed for ease of reference.

1      PROGRAM EXAMPLE200
2
3      REAL*8 A(1000,1000), B(1000,1000), C(1000)
4      REAL*8 D(1000), E(1000)
5      INTEGER M, N
6
7      N = 1000
8      M = 1000
9
10     DO I = 1, N
11       C(I) = 0
12       DO J = 1, M
13         A(I,J) = A(I,J) + B(I,J) * C(I)
14       ENDDO
15     ENDDO
16
17     DO I = 1, N-1
18       D(I) = I
19     ENDDO
20
21     DO J = 1, N
22       E(J) = D(J) + 1
23     ENDDO
24
25     PRINT *, A(103,103), B(517, 517), D(11), E(29)
26
27     END

The following Optimization Report is generated by compiling program EXAMPLE200 as follows:

% f90 +O3 +Oreport +Oloop_block example200.f

            Optimization for example3

Line      Id    Var       Reordering        New       Optimizing / Special
Num.      Num.  Name      Transformation    Id Nums   Transformation
-----------------------------------------------------------------------------
10        1     i:1       *Dist             (2-3)
10        2     i:1       Serial

10        3     i:1       *Interchange      (4)       (i:1 j:1) -> (j:1 i:1)
12        4     j:1       *Block            (5)       (j:1 i:1) -> (i:1 j:1 i:1)
10        5     i:1       *Promote          (6-7)
10        6     i:1       Serial                      Removed
10        7     i:1       Serial
12        8     j:1       *Unroll And Jam   (9)
12        9     j:1       *Promote          (10-11)
12        10    j:1       Serial                      Removed
12        11    j:1       Serial
10        12    i:1       Serial

17        13    i:2       Serial                      Fused
21        14    j:2       *Peel             (15)
21        15    j:2       Serial                      Fused
                          *Fused            (16)      (13 15) -> (16)
17        16    i:2       Serial

Line      Id    Var       Analysis
Num.      Num.  Name
-----------------------------------------------------------------------------
10        5     i:1       Loop blocked by 56 iterations
10        5     i:1       Test on line 12 promoted out of loop
10        6     i:1       Loop blocked by 56 iterations
10        7     i:1       Loop blocked by 56 iterations
12        8     j:1       Loop unrolled by 8 iterations and jammed into the
                          innermost loop
12        9     j:1       Test on line 10 promoted out of loop
21        14    j:2       Peeled last iteration of loop

The Optimization Report for EXAMPLE200 provides the following results:

  • Several occurrences of variables noted
    In this report, the Var Name column has entries such as i:1, j:1, i:2, and j:2. This type of entry appears when the same variable name controls more than one loop. In EXAMPLE200, I is used as an iteration variable twice. Consequently, i:1 refers to the first occurrence, and i:2 refers to the second occurrence.

  • Loop #1 creates new loops
    The first line of the report shows that Loop #1, shown on line 10, is distributed to create Loop #2 and Loop #3:

    10        1     i:1       *Dist             (2-3)

    Initially, Loop #1 appears as shown.

          DO I = 1, N                            ! Loop #1
            C(I) = 0
            DO J = 1, M
              A(I,J) = A(I,J) + B(I,J) * C(I)
            ENDDO
          ENDDO

    It is then distributed as follows:

          DO I = 1, N                            ! Loop #2
            C(I) = 0
          ENDDO

          DO I = 1, N                            ! Loop #3
            DO J = 1, M
              A(I,J) = A(I,J) + B(I,J) * C(I)
            ENDDO
          ENDDO
  • Loop #3 is interchanged to create Loop #4
    The third line indicates this:

    10        3     i:1       *Interchange      (4)       (i:1 j:1) -> (j:1 i:1)

    Now, the loop looks like the following code:

          DO J = 1, M                            ! Loop #4
            DO I = 1, N
              A(I,J) = A(I,J) + B(I,J) * C(I)
            ENDDO
          ENDDO
  • Nested loop is blocked
    The next line of the Optimization Report indicates that the nest rooted at Loop #4 is blocked:

    12        4     j:1       *Block            (5)       (j:1 i:1) -> (i:1 j:1 i:1)
  • The blocked nest internally appears as follows:

          DO IOUT = 1, N, 56                     ! Loop #5
            DO J = 1, M
              DO I = IOUT, IOUT + 55
                A(I,J) = A(I,J) + B(I,J) * C(I)
              ENDDO
            ENDDO
          ENDDO
  • Loop #5 noted as blocked
    The loop with iteration variable i:1 is the loop that was actually blocked. The report shows *Block on Loop #4 (the j:1 loop) because the entire nest rooted at Loop #4 is replaced by the blocked nest.

  • IOUT variable facilitates loop blocking
    The IOUT variable is introduced to facilitate the loop blocking. The compiler uses a step value of 56 for the IOUT loop as reported in the Analysis Table:

   10       5  i:1       Loop blocked by 56 iterations
  • Test promotion creates new loops
    The next three lines of the report show that a test was promoted out of Loop #5, creating Loop #6 (which is removed) and Loop #7 (which is run serially). This test—which does not appear in the source code—is an implicit test that the compiler inserts in the code to ensure that the loop iterates at least once.

    10        5     i:1       *Promote          (6-7)
    10        6     i:1       Serial                      Removed
    10        7     i:1       Serial
  • This test is referenced again in the following line from the Analysis Table:

   10       5  i:1       Test on line 12 promoted out of loop
  • Unroll and jam creates new loop
    The report indicates that the J loop is unrolled and jammed, creating Loop #9:

    12        8     j:1       *Unroll And Jam   (9)
  • J loop unrolled by 8 iterations
    The corresponding Analysis Table line indicates that the J loop is unrolled by 8 iterations and jammed:

    12        8     j:1       Loop unrolled by 8 iterations and jammed into the innermost loop
  • The unrolled and jammed loop results in the following code:

          DO IOUT = 1, N, 56                     ! Loop #5
            DO J = 1, M, 8                       ! Loop #8
              DO I = IOUT, IOUT + 55             ! Loop #9
                A(I,J) = A(I,J) + B(I,J) * C(I)
                A(I,J+1) = A(I,J+1) + B(I,J+1) * C(I)
                A(I,J+2) = A(I,J+2) + B(I,J+2) * C(I)
                A(I,J+3) = A(I,J+3) + B(I,J+3) * C(I)
                A(I,J+4) = A(I,J+4) + B(I,J+4) * C(I)
                A(I,J+5) = A(I,J+5) + B(I,J+5) * C(I)
                A(I,J+6) = A(I,J+6) + B(I,J+6) * C(I)
                A(I,J+7) = A(I,J+7) + B(I,J+7) * C(I)
              ENDDO
            ENDDO
          ENDDO
  • Test promotion in Loop #9 creates new loops
    The Optimization Report indicates that the compiler-inserted test in Loop #9 is promoted out of the loop, creating Loop #10 and Loop #11.

    12        9     j:1       *Promote          (10-11)
    12        10    j:1       Serial                      Removed
    12        11    j:1       Serial
  • Loops are fused
    According to the report, the last two loops in the program are fused (once an iteration is peeled off the second loop), then the new loop is run serially.

    17        13    i:2       Serial                      Fused
    21        14    j:2       *Peel             (15)
    21        15    j:2       Serial                      Fused
                              *Fused            (16)      (13 15) -> (16)
    17        16    i:2       Serial

    That information is combined with the following line from the Analysis Table:

    21        14    j:2       Peeled last iteration of loop
  • Loop peeling creates loop, enables fusion
    Initially, Loop #14 has an iteration peeled to create Loop #15, as shown below. The loop peeling is performed to enable loop fusion.

          DO I = 1, N-1                         ! Loop #13
            D(I) = I
          ENDDO

          DO J = 1, N-1                         ! Loop #15
            E(J) = D(J) + 1
          ENDDO
  • Loops are fused to create new loop
    Loop #13 and Loop #15 are then fused to produce Loop #16:

          DO I = 1, N-1                         ! Loop #16
            D(I) = I
            E(I) = D(I) + 1
          ENDDO
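Returning to the blocked nest rooted at Loop #5 described earlier in this example, the blocking step can also be modeled in Python. This sketch is not compiler output, and its function names are hypothetical. The listing in the example writes the inner bound as IOUT + 55. Since 1000 is not a multiple of 56, the sketch assumes the final block is clamped to N, a detail the listing does not show.

```python
BLOCK = 56  # block size reported in the Analysis Table


def make_inputs(n, m):
    """Deterministic test data for an n-by-m nest (values are arbitrary)."""
    a = [[float(i + j) for j in range(m)] for i in range(n)]
    b = [[float(i - j) for j in range(m)] for i in range(n)]
    c = [float(i % 7) for i in range(n)]
    return a, b, c


def interchanged_nest(a, b, c):
    """Model of Loop #4: J outer, I inner."""
    n, m = len(a), len(a[0])
    for j in range(m):
        for i in range(n):
            a[i][j] += b[i][j] * c[i]
    return a


def blocked_nest(a, b, c, block=BLOCK):
    """Model of the nest rooted at Loop #5: I is split into blocks of `block`."""
    n, m = len(a), len(a[0])
    for iout in range(0, n, block):                        # IOUT loop
        for j in range(m):
            for i in range(iout, min(iout + block, n)):    # assumed clamp to N
                a[i][j] += b[i][j] * c[i]
    return a


def block_starts(n, block=BLOCK):
    """1-based IOUT values visited by DO IOUT = 1, N, block."""
    return list(range(1, n + 1, block))
```

Every element of A is updated exactly once in both versions, so the results are identical. For N = 1000 the IOUT loop visits 18 starting values, and the last block, starting at 953, covers only 48 iterations.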