Loop Report

The Loop Report lists the optimizations that are performed on loops and calls. If appropriate, the report gives reasons why a possible optimization was not performed. Loop nests are reported in the order in which they are encountered and are separated by a blank line. Below is a sample optimization report.

```
Optimization Report

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
3     1     sub1      *Inlined call    (2-4)
8     2     iloopi:1  Serial                    Fused
11    3     jloopi:2  Serial                    Fused
14    4     kloopi:3  Serial                    Fused
                      *Fused           (5)      (2 3 4) -> (5)
8     5     iloopi:1  PARALLEL

Footnoted  User Var Name
Var Name
-----------------------------------------------------------------------------
iloopi:1   iloopindex
jloopi:2   jloopindex
kloopi:3   kloopindex

Optimization for sub1

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
8     1     iloopi:1  Serial                    Fused
11    2     jloopi:2  Serial                    Fused
14    3     kloopi:3  Serial                    Fused
                      *Fused           (4)      (1 2 3) -> (4)
8     4     iloopi:1  PARALLEL

Footnoted  User Var Name
Var Name
-----------------------------------------------------------------------------
iloopi:1   iloopindex
jloopi:2   jloopindex
kloopi:3   kloopindex
```
A description of each column of the Loop Report is shown in Table 8-2 “Loop Report column definitions”.

Table 8-2 Loop Report column definitions

| Column | Description |
|---|---|
| Line Num. | Specifies the source line of the beginning of the loop or of the loop from which it was derived. For cloned calls and inlined calls, the Line Num. column specifies the source line at which the call statement appears. |
| Id Num. | Specifies a unique ID number for every optimized loop and for every optimized call. Other parts of the report refer to loops and calls by this number. Both loops that appear in the original program source and loops created by the compiler are given ID numbers, and the Id Num. column makes no distinction between the two. Loops created by the compiler are also shown in the New Id Nums column, as described below. Loops are assigned unique, sequential numbers as they are encountered. |
| Var Name | Specifies the name of the iteration variable controlling the loop, or the name of the called procedure if the line represents a call. If the variable is compiler-generated, its name is listed as `*VAR*`. If the entry consists of a truncated variable name followed by a colon and a number, the number refers to the Variable Name Footnote Table, which appears after the Loop Report and Analysis Table in the Optimization Report. |
| Reordering Transformation | Indicates which reordering transformations were performed. Reordering transformations are performed on loops, calls, and loop nests, and typically involve reordering and/or duplicating sections of code to make execution more efficient. This column has one of the values shown in Table 8-3 “Reordering transformation values in the Loop Report”. |
| New Id Nums | Specifies the ID numbers of loops or calls created by the compiler. These ID numbers are also listed in the Id Num. column and are referenced in other parts of the report, but the loops and calls they represent were not present in the original source code. For loop fusion, the number in this column identifies the new loop created by merging all of the fused loops. New ID numbers are also created for cloned calls, inlined calls, loop blocking, loop distribution, loop interchange, loop peeling, loop unroll and jam, dynamic selection, and test promotion. |
| Optimizing / Special Transformation | Indicates which optimizing transformations, if any, were performed. An optimizing transformation reduces the number of operations executed or replaces operations with simpler ones. A special transformation allows the compiler to optimize code under special circumstances. When appropriate, this column has one of the values shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |
The following values apply to the Reordering Transformation column described in Table 8-2 “Loop Report column definitions”.

Table 8-3 Reordering transformation values in the Loop Report

| Value | Description |
|---|---|
| Block | Loop blocking was performed. The new loop order is indicated under the Optimizing / Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |
| Cloned call | A call to a subroutine was cloned. |
| Dist | Loop distribution was performed. |
| DynSel | Dynamic selection was performed. The numbers in the New Id Nums column correspond to the loops created. For parallel loops, these generally include a PARALLEL and a Serial version. |
| Fused | The loops were fused into another loop and no longer exist. The original loops and the new loop are indicated under the Optimizing / Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |
| Inlined call | A call to a subroutine was inlined. |
| Interchange | Loop interchange was performed. The new loop order is indicated under the Optimizing / Special Transformation column, as shown in Table 8-4 “Optimizing/special transformations values in the Loop Report”. |
| None | No reordering transformation was performed on the call. |
| PARALLEL | The loop runs in thread-parallel mode. |
| Peel | The first or last iteration of the loop was peeled in order to fuse the loop with an adjacent loop. |
| Promote | Test promotion was performed. |
| Serial | No reordering transformation was performed on the loop. |
| Unroll and Jam | The loop was unrolled and the nested loops were jammed (fused). |
| VECTOR | The loop was fully or partially replaced with more efficient calls to one or more vector routines. |
| `*` | Appears to the left of transformations that produce new loops or calls (distribution, dynamic selection, blocking, fusion, interchange, call cloning, call inlining, peeling, promotion, and unroll and jam). |
The following values apply to the Optimizing / Special Transformation column described in Table 8-2 “Loop Report column definitions”.

Table 8-4 Optimizing/special transformations values in the Loop Report

| Value | Explanation |
|---|---|
| Fused | The loop was fused into another loop and no longer exists. |
| Reduction | The compiler recognized a reduction in the loop. |
| Removed | The compiler removed the loop. |
| Unrolled | The loop was completely unrolled. |
| (OrigOrder) -> (InterchangedOrder) | This information appears when Interchange is reported under Reordering Transformation. OrigOrder indicates the order of loops in the original nest; InterchangedOrder indicates the new order produced by the interchange. OrigOrder and InterchangedOrder consist of user iteration variables listed in outermost-to-innermost order. |
| (OrigLoops) -> (NewLoop) | This information appears when Fused is reported under Reordering Transformation. OrigLoops indicates the original loops that the compiler fused to form the loop indicated by NewLoop. OrigLoops and NewLoop refer to loops by the values in the Id Num. and New Id Nums columns of the Loop Report. |
| (OrigLoopNest) -> (BlockedLoopNest) | This information appears when Block is reported under Reordering Transformation. OrigLoopNest indicates the order of the original loop nest containing the loop that was blocked; BlockedLoopNest indicates the order of the loops after blocking. OrigLoopNest and BlockedLoopNest refer to user iteration variables listed in outermost-to-innermost order. |
Supplemental tables

The tables described in this section may be included in the Optimization Report to provide information that supplements the Loop Report.

Analysis Table

If necessary, an Analysis Table is included in the Optimization Report to elaborate further on optimizations reported in the Loop Report. A description of each column in the Analysis Table is shown in Table 8-5 “Analysis Table column definitions”.

Table 8-5 Analysis Table column definitions

| Column | Description |
|---|---|
| Line Num. | Specifies the source line of the beginning of the loop or call. |
| Id Num. | References the ID number assigned to the loop or call in the Loop Report. |
| Var Name | Specifies the name of the iteration variable controlling the loop, or the name of the called procedure if the line represents a call. `*VAR*` may also appear in this column, as discussed in the Var Name description in the section “Loop Report”. |
| Analysis | Indicates why a transformation or optimization was not performed, or gives additional information on what was done. |
Privatization Table

The Privatization Table reports any user variables contained in a parallelized loop that are privatized by the compiler. Because the Privatization Table refers to loops, the Loop Report is automatically provided with it. A description of each column in the Privatization Table is shown in Table 8-6 “Privatization Table column definitions”.

Table 8-6 Privatization Table column definitions

| Column | Definitions |
|---|---|
| Line Num. | Specifies the source line of the beginning of the loop. |
| Id Num. | References the ID number assigned to the loop in the Loop Report. |
| Var Name | Specifies the name of the iteration variable controlling the loop. `*VAR*` may also appear in this column, as discussed in the Var Name description in the section “Loop Report”. |
| Priv Var | Specifies the name of the privatized user variable. Compiler-generated variables that are privatized are not reported here. |
| Privatization Information for Parallel Loops | Provides more detail on the variable privatizations performed. |
Variable Name Footnote Table

Variable names that are too long to fit in the Var Name columns of the other tables are truncated and followed by a colon and a footnote number. These footnotes are explained in the Variable Name Footnote Table. A description of each column in the Variable Name Footnote Table is shown in Table 8-7 “Variable Name Footnote Table column definitions”.

Table 8-7 Variable Name Footnote Table column definitions

| Column | Definition |
|---|---|
| Footnoted Var Name | Specifies the truncated variable name and its footnote number. |
| User Var Name | Specifies the full name of the variable as identified in the source code. |
Example 8-1 Optimization Report

The following Fortran program is the basis for the Optimization Report shown in this example. Line numbers are provided for ease of reference.

```
 1  PROGRAM EXAMPLE99
 2  REAL A(100), B(100), C(100)
 3  CALL SUB1(A,B,C)
 4  END
 5
 6  SUBROUTINE SUB1(A,B,C)
 7  REAL A(100), B(100), C(100)
 8  DO ILOOPINDEX=1,100
 9    A(ILOOPINDEX) = ILOOPINDEX
10  ENDDO
11  DO JLOOPINDEX=1,100
12    B(JLOOPINDEX) = A(JLOOPINDEX)**2
13  ENDDO
14  DO KLOOPINDEX=1, 100
15    C(KLOOPINDEX) = A(KLOOPINDEX) + B(KLOOPINDEX)
16  ENDDO
17  PRINT *, A(1), B(50), C(100)
18  END
```
The following Optimization Report is generated by compiling the program EXAMPLE99 with the command-line options +O3 +Oparallel +Oreport=all +Oinline=sub1:

```
% f90 +O3 +Oparallel +Oreport=all +Oinline=sub1 EXAMPLE99.f
```

```
Optimization for EXAMPLE99

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
3     1     sub1      *Inlined call    (2-4)
8     2     iloopi:1  Serial                    Fused
11    3     jloopi:2  Serial                    Fused
14    4     kloopi:3  Serial                    Fused
                      *Fused           (5)      (2 3 4) -> (5)
8     5     iloopi:1  PARALLEL

Footnoted  User Var Name
Var Name
-----------------------------------------------------------------------------
iloopi:1   iloopindex
jloopi:2   jloopindex
kloopi:3   kloopindex

Optimization for sub1

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
8     1     iloopi:1  Serial                    Fused
11    2     jloopi:2  Serial                    Fused
14    3     kloopi:3  Serial                    Fused
                      *Fused           (4)      (1 2 3) -> (4)
8     4     iloopi:1  PARALLEL

Footnoted  User Var Name
Var Name
-----------------------------------------------------------------------------
iloopi:1   iloopindex
jloopi:2   jloopindex
kloopi:3   kloopindex
```
The Optimization Report for EXAMPLE99 provides the following information:

Call to sub1 is inlined

The first line of the Loop Report shows that the call to sub1 was inlined:

```
3     1     sub1      *Inlined call    (2-4)
```

Three new loops produced

The inlining produced three new loops in EXAMPLE99: Loop #2, Loop #3, and Loop #4. Internally, the EXAMPLE99 program unit, which originally looked like this:

```
 1  PROGRAM EXAMPLE99
 2  REAL A(100), B(100), C(100)
 3  CALL SUB1(A,B,C)
 4  END
```

now looks like this:

```
PROGRAM EXAMPLE99
REAL A(100), B(100), C(100)
DO ILOOPINDEX=1,100          !Loop #2
  A(ILOOPINDEX) = ILOOPINDEX
ENDDO
DO JLOOPINDEX=1,100          !Loop #3
  B(JLOOPINDEX) = A(JLOOPINDEX)**2
ENDDO
DO KLOOPINDEX=1, 100         !Loop #4
  C(KLOOPINDEX) = A(KLOOPINDEX) + B(KLOOPINDEX)
ENDDO
PRINT *, A(1), B(50), C(100)
END
```
New loops are fused

The following lines indicate that the new loops have been fused. The last line shows that the three loops were fused into one new loop, Loop #5.

```
8     2     iloopi:1  Serial                    Fused
11    3     jloopi:2  Serial                    Fused
14    4     kloopi:3  Serial                    Fused
                      *Fused           (5)      (2 3 4) -> (5)
```

After fusing, the code internally appears as follows:

```
PROGRAM EXAMPLE99
REAL A(100), B(100), C(100)
DO ILOOPINDEX=1,100          !Loop #5
  A(ILOOPINDEX) = ILOOPINDEX
  B(ILOOPINDEX) = A(ILOOPINDEX)**2
  C(ILOOPINDEX) = A(ILOOPINDEX) + B(ILOOPINDEX)
ENDDO
PRINT *, A(1), B(50), C(100)
END
```
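The fusion step can be checked by hand. The following free-form Fortran sketch compiles the original three-loop body of SUB1 and the fused Loop #5 as two separate subroutines so that their results can be compared. The module and subroutine names (`fusion_demo`, `sub1_unfused`, `sub1_fused`) are hypothetical; the code illustrates what fusion does to the program and is not the compiler's internal representation.

```fortran
MODULE fusion_demo
  IMPLICIT NONE
CONTAINS
  ! Body of SUB1 as written: three loops over the same range (Loops #2-#4).
  SUBROUTINE sub1_unfused(a, b, c)
    REAL, INTENT(OUT) :: a(100), b(100), c(100)
    INTEGER :: iloopindex, jloopindex, kloopindex
    DO iloopindex = 1, 100
      a(iloopindex) = iloopindex
    END DO
    DO jloopindex = 1, 100
      b(jloopindex) = a(jloopindex)**2
    END DO
    DO kloopindex = 1, 100
      c(kloopindex) = a(kloopindex) + b(kloopindex)
    END DO
  END SUBROUTINE sub1_unfused

  ! Hand-fused equivalent of Loop #5. Fusion is legal here because each
  ! statement in iteration i reads only elements written earlier in the
  ! same iteration i, never elements produced by a later iteration.
  SUBROUTINE sub1_fused(a, b, c)
    REAL, INTENT(OUT) :: a(100), b(100), c(100)
    INTEGER :: iloopindex
    DO iloopindex = 1, 100
      a(iloopindex) = iloopindex
      b(iloopindex) = a(iloopindex)**2
      c(iloopindex) = a(iloopindex) + b(iloopindex)
    END DO
  END SUBROUTINE sub1_fused
END MODULE fusion_demo
```

Because no value flows between different iterations of the fused loop, its iterations are independent of one another, which is consistent with the next report line marking Loop #5 as PARALLEL.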
New loop is parallelized

In the following Loop Report line, Loop #5 uses iloopi:1 as its iteration variable. This name refers to the Variable Name Footnote Table, where iloopi:1 corresponds to iloopindex. The same line also indicates that the newly created Loop #5 was parallelized.

```
8     5     iloopi:1  PARALLEL
```

Variable Name Footnote Table lists iteration variables

According to the Variable Name Footnote Table (duplicated below), the compiler abbreviates the original variable iloopindex as iloopi:1 so that it fits into the Var Name columns of the other tables. Similarly, jloopindex and kloopindex are abbreviated as jloopi:2 and kloopi:3. These names are used throughout the report to refer to these iteration variables.

```
Footnoted  User Var Name
Var Name
-----------------------
iloopi:1   iloopindex
jloopi:2   jloopindex
kloopi:3   kloopindex
```
Example 8-2 Optimization Report

The following Fortran code provides an example of other transformations the compiler performs. Line numbers are provided for ease of reference.

```
 1  PROGRAM EXAMPLE100
 2
 3  INTEGER IA1(100), IA2(100), IA3(100)
 4  INTEGER I1, I2
 5
 6  DO I = 1, 100
 7    IA1(I) = I
 8    IA2(I) = I * 2
 9    IA3(I) = I * 3
10  ENDDO
11
12  I1 = 0
13  I2 = 100
14  CALL SUB1 (IA1, IA2, IA3, I1, I2)
15  END
16
17  SUBROUTINE SUB1(A, B, C, S, N)
18  INTEGER A(N), B(N), C(N), S, I, J
19  DO J = 1, N
20    DO I = 1, N
21      IF (I .EQ. 1) THEN
22        S = S + A(I)
23      ELSE IF (I .EQ. N) THEN
24        S = S + B(I)
25      ELSE
26        S = S + C(I)
27      ENDIF
28    ENDDO
29  ENDDO
30  END
```
The following Optimization Report is generated by compiling the program EXAMPLE100 for parallelization:

```
% f90 +O3 +Oparallel +Oreport=all example100.f
```

```
Optimization for SUB1

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
19    1     j         *Interchange     (2)      (j i) -> (i j)
20    2     i         *DynSel          (3-4)
20    3     i         PARALLEL                  Reduction
19    5     j         *Promote         (6-7)
19    6     j         Serial
19    7     j         Serial
20    4     i         Serial
19    8     j         *Promote         (9-10)
19    9     j         Serial
19    10    j         *Promote         (11-12)
19    11    j         Serial
19    12    j         Serial

Line  Id    Var       Analysis
Num.  Num.  Name
-----------------------------------------------------------------------------
19    5     j         Test on line 21 promoted out of loop
19    8     j         Test on line 21 promoted out of loop
19    10    j         Test on line 23 promoted out of loop
```

```
Optimization for clone 1 of SUB1 (6_e70_cl_sub1)

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
19    1     j         *Interchange     (2)      (j i) -> (i j)
20    2     i         PARALLEL                  Reduction
19    3     j         *Promote         (4-5)
19    4     j         Serial
19    5     j         *Promote         (6-7)
19    6     j         Serial
19    7     j         Serial

Line  Id    Var       Analysis
Num.  Num.  Name
-----------------------------------------------------------------------------
19    3     j         Test on line 21 promoted out of loop
19    5     j         Test on line 23 promoted out of loop

Optimization for example100

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
6     1     i         Serial
14    2     sub1      *Cloned call     (3)
14    3     sub1      None

Line  Id    Var       Analysis
Num.  Num.  Name
-----------------------------------------------------------------------------
14    2     sub1      Call target changed to clone 1 of SUB1 (6_e70_cl_sub1)
```
The Optimization Report for EXAMPLE100 shows Optimization Reports for the subroutine SUB1 and its clone, followed by the report for the main program, example100. It includes the following information:

Original subroutine contents

Originally, the subroutine appeared as shown below:

```
17  SUBROUTINE SUB1(A, B, C, S, N)
18  INTEGER A(N), B(N), C(N), S, I, J
19  DO J = 1, N
20    DO I = 1, N
21      IF (I .EQ. 1) THEN
22        S = S + A(I)
23      ELSE IF (I .EQ. N) THEN
24        S = S + B(I)
25      ELSE
26        S = S + C(I)
27      ENDIF
28    ENDDO
29  ENDDO
30  END
```
Loop interchange performed first

The compiler first performs loop interchange (listed as Interchange in the report) to improve cache performance:

```
19    1     j         *Interchange     (2)      (j i) -> (i j)
```

The subroutine then becomes the following:

```
17  SUBROUTINE SUB1(A, B, C, S, N)
18  INTEGER A(N), B(N), C(N), S, I, J
19  DO I = 1, N                ! Loop #2
20    DO J = 1, N              ! Loop #1
21      IF (I .EQ. 1) THEN
22        S = S + A(I)
23      ELSE IF (I .EQ. N) THEN
24        S = S + B(I)
25      ELSE
26        S = S + C(I)
27      ENDIF
28    ENDDO
29  ENDDO
30  END
```
The program is optimized for parallelization

The compiler would like to parallelize the outermost loop in the nest, which is now the I loop. However, because the value of N is not known at compile time, the compiler does not know how many times the I loop will execute. To ensure that the loop executes as efficiently as possible at runtime, the compiler replaces the I loop nest with two copies of it: one to be run in parallel and the other to be run serially.

Dynamic selection is executed

An IF is then inserted to select the more efficient version of the loop to execute at runtime. This method of making one copy for parallel execution and one copy for serial execution is known as dynamic selection, which is enabled by default when +O3 +Oparallel is specified (see “Dynamic selection” for more information). This optimization is reported in the Loop Report in the line:

```
20    2     i         *DynSel          (3-4)
```

Loop #2 creates two loops

According to the report, Loop #2 was used to create the new loops, Loop #3 and Loop #4. Internally, the code is now represented as follows:

```
SUBROUTINE SUB1(A, B, C, S, N)
INTEGER A(N), B(N), C(N), S, I, J
IF (N .GT. some_threshold) THEN
  DO (parallel) I = 1, N          ! Loop #3
    DO J = 1, N                   ! Loop #5
      IF (I .EQ. 1) THEN
        S = S + A(I)
      ELSE IF (I .EQ. N) THEN
        S = S + B(I)
      ELSE
        S = S + C(I)
      ENDIF
    ENDDO
  ENDDO
ELSE
  DO I = 1, N                     ! Loop #4
    DO J = 1, N                   ! Loop #8
      IF (I .EQ. 1) THEN
        S = S + A(I)
      ELSE IF (I .EQ. N) THEN
        S = S + B(I)
      ELSE
        S = S + C(I)
      ENDIF
    ENDDO
  ENDDO
ENDIF
END
```
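The runtime choice between Loop #3 and Loop #4 can be sketched in ordinary source form. The following free-form Fortran uses standard OpenMP directives as a stand-in for the compiler's internal parallel loop; when the code is compiled without OpenMP support, the `!$OMP` lines are treated as comments and both branches run serially. `PAR_THRESHOLD` and its value of 50 are hypothetical, because the report does not say what test the compiler generates, and the names `dynsel_demo`, `term`, and `sub1_dynsel` are illustrative only.

```fortran
MODULE dynsel_demo
  IMPLICIT NONE
  ! Hypothetical cutoff chosen for illustration; the compiler's actual
  ! selection criterion is not shown in the Optimization Report.
  INTEGER, PARAMETER :: PAR_THRESHOLD = 50
CONTAINS
  ! Contribution of one inner-loop iteration (lines 21-27 of EXAMPLE100).
  ! PURE, so it is safe to call from several threads at once.
  PURE INTEGER FUNCTION term(i, n, a, b, c)
    INTEGER, INTENT(IN) :: i, n
    INTEGER, INTENT(IN) :: a(n), b(n), c(n)
    IF (i == 1) THEN
      term = a(i)
    ELSE IF (i == n) THEN
      term = b(i)
    ELSE
      term = c(i)
    END IF
  END FUNCTION term

  SUBROUTINE sub1_dynsel(a, b, c, s, n)
    INTEGER, INTENT(IN) :: n
    INTEGER, INTENT(IN) :: a(n), b(n), c(n)
    INTEGER, INTENT(INOUT) :: s
    INTEGER :: i, j
    IF (n > PAR_THRESHOLD) THEN
      ! Parallel copy (compare Loop #3); S is a sum reduction.
      !$OMP PARALLEL DO PRIVATE(j) REDUCTION(+:s)
      DO i = 1, n
        DO j = 1, n
          s = s + term(i, n, a, b, c)
        END DO
      END DO
      !$OMP END PARALLEL DO
    ELSE
      ! Serial copy (compare Loop #4).
      DO i = 1, n
        DO j = 1, n
          s = s + term(i, n, a, b, c)
        END DO
      END DO
    END IF
  END SUBROUTINE sub1_dynsel
END MODULE dynsel_demo
```

The explicit two-branch form mirrors the Loop #3 / Loop #4 split in the report. In hand-written OpenMP code, the `IF` clause of `!$OMP PARALLEL DO` gives a similar effect without duplicating the loop.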
Loop #3 contains reductions

Loop #3 (which was parallelized) also contained one or more reductions, as indicated by Reduction in the Optimizing / Special Transformation column.

The Reordering Transformation column indicates that the IF statements were promoted out of Loop #5, Loop #8, and Loop #10.

Analysis Table lists new loops

The Analysis Table lists the line numbers of the promoted IF statements. The first test in Loop #5 was promoted, creating two new loops, Loop #6 and Loop #7. Similarly, a test in Loop #8 was promoted, creating Loop #9 and Loop #10. The test remaining in Loop #10 was then promoted, creating two additional loops, Loop #11 and Loop #12. A promoted test is an IF statement that is hoisted out of a loop; see the section “Test promotion” for more information. The Analysis Table contents are shown below:

```
19    5     j         Test on line 21 promoted out of loop
19    8     j         Test on line 21 promoted out of loop
19    10    j         Test on line 23 promoted out of loop
```
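The end result of promoting both tests can be shown in source form. The sketch below (hypothetical names `promote_demo`, `sum_before`, `sum_promoted`) compares the interchanged nest with a version in which the tests on lines 21 and 23, which depend only on I, have been hoisted out of the J loop so that each branch gets its own J loop. This rewrite is what many other compilers call loop unswitching. It is shown here in a single step, whereas the report shows the compiler promoting the two tests one at a time (Loop #8, then Loop #10).

```fortran
MODULE promote_demo
  IMPLICIT NONE
CONTAINS
  ! Interchanged nest (Loop #2 outside Loop #1): the IF depends only on I,
  ! yet it is evaluated on every one of the N*N inner iterations.
  SUBROUTINE sum_before(a, b, c, s, n)
    INTEGER, INTENT(IN) :: n
    INTEGER, INTENT(IN) :: a(n), b(n), c(n)
    INTEGER, INTENT(INOUT) :: s
    INTEGER :: i, j
    DO i = 1, n
      DO j = 1, n
        IF (i == 1) THEN
          s = s + a(i)
        ELSE IF (i == n) THEN
          s = s + b(i)
        ELSE
          s = s + c(i)
        END IF
      END DO
    END DO
  END SUBROUTINE sum_before

  ! Both tests promoted: they are now evaluated once per I, and each
  ! branch runs its own copy of the J loop with no test inside it.
  SUBROUTINE sum_promoted(a, b, c, s, n)
    INTEGER, INTENT(IN) :: n
    INTEGER, INTENT(IN) :: a(n), b(n), c(n)
    INTEGER, INTENT(INOUT) :: s
    INTEGER :: i, j
    DO i = 1, n
      IF (i == 1) THEN
        DO j = 1, n
          s = s + a(i)
        END DO
      ELSE IF (i == n) THEN
        DO j = 1, n
          s = s + b(i)
        END DO
      ELSE
        DO j = 1, n
          s = s + c(i)
        END DO
      END IF
    END DO
  END SUBROUTINE sum_promoted
END MODULE promote_demo
```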
DO loop is not reordered

The following DO loop in the main program does not undergo any reordering transformation:

```
 6  DO I = 1, 100
 7    IA1(I) = I
 8    IA2(I) = I * 2
 9    IA3(I) = I * 3
10  ENDDO
```

This fact is reported by the line:

```
6     1     i         Serial
```

sub1 is cloned

The call to the subroutine sub1 is cloned. As indicated by the asterisk (*), the compiler produced a new call, which is given the ID 3 listed in the New Id Nums column. The new call is then listed, with None indicating that no reordering transformation was performed on the call to the new subroutine.

```
14    2     sub1      *Cloned call     (3)
14    3     sub1      None
```
Cloned call is transformed

An Analysis Table entry for the call elaborates on the Cloned call transformation. It shows that the clone is called in place of the original subroutine:

```
14    2     sub1      Call target changed to clone 1 of SUB1 (6_e70_cl_sub1)
```
Example 8-3 Optimization Report

The following Fortran code shows loop blocking, loop peeling, loop distribution, and loop unroll and jam. Line numbers are listed for ease of reference.

```
 1  PROGRAM EXAMPLE200
 2
 3  REAL*8 A(1000,1000), B(1000,1000), C(1000)
 4  REAL*8 D(1000), E(1000)
 5  INTEGER M, N
 6
 7  N = 1000
 8  M = 1000
 9
10  DO I = 1, N
11    C(I) = 0
12    DO J = 1, M
13      A(I,J) = A(I,J) + B(I,J) * C(I)
14    ENDDO
15  ENDDO
16
17  DO I = 1, N-1
18    D(I) = I
19  ENDDO
20
21  DO J = 1, N
22    E(J) = D(J) + 1
23  ENDDO
24
25  PRINT *, A(103,103), B(517, 517), D(11), E(29)
26
27  END
```
The following Optimization Report is generated by compiling program EXAMPLE200 as follows:

```
% f90 +O3 +Oreport +Oloop_block example200.f
```

```
Optimization for example3

Line  Id    Var       Reordering       New      Optimizing / Special
Num.  Num.  Name      Transformation   Id Nums  Transformation
-----------------------------------------------------------------------------
10    1     i:1       *Dist            (2-3)
10    2     i:1       Serial
10    3     i:1       *Interchange     (4)      (i:1 j:1) -> (j:1 i:1)
12    4     j:1       *Block           (5)      (j:1 i:1) -> (i:1 j:1 i:1)
10    5     i:1       *Promote         (6-7)
10    6     i:1       Serial                    Removed
10    7     i:1       Serial
12    8     j:1       *Unroll And Jam  (9)
12    9     j:1       *Promote         (10-11)
12    10    j:1       Serial                    Removed
12    11    j:1       Serial
10    12    i:1       Serial
17    13    i:2       Serial                    Fused
21    14    j:2       *Peel            (15)
21    15    j:2       Serial                    Fused
                      *Fused           (16)     (13 15) -> (16)
17    16    i:2       Serial

Line  Id    Var       Analysis
Num.  Num.  Name
-----------------------------------------------------------------------------
10    5     i:1       Loop blocked by 56 iterations
10    5     i:1       Test on line 12 promoted out of loop
10    6     i:1       Loop blocked by 56 iterations
10    7     i:1       Loop blocked by 56 iterations
12    8     j:1       Loop unrolled by 8 iterations and jammed into the innermost loop
12    9     j:1       Test on line 10 promoted out of loop
21    14    j:2       Peeled last iteration of loop
```
The Optimization Report for EXAMPLE200 provides the following results:

Several occurrences of variables noted

In this report, the Var Name column has entries such as i:1, j:1, i:2, and j:2. This type of entry appears when a variable is used more than once. In EXAMPLE200, both I and J are used as iteration variables in two different places. Consequently, i:1 refers to the first occurrence of I and i:2 to the second, and j:1 and j:2 distinguish the two occurrences of J in the same way.
Loop #1 creates new loops

The first line of the report shows that Loop #1, which begins on line 10, is distributed to create Loop #2 and Loop #3:

```
10    1     i:1       *Dist            (2-3)
```

Initially, Loop #1 appears as shown:

```
DO I = 1, N          ! Loop #1
  C(I) = 0
  DO J = 1, M
    A(I,J) = A(I,J) + B(I,J) * C(I)
  ENDDO
ENDDO
```

It is then distributed as follows:

```
DO I = 1, N          ! Loop #2
  C(I) = 0
ENDDO
DO I = 1, N          ! Loop #3
  DO J = 1, M
    A(I,J) = A(I,J) + B(I,J) * C(I)
  ENDDO
ENDDO
```
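Distribution is legal here because the only value that flows from the first statement to the second is C(I), and after the split every C(I) is set before the second loop reads it. The following sketch (hypothetical names; `dp` is a local kind parameter standing in for REAL*8) runs Loop #1 and its distributed form side by side so that the results can be compared.

```fortran
MODULE dist_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)   ! double precision, like REAL*8
CONTAINS
  ! Loop #1 as written on lines 10-15 of EXAMPLE200.
  SUBROUTINE nest_original(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m)
    REAL(dp), INTENT(OUT) :: c(n)
    INTEGER :: i, j
    DO i = 1, n
      c(i) = 0.0_dp
      DO j = 1, m
        a(i, j) = a(i, j) + b(i, j) * c(i)
      END DO
    END DO
  END SUBROUTINE nest_original

  ! Distributed form: Loop #2 initializes C, then Loop #3 updates A.
  SUBROUTINE nest_distributed(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m)
    REAL(dp), INTENT(OUT) :: c(n)
    INTEGER :: i, j
    DO i = 1, n
      c(i) = 0.0_dp
    END DO
    DO i = 1, n
      DO j = 1, m
        a(i, j) = a(i, j) + b(i, j) * c(i)
      END DO
    END DO
  END SUBROUTINE nest_distributed
END MODULE dist_demo
```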
Loop #3 is interchanged to create Loop #4

The third line of the report indicates this:

```
10    3     i:1       *Interchange     (4)      (i:1 j:1) -> (j:1 i:1)
```

Now the loop looks like the following code:

```
DO J = 1, M          ! Loop #4
  DO I = 1, N
    A(I,J) = A(I,J) + B(I,J) * C(I)
  ENDDO
ENDDO
```
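A likely reason for this interchange is memory order: Fortran stores arrays in column-major order, so A(I,J) and A(I+1,J) are adjacent in memory. With I innermost, consecutive iterations of the inner loop touch consecutive elements of A and B instead of elements N apart. The sketch below (hypothetical names) gives both loop orders; they compute the same result because each iteration updates a different element A(I,J) and reads only B(I,J) and C(I).

```fortran
MODULE interchange_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
CONTAINS
  ! Loop #3 order: I outer, J inner. The inner loop steps along a row
  ! of A, so successive accesses are N elements apart in memory.
  SUBROUTINE update_ij(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m), c(n)
    INTEGER :: i, j
    DO i = 1, n
      DO j = 1, m
        a(i, j) = a(i, j) + b(i, j) * c(i)
      END DO
    END DO
  END SUBROUTINE update_ij

  ! Loop #4 order after interchange: J outer, I inner. The inner loop
  ! now steps down a column of A, one adjacent element at a time.
  SUBROUTINE update_ji(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m), c(n)
    INTEGER :: i, j
    DO j = 1, m
      DO i = 1, n
        a(i, j) = a(i, j) + b(i, j) * c(i)
      END DO
    END DO
  END SUBROUTINE update_ji
END MODULE interchange_demo
```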
Nested loop is blocked

The next line of the Optimization Report indicates that the nest rooted at Loop #4 is blocked:

```
12    4     j:1       *Block           (5)      (j:1 i:1) -> (i:1 j:1 i:1)
```

The blocked nest internally appears as follows (MIN keeps the last block within bounds, since N = 1000 is not a multiple of 56):

```
DO IOUT = 1, N, 56                   ! Loop #5
  DO J = 1, M
    DO I = IOUT, MIN(IOUT + 55, N)
      A(I,J) = A(I,J) + B(I,J) * C(I)
    ENDDO
  ENDDO
ENDDO
```
Loop #5 noted as blocked

The loop with iteration variable i:1 is the loop that was actually blocked. The report shows `*Block` on Loop #4 (the j:1 loop) because the entire nest rooted at Loop #4 is replaced by the blocked nest.

IOUT variable facilitates loop blocking

The IOUT variable is introduced to facilitate the loop blocking. The compiler uses a step value of 56 for the IOUT loop, as reported in the Analysis Table:

```
10    5     i:1       Loop blocked by 56 iterations
```
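The blocked nest can be written out in ordinary Fortran. In the sketch below, the block size `BS` is set to 56 only because that is the value in the Analysis Table line above; the module and subroutine names are hypothetical, and MIN guards the last, possibly partial, block. One plausible benefit of this loop order is that, within each block, the elements C(IOUT) through C(IOUT+55) are reused across all M columns while they are likely still in cache.

```fortran
MODULE block_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  ! Block size copied from "Loop blocked by 56 iterations"; an example
  ! value, not a tuning recommendation.
  INTEGER, PARAMETER :: BS = 56
CONTAINS
  ! Reference: the interchanged nest, Loop #4.
  SUBROUTINE update_unblocked(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m), c(n)
    INTEGER :: i, j
    DO j = 1, m
      DO i = 1, n
        a(i, j) = a(i, j) + b(i, j) * c(i)
      END DO
    END DO
  END SUBROUTINE update_unblocked

  ! Blocked nest: the I range is split into strips of BS iterations,
  ! giving the loop order (i j i) shown in the report; MIN trims the
  ! last strip when N is not a multiple of BS.
  SUBROUTINE update_blocked(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m), c(n)
    INTEGER :: iout, i, j
    DO iout = 1, n, BS
      DO j = 1, m
        DO i = iout, MIN(iout + BS - 1, n)
          a(i, j) = a(i, j) + b(i, j) * c(i)
        END DO
      END DO
    END DO
  END SUBROUTINE update_blocked
END MODULE block_demo
```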
Test promotion creates new loops

The next three lines of the report show that a test was promoted out of Loop #5, creating Loop #6 (which is removed) and Loop #7 (which is run serially). This test does not appear in the source code; it is an implicit test that the compiler inserts in the code to ensure that the loop iterates at least once.

```
10    5     i:1       *Promote         (6-7)
10    6     i:1       Serial                    Removed
10    7     i:1       Serial
```

This test is referenced again in the following line from the Analysis Table:

```
10    5     i:1       Test on line 12 promoted out of loop
```
Unroll and jam creates new loop

The report indicates that the J loop (Loop #8) is unrolled and jammed, creating Loop #9:

```
12    8     j:1       *Unroll And Jam  (9)
```

J loop unrolled by 8 iterations

The Analysis Table also indicates that the J loop is unrolled by 8 iterations and jammed into the innermost loop:

```
12    8     j:1       Loop unrolled by 8 iterations and jammed into the innermost loop
```

The unrolled and jammed loop results in the following code:

```
DO IOUT = 1, N, 56                   ! Loop #5
  DO J = 1, M, 8                     ! Loop #8
    DO I = IOUT, MIN(IOUT + 55, N)   ! Loop #9
      A(I,J) = A(I,J) + B(I,J) * C(I)
      A(I,J+1) = A(I,J+1) + B(I,J+1) * C(I)
      A(I,J+2) = A(I,J+2) + B(I,J+2) * C(I)
      A(I,J+3) = A(I,J+3) + B(I,J+3) * C(I)
      A(I,J+4) = A(I,J+4) + B(I,J+4) * C(I)
      A(I,J+5) = A(I,J+5) + B(I,J+5) * C(I)
      A(I,J+6) = A(I,J+6) + B(I,J+6) * C(I)
      A(I,J+7) = A(I,J+7) + B(I,J+7) * C(I)
    ENDDO
  ENDDO
ENDDO
```
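Unroll and jam can also be written out by hand. The sketch below (hypothetical names) unrolls the J loop by 8, as in the Analysis Table, and jams the eight copies of the statement into a single I loop, so that each C(I) value can be reused across eight columns per I iteration. Unlike the listing above, which relies on M = 1000 being a multiple of 8, it adds a cleanup loop for leftover columns, and it omits the IOUT blocking loop to keep the sketch short.

```fortran
MODULE unroll_jam_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
CONTAINS
  ! Reference: J outer, I inner (Loop #4 before blocking).
  SUBROUTINE update_plain(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m), c(n)
    INTEGER :: i, j
    DO j = 1, m
      DO i = 1, n
        a(i, j) = a(i, j) + b(i, j) * c(i)
      END DO
    END DO
  END SUBROUTINE update_plain

  ! J unrolled by 8, with the eight copies jammed into one I loop.
  SUBROUTINE update_unroll_jam(a, b, c, n, m)
    INTEGER, INTENT(IN) :: n, m
    REAL(dp), INTENT(INOUT) :: a(n, m)
    REAL(dp), INTENT(IN) :: b(n, m), c(n)
    INTEGER :: i, j, mmain
    mmain = m - MOD(m, 8)            ! columns covered by full groups of 8
    DO j = 1, mmain, 8
      DO i = 1, n
        a(i, j)   = a(i, j)   + b(i, j)   * c(i)
        a(i, j+1) = a(i, j+1) + b(i, j+1) * c(i)
        a(i, j+2) = a(i, j+2) + b(i, j+2) * c(i)
        a(i, j+3) = a(i, j+3) + b(i, j+3) * c(i)
        a(i, j+4) = a(i, j+4) + b(i, j+4) * c(i)
        a(i, j+5) = a(i, j+5) + b(i, j+5) * c(i)
        a(i, j+6) = a(i, j+6) + b(i, j+6) * c(i)
        a(i, j+7) = a(i, j+7) + b(i, j+7) * c(i)
      END DO
    END DO
    ! Cleanup loop for the MOD(M, 8) columns left over.
    DO j = mmain + 1, m
      DO i = 1, n
        a(i, j) = a(i, j) + b(i, j) * c(i)
      END DO
    END DO
  END SUBROUTINE update_unroll_jam
END MODULE unroll_jam_demo
```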
Test promotion in Loop #9 creates new loops

The Optimization Report indicates that the compiler-inserted test in Loop #9 is promoted out of the loop, creating Loop #10 and Loop #11:

```
12    9     j:1       *Promote         (10-11)
12    10    j:1       Serial                    Removed
12    11    j:1       Serial
```

Loops are fused

According to the report, the last two loops in the program are fused (after an iteration is peeled off the second loop), and the new loop is then run serially.

```
17    13    i:2       Serial                    Fused
21    14    j:2       *Peel            (15)
21    15    j:2       Serial                    Fused
                      *Fused           (16)     (13 15) -> (16)
17    16    i:2       Serial
```

This information is complemented by the following line from the Analysis Table:

```
21    14    j:2       Peeled last iteration of loop
```
Loop peeling creates loop, enables fusion

Initially, Loop #14 has its last iteration peeled to create Loop #15, as shown below. The loop peeling is performed to enable loop fusion: after peeling, both loops run from 1 to N-1.

```
DO I = 1, N-1          ! Loop #13
  D(I) = I
ENDDO
DO J = 1, N-1          ! Loop #15
  E(J) = D(J) + 1
ENDDO
E(N) = D(N) + 1        ! peeled last iteration (J = N)
```

Loops are fused to create new loop

Loop #13 and Loop #15 are then fused to produce Loop #16:

```
DO I = 1, N-1          ! Loop #16
  D(I) = I
  E(I) = D(I) + 1
ENDDO
E(N) = D(N) + 1        ! peeled last iteration (J = N)
```
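The peel-and-fuse sequence can be checked the same way. The sketch below (hypothetical names) keeps the peeled iteration J = N as a separate statement after the fused loop. Fusion is legal because iteration I of the second loop reads only D(I), which the fused loop writes earlier in the same iteration. D(N) is never assigned in EXAMPLE200, so the sketch takes D as INTENT(INOUT) and leaves D(N) to the caller.

```fortran
MODULE peel_fuse_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
CONTAINS
  ! Loops #13 and #14 as written on lines 17-23 of EXAMPLE200.
  SUBROUTINE de_original(d, e, n)
    INTEGER, INTENT(IN) :: n
    REAL(dp), INTENT(INOUT) :: d(n)   ! D(N) is not set here; caller supplies it
    REAL(dp), INTENT(OUT) :: e(n)
    INTEGER :: i, j
    DO i = 1, n - 1
      d(i) = i
    END DO
    DO j = 1, n
      e(j) = d(j) + 1
    END DO
  END SUBROUTINE de_original

  ! Last J iteration peeled, then the two N-1 trip loops fused (Loop #16).
  SUBROUTINE de_peeled_fused(d, e, n)
    INTEGER, INTENT(IN) :: n
    REAL(dp), INTENT(INOUT) :: d(n)
    REAL(dp), INTENT(OUT) :: e(n)
    INTEGER :: i
    DO i = 1, n - 1
      d(i) = i
      e(i) = d(i) + 1
    END DO
    e(n) = d(n) + 1                   ! the peeled iteration, J = N
  END SUBROUTINE de_peeled_fused
END MODULE peel_fuse_demo
```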