By default, the compiler performs constant folding and simple
register assignment. There are several ways to increase and
control the level of optimization performed on your program.
Setting Basic Optimization Levels
HP aC++ provides four basic levels of optimization. The higher the
level, the more optimization is performed and the longer the optimization
takes. You can specify an optimization option on the aCC command line or in the
CXXOPTS environment variable.
Example:
aCC -O prog.C
This command compiles prog.C and optimizes the program at the default, level 2.
Level 1 Optimization
Level 1 optimization includes branch optimization, dead code
elimination, faster register allocation, instruction scheduling, and
peephole (statement-by-statement) optimization. Use +O1 to get level
1 optimization.
Level 1 optimization produces faster programs than compiling without
optimization, and it compiles faster than level 2 optimization.
Programs compiled at level 1 can be used with the HP
Distributed Debugging Environment (DDE) debugger. Use the debugger
option -g0 or -g1.
Level 2 Optimization
Level 2 optimization includes level 1 optimizations, plus
optimizations performed over entire functions in a single file.
Level 2 optimizes loops in order to reduce pipeline stalls and
analyzes data-flow, memory usage, loops, and expressions.
Use -O or +O2 to get level 2 optimization. Level 2 is the level selected by the -O option.
Specifically, level 2 provides:
- Coloring register allocation.
- Induction variable elimination and strength reduction.
- Local and global common subexpression elimination.
- Advanced constant folding and propagation. (Simple constant folding is done by default.)
- Loop invariant code motion.
- Store/copy optimization.
- Unused definition elimination.
- Software pipelining.
- Register reassociation.
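To make two of the transformations listed above concrete, the following C++ sketch shows a loop as a programmer might write it, next to a hand-transformed version that mirrors what loop-invariant code motion and strength reduction aim for. The function names and the stride parameter are hypothetical and chosen only for illustration. The code the optimizer actually generates will differ.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: both functions assume stride > 0.

// As written: scale * offset is recomputed on every iteration, and the
// element index i * stride costs one multiplication per iteration.
long sumStridedNaive(const std::vector<long>& data, std::size_t stride,
                     long scale, long offset) {
    long total = 0;
    for (std::size_t i = 0; i * stride < data.size(); ++i) {
        total += data[i * stride] + scale * offset;
    }
    return total;
}

// Hand-transformed equivalent: the invariant product is computed once
// before the loop (loop-invariant code motion), and the multiplication
// i * stride is replaced by a running addition (strength reduction).
long sumStridedTransformed(const std::vector<long>& data, std::size_t stride,
                           long scale, long offset) {
    const long invariant = scale * offset;  // hoisted out of the loop
    long total = 0;
    for (std::size_t index = 0; index < data.size(); index += stride) {
        total += data[index] + invariant;
    }
    return total;
}
```

Both functions return the same value for the same arguments. The second one simply does less work per iteration, which is why loop-heavy code tends to benefit most from level 2.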
Level 2 can produce faster run-time code than level 1 if
programs use loops extensively. Loop-oriented floating-point
intensive applications may see run times reduced by 50%.
Operating system and interactive applications that use the
already optimized system libraries can achieve 30% to 50%
additional improvement. Level 2 optimization produces faster
programs than level 1 and compiles faster than level 3
optimization.
Level 3 Optimization
Level 3 optimization includes level 2 optimizations, plus
full optimization across all subprograms within a single
file. Level 3 also inlines certain subprograms within the
input file. Use +O3 to get level 3 optimization.
Level 3 optimization produces faster run-time code than level
2 on code that does many procedure calls to small functions.
Level 3 links faster than level 4. However, level 3 does not work
with the debugger options -g0 and -g1.
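As an illustration of the kind of code that benefits from inlining, the sketch below shows a small function called from a loop in the same file, together with a hand-inlined equivalent. The names are hypothetical, and whether the compiler inlines a particular call depends on its own heuristics.

```cpp
#include <cstddef>
#include <vector>

// A small function: the cost of calling it can rival the work it does.
static int square(int x) {
    return x * x;
}

// As written: one call to square() per element.
int sumOfSquaresWithCalls(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += square(values[i]);
    }
    return total;
}

// Hand-inlined equivalent: the body of square() replaces the call, which
// removes the call overhead and exposes the multiplication to the loop
// optimizations of level 2.
int sumOfSquaresInlined(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

Because square() and its caller are in the same file, this is a case level 3 can see. The same pattern split across two source files would need level 4.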
Level 4 Optimization
Level 4 optimization includes level 3 optimizations, plus
full optimizations across the entire application program.
Level 4 includes global and static variable optimization and
inlining across the entire program. Optimizations are performed
at link time rather than at compile time. Use +O4 to get level
4 optimization.
Level 4 optimization produces faster run-time code than level
3 if programs use many global variables or if there are many
opportunities for inlining procedure calls. However, level 4 does
not work with the debugger options -g0 and -g1.
Additional Options for Finer Control
In addition to the basic optimization levels, further options give you
more precise control, as shown in the following examples:
Enabling Aggressive Optimizations
To enable aggressive optimizations at levels 2, 3 and 4, use the +Oaggressive option as follows:
- aCC +O2 +Oaggressive sourcefile.C
- aCC +O3 +Oaggressive sourcefile.C
- aCC +O4 +Oaggressive sourcefile.C
This option enables additional optimizations at each level.
Note: Use aggressive optimizations with stable,
well-structured code. These types of optimizations give you faster
code, but may change the behavior of programs.
These optimizations may do any of the following:
- Relocate conditional floating-point instructions from within loops.
- Convert certain library calls to millicode and inline instructions.
- Alter error-handling requirements.
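The first item in this list can be pictured with the following C++ sketch. It is an assumption about the general shape of the transformation, not a description of what aCC emits: a floating-point division that the source executes only under a condition inside a loop is moved in front of the loop, where it executes unconditionally. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// As written: the division executes only for elements whose flag is set.
// If no flag is set, num / den is never evaluated.
double accumulateGuarded(const std::vector<int>& flags, double num, double den) {
    double total = 0.0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i] != 0) {
            total += num / den;
        }
    }
    return total;
}

// After relocation: the loop-invariant division is performed once, before
// the loop, whether or not any flag is set.
double accumulateHoisted(const std::vector<int>& flags, double num, double den) {
    const double quotient = num / den;  // now unconditional
    double total = 0.0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i] != 0) {
            total += quotient;
        }
    }
    return total;
}
```

For ordinary inputs the two versions return the same value. They can differ in behavior when den is zero and no flag is set: the relocated division could raise a floating-point exception that the original code would never have triggered, if such exceptions are trapped. This is one way an aggressive optimization may change program behavior.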
Enabling Only Conservative Optimizations
You can enable only conservative optimizations at levels 2, 3, and 4
by using the +Oconservative option, as follows:
- aCC +O2 +Oconservative sourcefile.C
- aCC +O3 +Oconservative sourcefile.C
- aCC +O4 +Oconservative sourcefile.C
This option disables all but the most conservative optimizations
at each level. In most cases, conservative optimizations do not change the
behavior of code, even if the code does not conform to standards.
Use +Oconservative at levels 2, 3, and 4 when
your code is unstructured.
Removing Compilation Time Limits When Optimizing
You can remove optimization time restrictions at levels 2, 3 and 4 by using the
+Onolimit option as follows:
- aCC +O2 +Onolimit sourcefile.C
- aCC +O3 +Onolimit sourcefile.C
- aCC +O4 +Onolimit sourcefile.C
By default, the optimizer limits the amount of time spent
optimizing large programs at levels 2, 3, and 4.
Use this option if longer compile times are acceptable because you
want additional optimizations to be performed.
Limiting the Size of Optimized Code
You can disable optimizations that expand code size at levels 2, 3 and 4 by using the
+Osize option, as follows:
- aCC +O2 +Osize sourcefile.C
- aCC +O3 +Osize sourcefile.C
- aCC +O4 +Osize sourcefile.C
Most optimizations improve execution speed and decrease
executable code size. A few optimizations significantly
increase code size to gain execution speed.
The +Osize option disables these code-expanding optimizations.
Use this option if you have limited main memory, swap space, or disk space.
Specifying Maximum Optimization
For maximum optimization, use the +Oall option as follows:
aCC +Oall sourcefile.C
This combination performs aggressive optimizations with unrestricted
compile time at the highest level of optimization.
Note: Use +Oall with stable, well-structured code. These
types of optimizations give you the fastest code, but are
riskier than the default optimizations.
The +Oall option combines the +O4, +Oaggressive, and +Onolimit options.
Combining Optimization Options
Optimization options that affect code size (+Osize),
compile-time (+Olimit), and the aggressiveness of the
optimizations performed (+Oaggressive or +Oconservative)
can be combined at any of the optimization levels 2 through 4.
You can use +Olimit or +Osize with either
+Oaggressive or +Oconservative, but you
cannot use +Oaggressive with +Oconservative.
Example:
To specify conservative optimizations at
level 2 and disable code-expanding optimizations, use the following
command:
aCC +O2 +Oconservative +Osize sourcefile.C
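The combination rules in this section can be written down as a small check, for example in a build script helper. The C++ sketch below is hypothetical: the function is not part of aCC, it does not model +Oall, and it treats the rule that the modifier options belong with levels 2 through 4 as an assumption drawn from the text above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if opt appears in the option list.
static bool hasOption(const std::vector<std::string>& opts,
                      const std::string& opt) {
    return std::find(opts.begin(), opts.end(), opt) != opts.end();
}

// Hypothetical helper: checks an aCC option list against two rules.
//  1. +Oaggressive and +Oconservative must not appear together.
//  2. Modifier options are expected only with level 2, 3, or 4
//     (-O counts as level 2). This rule is an assumption.
bool isValidOptimizationCombination(const std::vector<std::string>& opts) {
    const bool levelTwoToFour = hasOption(opts, "-O") || hasOption(opts, "+O2") ||
                                hasOption(opts, "+O3") || hasOption(opts, "+O4");
    const bool aggressive = hasOption(opts, "+Oaggressive");
    const bool conservative = hasOption(opts, "+Oconservative");
    const bool anyModifier = aggressive || conservative ||
                             hasOption(opts, "+Osize") ||
                             hasOption(opts, "+Olimit") ||
                             hasOption(opts, "+Onolimit");

    if (aggressive && conservative) {
        return false;  // rule 1: mutually exclusive
    }
    if (anyModifier && !levelTwoToFour) {
        return false;  // rule 2: modifiers need level 2 through 4
    }
    return true;
}
```

With this sketch, the option list from the command above (+O2, +Oconservative, +Osize) is accepted, while a list that contains both +Oaggressive and +Oconservative is rejected.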
Profile-based Optimization
Profile-based optimization (PBO) is a set of performance-improving code transformations
based on the runtime characteristics of your application.
The following steps are involved in performing profile-based optimization:
- Instrumentation
- Data Collection
- Maintaining Profile Data Files
- Optimization
Instrumentation
To instrument your program, use the +Oprofile=collect option as follows:
aCC -c +Oprofile=collect -O sample.C
aCC -o sample.exe +Oprofile=collect sample.o
The first command line uses the -O option to perform level
2 optimization and the +Oprofile=collect option to prepare the code for
instrumentation. (+Oprofile=collect generates intermediate
code.) The -c option in the first command line
suppresses linking and creates an intermediate object
file called sample.o. The .o file can be used
later in the optimization phase, avoiding a second compile.
The second command line uses the -o option to link
sample.o into sample.exe. The +Oprofile=collect
option instruments sample.exe with data collection code.
Note: Instrumented programs run slower than non-instrumented
programs. Only use instrumented code to collect statistics for
profile-based optimization.
Instrumenting Code at Level 4 Optimization:
When optimizing at level 4 (where code generation is delayed until
link time), use the +Oprofile=collect option as follows:
Data Collection
To collect execution profile statistics, run your instrumented
program with representative data as follows:
sample.exe < input.file1    # Collect execution profile data.
sample.exe < input.file2    # Collect execution profile data.
This step creates and logs the profile statistics to a file,
by default called flow.data. The data collection file
is a structured file that may be used to store the statistics
from multiple test runs of different programs that you may
have instrumented.
Maintaining Profile Data Files
Profile-based optimization stores execution profile data
in a disk file. By default, this file is called flow.data
and is located in your current working directory.
You can override the default name of the profile data file.
This is useful when working on large programs or on
projects with many different program files.
The FLOW_DATA environment variable can be used to specify
the name of the profile data file with either the +Oprofile=collect
or +Oprofile=use options.
Alternatively, the +Oprofile=use:filename command-line option
specifies the name of the profile data file.
The +Oprofile=use:filename option takes precedence
over the FLOW_DATA environment variable.
Examples:
In the following example, the FLOW_DATA environment variable
is used to override the flow.data file name. The profile data
is stored instead in /users/profiles/prog.data.
export FLOW_DATA=/users/profiles/prog.data
aCC -c +Oprofile=collect +O3 sample.C
aCC -o sample.exe +Oprofile=collect sample.o
sample.exe < input.file1
aCC -o sample.exe +Oprofile=use sample.o
In the next example, the +Oprofile=use option is used to override the
flow.data file name with the name /users/profiles/prog.data.
aCC -c +Oprofile=collect +O3 sample.C
aCC -o sample.exe +Oprofile=collect sample.o
sample.exe < input.file1
mv flow.data /users/profiles/prog.data
aCC -o sample.exe +Oprofile=use:/users/profiles/prog.data sample.o
Optimization
To optimize the program based on the previously collected
runtime profile statistics, relink the program as follows:
aCC -o sample.exe +Oprofile=use sample.o
When optimizing at level 4 (where code generation is delayed until link time),
use the +Oprofile=use option as follows:
aCC +Oprofile=use +O4 x.o y.o
When +Oprofile=use is used, no recompilation is
necessary. The .o file saved from the instrumentation
phase can be used as input.
Note: When using profile-based optimization: