Hewlett-Packard
Optimizing HP aC++ Programs
HP aC++ provides options to the aCC command and pragmas to control optimization. The following sections introduce the basic concepts of optimizing your HP aC++ code for improved efficiency:
Requesting Optimization
By default, the compiler performs constant folding and simple register assignment. There are several ways to increase and control the level of optimization performed on your program.
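As an illustration of what constant folding means at the source level, consider the following minimal sketch. The function and constant names are hypothetical and chosen only for this example; the second function shows by hand the kind of result a constant-folding compiler can produce from the first.

```cpp
// Hypothetical names, for illustration only.
const int kSecondsPerMinute = 60;
const int kMinutesPerHour = 60;

// As written: the parenthesized product involves only compile-time constants.
int hours_to_seconds(int hours) {
    return hours * (kSecondsPerMinute * kMinutesPerHour);
}

// Hand-written equivalent of the folded form: 60 * 60 is evaluated once,
// at compile time, so only one multiplication remains at run time.
int hours_to_seconds_folded(int hours) {
    return hours * 3600;
}
```

Both functions return the same value for any input; the difference is only in how much work is left for run time.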


Setting Basic Optimization Levels

HP aC++ provides four basic levels of optimization. The higher the level, the more optimization is performed and the longer the optimization takes. You can specify an optimization option on the aCC command line or in the CXXOPTS environment variable.

Example:

aCC -O prog.C

This command compiles prog.C and optimizes the program at the default, level 2.


Level 1 Optimization

Level 1 optimization includes branch optimization, dead code elimination, faster register allocation, instruction scheduling, and peephole (statement-by-statement) optimization. Use +O1 to get level 1 optimization.

Level 1 optimization produces faster programs than without optimization and compiles faster than level 2 optimization. Programs compiled at level 1 can be used with the HP Distributed Debugging Environment (DDE) debugger. Use the debugger option -g0 or -g1.
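The following sketch shows the kind of source pattern that dead code elimination and branch optimization target. The names are hypothetical, and the comments describe what an optimizer is generally permitted to do, not what HP aC++ is guaranteed to emit.

```cpp
#include <cstdio>

// Hypothetical compile-time switch, known to be false.
const bool kTracing = false;

int clamp_to_byte(int value) {
    if (kTracing) {
        // Dead code: the condition is a compile-time constant, so the test
        // and this call can be removed from the generated code.
        std::printf("clamp_to_byte(%d)\n", value);
    }
    // Simple branches like these are candidates for branch optimization.
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return value;
}
```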


Level 2 Optimization

Level 2 optimization includes level 1 optimizations, plus optimizations performed over entire functions in a single file. Level 2 optimizes loops in order to reduce pipeline stalls and analyzes data-flow, memory usage, loops, and expressions.

Use -O or +O2 to get level 2 optimization. Level 2 is the default level when you request optimization with -O.

Specifically, level 2 provides:

  • Coloring register allocation.
  • Induction variable elimination and strength reduction.
  • Local and global common subexpression elimination.
  • Advanced constant folding and propagation. (Simple constant folding is done by default.)
  • Loop invariant code motion.
  • Store/copy optimization.
  • Unused definition elimination.
  • Software pipelining.
  • Register reassociation.
Level 2 can produce faster run-time code than level 1 if programs use loops extensively. Loop-oriented floating-point intensive applications may see run times reduced by 50%. Operating system and interactive applications that use the already optimized system libraries can achieve 30% to 50% additional improvement. Level 2 optimization produces faster programs than level 1 and compiles faster than level 3 optimization.
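Two of the level 2 transformations listed above, loop invariant code motion and strength reduction, can be pictured with a hand-written before and after. This is a minimal sketch with hypothetical function names; it assumes stride is greater than zero and shows the idea of the transformations rather than actual compiler output.

```cpp
#include <cstddef>
#include <vector>

// As written: scale * offset is recomputed on every iteration, and the
// element index is formed with a multiplication (i * stride).
// Assumes stride > 0.
void fill_naive(std::vector<int>& out, int scale, int offset,
                std::size_t stride) {
    for (std::size_t i = 0; i * stride < out.size(); ++i) {
        out[i * stride] = scale * offset;
    }
}

// Hand-optimized equivalent. Assumes stride > 0.
void fill_optimized(std::vector<int>& out, int scale, int offset,
                    std::size_t stride) {
    // Loop invariant code motion: the product does not change inside the
    // loop, so it is computed once before the loop.
    const int value = scale * offset;
    // Strength reduction: the multiplication i * stride is replaced by a
    // running index that is advanced with an addition.
    for (std::size_t idx = 0; idx < out.size(); idx += stride) {
        out[idx] = value;
    }
}
```

Both functions write the same elements; the second simply does less work per iteration.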


Level 3 Optimization

Level 3 optimization includes level 2 optimizations, plus full optimization across all subprograms within a single file. Level 3 also inlines certain subprograms within the input file. Use +O3 to get level 3 optimization.

Level 3 optimization produces faster run-time code than level 2 on code that makes many procedure calls to small functions. Level 3 links faster than level 4. However, level 3 does not work with the debugger options -g0 and -g1.
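The following sketch illustrates the "many calls to small functions" case. The names are hypothetical. The second function shows by hand what inlining the call would look like; whether a given call is actually inlined is the compiler's decision.

```cpp
// A small function defined in the same file as its caller.
static int square(int x) {
    return x * x;
}

// As written: one call to square() per element.
int sum_of_squares(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += square(values[i]);
    }
    return total;
}

// Hand-inlined equivalent: the call overhead is gone and the loop body is
// exposed to the loop optimizations described for level 2.
int sum_of_squares_inlined(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```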


Level 4 Optimization

Level 4 optimization includes level 3 optimizations, plus full optimizations across the entire application program. Level 4 includes global and static variable optimization and inlining across the entire program. Optimizations are performed at link time rather than at compile time. Use +O4 to get level 4 optimization.

Level 4 optimization produces faster run-time code than level 3 if programs use many global variables or if there are many opportunities for inlining procedure calls. However, level 4 does not work with the debugger options -g0 and -g1.
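To picture why inlining across the entire program needs link-time optimization, consider this sketch. The two parts are shown in one listing so that it compiles on its own, but the comments mark where a file boundary would be in a real project; the file names and function names are hypothetical.

```cpp
// ---- Would live in geometry.C (hypothetical file name) ----
double rect_area(double width, double height) {
    return width * height;
}

// ---- Would live in report.C (hypothetical file name) ----
// When the files are compiled separately, only this declaration is visible
// here, so an optimizer that works one file at a time cannot inline the
// call below. An optimizer that runs at link time sees both files.
double rect_area(double width, double height);

double total_area(const double* widths, const double* heights, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += rect_area(widths[i], heights[i]);  // cross-file call
    }
    return total;
}
```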


Additional Options for Finer Control

In addition to the basic optimization levels, HP aC++ provides options for more precise control over optimization, as shown in the following examples:


Enabling Aggressive Optimizations

To enable aggressive optimizations at levels 2, 3 and 4, use the +Oaggressive option as follows:

  • aCC +O2 +Oaggressive sourcefile.C
  • aCC +O3 +Oaggressive sourcefile.C
  • aCC +O4 +Oaggressive sourcefile.C
This option enables additional optimizations at each level.

Note: Use aggressive optimizations with stable, well-structured code. These types of optimizations give you faster code, but may change the behavior of programs.

These optimizations may do any of the following:

  • Relocate conditional floating-point instructions from within loops.
  • Convert certain library calls to millicode and inline instructions.
  • Alter error-handling requirements.
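The following sketch shows one plausible reading of the first item, relocating a conditional floating-point instruction out of a loop. It is an assumption made for illustration, not a statement of what HP aC++ actually generates, and the function names are hypothetical.

```cpp
// As written: the division executes only when the guard is true.
double scaled_sum(const double* x, int n, double num, double den) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        if (den != 0.0) {
            total += x[i] * (num / den);
        }
    }
    return total;
}

// Hand-written picture of the relocated form: the division is moved out of
// the loop and above the guard, so it now executes even when den is zero.
// The returned value is the same, but a program that enables
// floating-point traps could behave differently.
double scaled_sum_hoisted(const double* x, int n, double num, double den) {
    const double ratio = num / den;
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        if (den != 0.0) {
            total += x[i] * ratio;
        }
    }
    return total;
}
```

This is why the note above recommends aggressive optimizations only for stable, well-structured code.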

Enabling Only Conservative Optimizations

You can enable only conservative optimizations at levels 2, 3, and 4 by using the +Oconservative option, as follows:

  • aCC +O2 +Oconservative sourcefile.C
  • aCC +O3 +Oconservative sourcefile.C
  • aCC +O4 +Oconservative sourcefile.C
This option disables all but the most conservative optimizations at each level. Conservative optimizations do not change the behavior of code, in most cases, even if the code does not conform to standards.

Use only the conservative optimizations at levels 2, 3, and 4 when your code is unstructured.


Removing Compilation Time Limits When Optimizing

You can remove optimization time restrictions at levels 2, 3 and 4 by using the +Onolimit option as follows:

  • aCC +O2 +Onolimit sourcefile.C
  • aCC +O3 +Onolimit sourcefile.C
  • aCC +O4 +Onolimit sourcefile.C
By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4.

Use this option if longer compile times are acceptable because you want additional optimizations to be performed.


Limiting the Size of Optimized Code

You can disable optimizations that expand code size at levels 2, 3 and 4 by using the +Osize option, as follows:

  • aCC +O2 +Osize sourcefile.C
  • aCC +O3 +Osize sourcefile.C
  • aCC +O4 +Osize sourcefile.C
Most optimizations improve execution speed and decrease executable code size. A few optimizations significantly increase code size to gain execution speed.

The +Osize option disables these code-expanding optimizations.

Use this option if you have limited main memory, swap space, or disk space.


Specifying Maximum Optimization

For maximum optimization, use the +Oall option as follows:

aCC +Oall sourcefile.C
This combination performs aggressive optimizations with unrestricted compile time at the highest level of optimization.

Note: Use +Oall with stable, well-structured code. These types of optimizations give you the fastest code, but are riskier than the default optimizations.

The +Oall option combines the +O4, +Oaggressive, and +Onolimit options.


Combining Optimization Options

Optimization options that affect code size (+Osize), compile time (+Onolimit), and the aggressiveness of the optimizations performed (+Oaggressive or +Oconservative) can be combined at any of the optimization levels 2 through 4.

You can use +Onolimit or +Osize with either +Oaggressive or +Oconservative, but you cannot use +Oaggressive with +Oconservative.

Example:
To specify conservative optimizations at level 2 and disable code-expanding optimizations, use the following command:

aCC +O2 +Oconservative +Osize sourcefile.C

Profile-based Optimization

Profile-based optimization (PBO) is a set of performance-improving code transformations based on the runtime characteristics of your application.

The following steps are involved in performing profile-based optimization:

  1. Instrumentation
  2. Data Collection
  3. Maintaining Profile Data Files
  4. Optimization

Instrumentation

To instrument your program, use the +Oprofile=collect option as follows:

  • aCC +Oprofile=collect -O -c sample.C

    (Compile for instrumentation.)

  • aCC +Oprofile=collect -O -o sample.exe sample.o

    (Link to make instrumented executable.)

The first command line uses the -O option to perform level 2 optimization and the +Oprofile=collect option to prepare the code for instrumentation. (+Oprofile=collect generates intermediate code.) The -c option in the first command line suppresses linking and creates an intermediate object file called sample.o. The .o file can be used later in the optimization phase, avoiding a second compile.

The second command line uses the -o option to link sample.o into sample.exe. The +Oprofile=collect option instruments sample.exe with data collection code.

Note: Instrumented programs run slower than non-instrumented programs. Only use instrumented code to collect statistics for profile-based optimization.

Instrumenting Code at Level 4 Optimization:

When optimizing at level 4 (where code generation is delayed until link time), use the +Oprofile=collect option as follows:

  • aCC +Oprofile=collect +O4 -c x.C y.C

    Create intermediate file for instrumentation.

  • aCC +Oprofile=collect +O4 x.o y.o

    Create optimized code with instrumentation.


Data Collection

To collect execution profile statistics, run your instrumented program with representative data as follows:

sample.exe < input.file1   # Collect execution profile data.
sample.exe < input.file2   # Collect execution profile data.
This step creates and logs the profile statistics to a file, by default called flow.data. The data collection file is a structured file that may be used to store the statistics from multiple test runs of different programs that you may have instrumented.


Maintaining Profile Data Files

Profile-based optimization stores execution profile data in a disk file. By default, this file is called flow.data and is located in your current working directory.

You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.

The FLOW_DATA environment variable can be used to specify the name of the profile data file with either the +Oprofile=collect or +Oprofile=use options.

To specify the name of the profile data file on the command line, use the +Oprofile=use:filename form of the +Oprofile=use option.

The +Oprofile=use:filename option takes precedence over the FLOW_DATA environment variable.

Examples:

In the following example, the FLOW_DATA environment variable is used to override the flow.data file name. The profile data is stored instead in /users/profiles/prog.data.

export FLOW_DATA=/users/profiles/prog.data
aCC -c +Oprofile=collect +O3 sample.C
aCC -o sample.exe +Oprofile=collect sample.o
sample.exe < input.file1
aCC -o sample.exe +Oprofile=use sample.o
In the next example, the +Oprofile=use:filename option is used to override the flow.data file name with the name /users/profiles/prog.data.
aCC -c +Oprofile=collect +O3 sample.C
aCC -o sample.exe +Oprofile=collect sample.o
sample.exe < input.file1
mv flow.data /users/profiles/prog.data
aCC -o sample.exe +Oprofile=use:/users/profiles/prog.data sample.o


Optimization

To optimize the program based on the previously collected runtime profile statistics, relink the program as follows:

aCC -o sample.exe +Oprofile=use sample.o
When optimizing at level 4 (where code generation is delayed until link time), use the +Oprofile=use option as follows:
aCC +Oprofile=use +O4  x.o  y.o
When +Oprofile=use is used, no recompilation is necessary. The .o file saved from the instrumentation phase can be used as input.

Note: When using profile-based optimization:

  • Because the linker performs code generation for profile-based optimization, linking object files compiled with +Oprofile=collect or +Oprofile=use takes more time than linking ordinary object files. Compile time, however, is relatively fast because the compiler generates only the intermediate code.
  • You can compile and instrument in one step, but you must then recompile when optimizing. Use the same options on both compiles; otherwise, profile-based optimization cannot be done. For example:
    aCC +Oprofile=collect -O sample.C -o sample.exe
        # Compile to instrumented executable.

    sample.exe < input.file1
        # Collect execution profile data.

    aCC +Oprofile=use -O sample.C -o sample.exe
        # Recompile with optimization.
    
  • Numerical applications that perform the same calculations regardless of the input data will see only a small performance boost.
  • Profile-based optimization has the greatest impact on application performance when used with level 2 or greater optimizations.
  • Profile-based optimization benefits most applications, especially large applications with multiple compilation units, such as compilers, editors, database managers, and user interface managers.
  • Profile-based optimization should be enabled during the final stages of application development. To obtain the best performance, re-profile and re-optimize your application after making source code changes.

Pragmas That Control Optimization
Compiler options provide a high-level, global approach to optimization. For finer-grained control within a source file, HP aC++ provides two pragmas: OPTIMIZE and OPT_LEVEL.

See Optimization Pragmas for more information.
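The following sketch shows how the two pragmas might appear in a source file. The exact spellings used here (OPT_LEVEL followed by a level number, and OPTIMIZE followed by ON or OFF) and their placement between functions are assumptions for illustration; confirm the syntax and restrictions in Optimization Pragmas before relying on them. The function names are hypothetical. Compilers that do not recognize these pragmas typically ignore them, possibly with a warning.

```cpp
// Assumed syntax: request level 1 for the functions that follow.
#pragma OPT_LEVEL 1
int parse_digit(char c) {
    return (c >= '0' && c <= '9') ? (c - '0') : -1;
}

// Assumed syntax: request level 2 for the loop-heavy code that follows.
#pragma OPT_LEVEL 2
long sum_range(const int* values, int count) {
    long total = 0;
    for (int i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Assumed syntax: turn optimization off for one function, then back on.
#pragma OPTIMIZE OFF
int debug_identity(int value) {
    return value;
}
#pragma OPTIMIZE ON
```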