HP aC++/HP C Programmer's Guide (HP Itanium-based Systems), Chapter 7: Optimizing HP aC++ Programs

Requesting Optimization

By default, the compiler performs constant folding and simple register assignment. There are several ways to increase and control the level of optimization performed on your program.

Setting Basic Optimization Levels

HP aC++ provides four basic levels of optimization. The higher the level, the more optimization is performed and the longer optimization takes.

You can specify an optimization option on the aCC command line or in the CXXOPTS environment variable.

Example:

aCC -O prog.C

Compiles prog.C and optimizes the program at level 2, the level that -O selects (equivalent to +O2).

Level 1 Optimization

Level 1 optimization includes branch optimization, dead code elimination, faster register allocation, instruction scheduling, and peephole (statement-by-statement) optimization. Use +O1 to get level 1 optimization. Level 1 is the default.

Level 1 optimization produces faster programs than unoptimized compilation and compiles faster than level 2 optimization. Programs compiled at level 1 can be debugged with the HP Distributed Debugging Environment (DDE) debugger; compile with the debugging option -g0 or -g1.
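The following C++ sketch shows, by hand, the kind of simplification that dead code elimination and branch optimization perform. The function names are hypothetical, and the "after" version is an equivalent written out for illustration, not actual aC++ output.

```cpp
#include <cassert>

// Before: the source as written.
int scale_before(int x) {
    const bool trace = false;   // value known at compile time
    int scratch = x * 42;       // computed but never read: dead code
    if (trace) {                // never taken: the branch can be removed
        x = -x;
    }
    return x * 2;
}

// After: a hand-written equivalent of what dead code elimination and
// branch optimization may reduce scale_before to.
int scale_after(int x) {
    return x * 2;
}
```

Both functions return the same value for every input; the second simply contains no work whose result is discarded.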

Level 2 Optimization

Level 2 optimization includes level 1 optimization, along with optimizations performed over entire functions in a single file. Level 2 optimizes loops in order to reduce pipeline stalls and analyzes data-flow, memory usage, loops, and expressions. Use -O or +O2 to get level 2 optimization.

Specifically, level 2 provides the following:

  • Coloring register allocation.

  • Induction variable elimination and strength reduction.

  • Local and global common subexpression elimination.

  • Advanced constant folding and propagation. (Simple constant folding is done by default.)

  • Loop invariant code motion.

  • Store/copy optimization.

  • Unused definition elimination.

  • Software pipelining.

  • Register reassociation.
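As an illustration, the sketch below shows a loop before and after hand-applying four of the level 2 transformations listed above: common subexpression elimination, loop invariant code motion, strength reduction, and induction variable elimination. The function names are hypothetical, and the optimizer works on intermediate code rather than C++ source, so this is only a source-level picture of the effect. Integer arithmetic is used so that both versions produce identical results.

```cpp
#include <cassert>
#include <vector>

// Before: a * b is computed twice per iteration (a common subexpression),
// its value never changes inside the loop (loop invariant), and i * 8
// costs a multiplication on every iteration.
std::vector<long> fill_before(long n, long a, long b) {
    std::vector<long> out;
    for (long i = 0; i < n; ++i) {
        out.push_back(a * b + i * 8 + a * b);
    }
    return out;
}

// After: a hand-written equivalent of the optimized loop.
std::vector<long> fill_after(long n, long a, long b) {
    std::vector<long> out;
    const long ab = a * b;     // CSE + loop invariant code motion: computed once
    const long limit = n * 8;
    // Strength reduction: k replaces i * 8 and is updated by addition.
    // Induction variable elimination: i is gone; the loop test uses k.
    for (long k = 0; k < limit; k += 8) {
        out.push_back(ab + k + ab);
    }
    return out;
}
```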

Level 2 can produce faster runtime code than level 1 if programs use loops extensively. Loop-oriented, floating-point-intensive applications may see run times reduced by 50%.

Operating system and interactive applications that use the already-optimized system libraries can achieve an additional 30% to 50% improvement. Level 2 optimization produces faster programs than level 1 and compiles faster than level 3 optimization.

Level 3 Optimization

Level 3 optimization includes level 2 optimizations, along with full optimization across all subprograms within a single file. Level 3 also inlines certain subprograms within the input file. Use +O3 to get level 3 optimization.

Level 3 optimization produces faster runtime code than level 2 on code that makes many procedure calls to small functions. Level 3 links faster than level 4. However, level 3 does not work with the debugging options -g0 and -g1.
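The hypothetical C++ sketch below shows what inlining a small function means at the source level: each call to get_x is replaced by the body of get_x, removing the call overhead from the loop. Which calls aC++ actually inlines at level 3 depends on its own heuristics, which are not described here.

```cpp
#include <cassert>
#include <vector>

struct Point {
    double x;
    double y;
};

// A small accessor of the kind that benefits from inlining when the
// caller and callee are in the same source file.
double get_x(const Point& p) { return p.x; }

// Before: one procedure call per element.
double sum_x_calls(const std::vector<Point>& pts) {
    double sum = 0.0;
    for (const Point& p : pts) sum += get_x(p);
    return sum;
}

// After: the hand-inlined equivalent; the loop body no longer contains
// a call, so other loop optimizations can see all of its work.
double sum_x_inlined(const std::vector<Point>& pts) {
    double sum = 0.0;
    for (const Point& p : pts) sum += p.x;
    return sum;
}
```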

Level 4 Optimization

Level 4 optimization includes level 3 optimizations, along with full optimizations across the entire application program. Level 4 includes global and static variable optimization and inlining across the entire program. Optimizations are performed at link time rather than at compile time. Use +O4 to get level 4 optimization.

Level 4 optimization produces faster runtime code than level 3 if programs use many global variables or if there are many opportunities for inlining procedure calls. However, level 4 does not work with the debugging options -g0 and -g1.
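The following sketch illustrates one kind of global variable optimization, assuming a hypothetical global g_scale that no file in the program ever modifies. When files are compiled one at a time, the compiler cannot rule out a write to g_scale from another file, so each use must load it from memory. With the whole program visible at link time, level 4 may prove that g_scale never changes and substitute its value. apply_after is a hand-written equivalent, not aC++ output.

```cpp
#include <cassert>

// Hypothetical global: initialized once and, by assumption, never
// written anywhere else in the program.
int g_scale = 3;

// Before: every call reads g_scale from memory.
int apply_before(int x) { return x * g_scale; }

// After: the equivalent code once a whole-program view shows that
// g_scale is effectively constant.
int apply_after(int x) { return x * 3; }
```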

Additional Options for Finer Control

In addition to the basic optimization levels, HP aC++ provides optimization options for more precise control.

Some introductory examples follow:

Enabling Aggressive Optimizations

To enable aggressive optimizations at the second, third, or fourth optimization levels, use the +Ofast option as follows:

aCC +Ofast +O2 sourcefile.C

or:

aCC +Ofast +O3 sourcefile.C

or:

aCC +Ofast +O4 sourcefile.C

This option enables additional optimizations at each level.

NOTE: Use aggressive optimizations only with stable, well-structured code. These optimizations produce faster code but may change the behavior of programs.

These optimizations may do any of the following:

  • Relocate conditional floating-point instructions from within loops

  • Convert certain library calls to millicode and inline instructions

  • Alter error-handling requirements
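The list above does not say exactly which floating-point changes +Ofast makes, but the following C++ sketch shows why reordering floating-point arithmetic can change a program's results: floating-point addition is not associative. The function names are hypothetical, and the example assumes IEEE 754 double precision in the default round-to-nearest mode.

```cpp
#include <cassert>

// Storing each intermediate sum in a volatile double forces it to be
// rounded to double before the next addition, even on targets that
// evaluate expressions in extended precision.
double sum_left(double a, double b, double c) {
    volatile double ab = a + b;
    return ab + c;              // (a + b) + c
}

double sum_right(double a, double b, double c) {
    volatile double bc = b + c;
    return a + bc;              // a + (b + c)
}
```

With a = 1e16, b = -1e16, and c = 1.0, sum_left returns 1.0 while sum_right returns 0.0, because b + c rounds back to -1e16 before a is added.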

Enabling Only Conservative Optimizations

You can enable only conservative optimizations at the second, third, or fourth optimization levels by using the +Ofltacc=strict and +Ofenvaccess options, as follows:

aCC +O2 +Ofltacc=strict +Ofenvaccess sourcefile.C

or:

aCC +O3 +Ofltacc=strict +Ofenvaccess sourcefile.C

or:

aCC +O4 +Ofltacc=strict +Ofenvaccess sourcefile.C

These options disable all but the most conservative optimizations at each level. In most cases, conservative optimizations do not change the behavior of code, even if the code does not conform to standards.

When your code is unstructured, use only the conservative optimizations at levels 2, 3, and 4.
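This section does not describe +Ofenvaccess further. Assuming, as its name suggests, that it tells the compiler the program reads or changes the floating-point environment, code like the following sketch depends on it: the result of the division depends on the rounding mode in effect when it runs, so an optimizer that ignored the fesetround calls could fold or move the division and change the result. divide_with_rounding is a hypothetical helper; fegetround and fesetround are the standard <cfenv> functions.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: divides num by den under the given rounding mode
// (for example FE_UPWARD or FE_DOWNWARD), then restores the previous mode.
// The volatile variables are intended to keep the operand reads and the
// quotient store between the two fesetround calls, so the division is
// performed at run time under the requested mode.
double divide_with_rounding(int mode, double num, double den) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double n = num;
    volatile double d = den;
    volatile double q = n / d;
    std::fesetround(saved);
    return q;
}
```

Because 1/3 is not exactly representable, dividing 1.0 by 3.0 under FE_UPWARD gives a result one unit in the last place larger than under FE_DOWNWARD.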

Removing Compilation Time Limits When Optimizing

You can remove optimization time restrictions at the second, third, or fourth optimization levels by using the +Onolimit option as follows:

aCC +O2 +Onolimit sourcefile.C

or:

aCC +O3 +Onolimit sourcefile.C

or:

aCC +O4 +Onolimit sourcefile.C

By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4. Use this option if longer compile times are acceptable because you want additional optimizations to be performed.

Limiting the Size of Optimized Code

You can disable optimizations that expand code size at the second, third, or fourth optimization levels by using the +Osize option, as follows:

aCC +O2 +Osize sourcefile.C

or:

aCC +O3 +Osize sourcefile.C

or:

aCC +O4 +Osize sourcefile.C

Most optimizations improve execution speed and decrease executable code size. A few optimizations significantly increase code size to gain execution speed. The +Osize option disables these code-expanding optimizations.

Use this option if you have limited main memory, swap space, or disk space.
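This section does not list which optimizations +Osize disables. Loop unrolling is a common example of a code-expanding optimization, and the hypothetical sketch below shows the trade-off at the source level: the unrolled version performs fewer loop tests per element, but its code is larger and needs a remainder loop for leftover elements. Whether aC++ unrolls loops in exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rolled loop: compact code, one loop test per element.
long sum_rolled(const std::vector<long>& v) {
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum;
}

// Unrolled by four: one loop test per four elements, at the cost of a
// body four times larger plus a remainder loop.
long sum_unrolled4(const std::vector<long>& v) {
    long sum = 0;
    std::size_t i = 0;
    const std::size_t n = v.size();
    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; ++i) sum += v[i];   // leftover elements
    return sum;
}
```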

Combining Optimization Options

Options that affect code size (+Osize), compile time (+Onolimit), and the aggressiveness of the optimizations performed can be combined at any of optimization levels 2 through 4.

Profile-Based Optimization

Profile-based optimization (PBO) is a set of performance-improving code transformations based on the runtime characteristics of your application.

When using profile-based optimization, please note the following:

  • Because the linker performs code generation for profile-based optimization, linking object files compiled with +Oprofile=collect or +Oprofile=use takes more time than linking ordinary object files. Compilation itself is relatively fast, however, because the compiler generates only intermediate code.

  • You can compile and instrument in one step, but you must recompile when optimizing. For example:

    aCC +Oprofile=collect -O sample.C -o sample.exe
                                # Compile to instrumented executable.

    sample.exe < input.file     # Collect execution profile data.

    aCC +Oprofile=use -O sample.C -o sample.exe
                                # Recompile with optimization.

  • Numerical applications that perform the same calculations regardless of the input data see only a small performance improvement.

  • Profile-based optimization has the greatest impact on application performance when used with level 2 or greater optimizations.

  • Profile-based optimization benefits most applications, especially large applications with multiple compilation units, such as compilers, editors, database managers, and user interface managers.

  • Profile-based optimization should be enabled during the final stages of application development. To obtain the best performance, reprofile and reoptimize your application after making source code changes.

There are three steps involved in performing profile-based optimization:

Instrumentation

To instrument your program, use the +Oprofile=collect option as follows:

aCC +Oprofile=collect -O -c sample.C

aCC +Oprofile=collect -O -o sample.exe sample.o

The first command line uses the -O option to perform level 2 optimization and the +Oprofile=collect option to prepare the code for instrumentation (+Oprofile=collect generates intermediate code). The -c option in the first command line suppresses linking and creates an intermediate object file called sample.o. This .o file can be used later in the optimization phase, which avoids a second compile.

The second command line uses the -o option to link sample.o into sample.exe. The +Oprofile=collect option instruments sample.exe with data collection code.
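Conceptually, instrumentation adds counting code to the program so that each run records how often parts of the code execute. The sketch below models this by hand with hypothetical counters (g_branch_counts) in a single function. It is not the code that +Oprofile=collect generates, and the real data layout is not described here; the extra increments do, however, show why instrumented code runs more slowly.

```cpp
#include <array>
#include <cassert>

// Hypothetical counters standing in for instrumentation code:
// slot 0 counts the negative branch, slot 1 the non-negative branch.
std::array<unsigned long, 2> g_branch_counts{};

int classify(int x) {
    if (x < 0) {
        ++g_branch_counts[0];   // instrumentation: negative branch executed
        return -1;
    }
    ++g_branch_counts[1];       // instrumentation: non-negative branch executed
    return 1;
}
```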

NOTE: Instrumented programs run slower than non-instrumented programs. Only use instrumented code to collect statistics for profile-based optimization.

Instrumenting Code at Level 4 Optimization

When optimizing at level 4 (where code generation is delayed until link time), use the +Oprofile=collect option as follows:

aCC +Oprofile=collect +O4 -c x.C y.C

aCC +Oprofile=collect +O4 x.o y.o

The first command compiles x.C and y.C into intermediate object files for instrumentation. The second command links x.o and y.o and creates optimized code with instrumentation.

Collecting Data for Profiling

To collect execution profile statistics, run your instrumented program with representative data as follows:

sample.exe < input.file1

sample.exe < input.file2

This step creates the profile data file, by default called flow.data, and logs the profile statistics to it. The data collection file is a structured file that can store the statistics from multiple test runs of different instrumented programs.
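The following hypothetical sketch models how statistics from several runs, and from different programs, can accumulate in one store: counts are keyed by program name and then by a branch identifier, and each new run adds to the existing totals. The names ProfileStore, BranchCounts, and record_run are illustrative only; the on-disk format of flow.data is not documented here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory model: program name -> (branch id -> count).
using BranchCounts = std::map<std::string, unsigned long>;
using ProfileStore = std::map<std::string, BranchCounts>;

// Adds the counts from one run of `program` to the stored totals.
void record_run(ProfileStore& store, const std::string& program,
                const BranchCounts& run) {
    BranchCounts& totals = store[program];   // created on the first run
    for (const auto& entry : run) {
        totals[entry.first] += entry.second; // accumulate across runs
    }
}
```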

Maintaining Profile Data Files

Profile-based optimization stores execution profile data in a disk file. By default, this file is called flow.data and is located in your current working directory.

You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.

The FLOW_DATA environment variable can be used to specify the name of the profile data file with either the +Oprofile=collect or +Oprofile=use options.
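The lookup rule described in this section (FLOW_DATA if set, otherwise flow.data in the current working directory) can be expressed as a small helper, for example in a build tool that needs to know where profile data will be written. This is a hypothetical sketch: resolve_profile_path is not part of aC++, treating an empty FLOW_DATA as unset is an assumption, and the precedence between FLOW_DATA and +Oprofile=use:filename is not modeled.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the profile data file name: the value of FLOW_DATA when it is
// set and non-empty, otherwise flow.data (relative to the current
// working directory). Taking the variable's value as a parameter keeps
// the logic testable.
std::string resolve_profile_path(const char* flow_data_env) {
    if (flow_data_env != nullptr && *flow_data_env != '\0') {
        return flow_data_env;
    }
    return "flow.data";
}

// Convenience wrapper that reads the real environment.
std::string profile_path_from_environment() {
    return resolve_profile_path(std::getenv("FLOW_DATA"));
}
```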

Example 1

In the following example, the FLOW_DATA environment variable is used to override the flow.data file name. The profile data is stored instead in /users/profiles/prog.data.

export FLOW_DATA=/users/profiles/prog.data
aCC -c +Oprofile=collect sample.C
aCC -o sample.exe +Oprofile=collect sample.o
sample.exe < input.file1
aCC -o sample.exe +Oprofile=use +O3 sample.o

Example 2

In this example, the +Oprofile=use:filename option is used to override the flow.data file name with the name /users/profiles/prog.data.

aCC -c +Oprofile=collect sample.C
aCC -o sample.exe +Oprofile=collect sample.o
sample.exe < input.file1
mv flow.data /users/profiles/prog.data
aCC -o sample.exe +Oprofile=use:/users/profiles/prog.data +O3 sample.o

Performing Profile-Based Optimization

To optimize the program based on the previously collected runtime profile statistics, relink the program as follows:

aCC -o sample.exe +Oprofile=use sample.o

When optimizing at level 4 (where code generation is delayed until link time), use the +Oprofile=use option as follows:

aCC +Oprofile=use +O4 x.o y.o

When you use +Oprofile=use, no recompilation is necessary: the .o file saved from the instrumentation phase can be used as input.
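This section does not list the transformations the linker applies when it reads the profile. One common profile-driven transformation is code layout, which places frequently executed code together and moves rarely executed code out of the way. The hypothetical sketch below orders blocks by their recorded execution counts to show the idea; it is a simplification, not HP's algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical basic block with the execution count recorded during
// the profile-collection step.
struct Block {
    std::string name;
    unsigned long count;
};

// Orders blocks from most to least frequently executed; stable_sort
// keeps the original order for blocks with equal counts.
std::vector<Block> layout_by_profile(std::vector<Block> blocks) {
    std::stable_sort(blocks.begin(), blocks.end(),
                     [](const Block& a, const Block& b) { return a.count > b.count; });
    return blocks;
}
```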

For more information on profile-based optimization, refer to the HP-UX Online Linker and Libraries User’s Guide.

© Hewlett-Packard Development Company, L.P.