HP Itanium-based Systems: HP aC++/HP C Programmer's Guide > Chapter 7 Optimizing HP aC++ Programs

Requesting Optimization
By default, the compiler performs constant folding and simple register assignment. There are several ways to increase and control the level of optimization performed on your program.

HP aC++ provides four basic levels of optimization. The higher the level, the more optimization is performed and the longer the optimization takes. You can specify an optimization option on the aCC command line or in the CXXOPTS environment variable. Example:

    aCC -O prog.C

Compiles prog.C and optimizes the program at level 2, the level selected by -O.

Level 1 optimization includes branch optimization, dead code elimination, faster register allocation, instruction scheduling, and peephole (statement-by-statement) optimization. Use +O1 to get level 1 optimization. Level 1 is the default. Level 1 optimization produces faster programs than no optimization and compiles faster than level 2 optimization. Programs compiled at level 1 can be used with the HP Distributed Debugging Environment (DDE) debugger. Use the debugger option -g0 or -g1.

Level 2 optimization includes level 1 optimization, along with optimizations performed over entire functions in a single file. Level 2 optimizes loops in order to reduce pipeline stalls and analyzes data flow, memory usage, loops, and expressions. Use -O or +O2 to get level 2 optimization. Specifically, level 2 provides the following:
Level 2 can produce faster runtime code than level 1 if programs use loops extensively. Loop-oriented floating-point intensive applications may see run times reduced by 50%. Operating system and interactive applications that use the already optimized system libraries can achieve 30% to 50% additional improvement. Level 2 optimization produces faster programs than level 1 and compiles faster than level 3 optimization.

Level 3 optimization includes level 2 optimizations, along with full optimization across all subprograms within a single file. Level 3 also inlines certain subprograms within the input file. Use +O3 to get level 3 optimization. Level 3 optimization produces faster runtime code than level 2 on code that makes many procedure calls to small functions. Level 3 links faster than level 4. However, level 3 does not work with the debugger options -g0 and -g1.

Level 4 optimization includes level 3 optimizations, along with full optimization across the entire application program. Level 4 includes global and static variable optimization and inlining across the entire program. Optimizations are performed at link time rather than at compile time. Use +O4 to get level 4 optimization. Level 4 optimization produces faster runtime code than level 3 if programs use many global variables or if there are many opportunities for inlining procedure calls. However, level 4 does not work with the debugger options -g0 and -g1.

In addition to the basic optimization levels, optimization options are provided should you require more precise control. Some introductory examples follow.

To enable aggressive optimizations at the second, third, or fourth optimization levels, use the +Ofast option as follows:

    aCC +Ofast +O2 sourcefile.C

or:

    aCC +Ofast +O3 sourcefile.C

or:

    aCC +Ofast +O4 sourcefile.C

This option enables additional optimizations at each level.
These optimizations may do any of the following:
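One general reason that aggressive optimization of floating-point code can change results is that floating-point addition is not associative: regrouping a sum can produce a different value. The following is a minimal sketch of that effect in isolation. It is an assumption about the kind of transformation at stake, not a description of what aCC does under +Ofast, and the function names are hypothetical.

```cpp
// Hypothetical helpers (not from the guide): the same three values
// summed with two different groupings.
double sum_left(double a, double b, double c) {
    return (a + b) + c;   // add a and b first
}

double sum_right(double a, double b, double c) {
    return a + (b + c);   // add b and c first
}
```

With IEEE double arithmetic, calling both functions with 1e20, -1e20, and 1.0 gives different answers: the left grouping cancels the large values first and keeps the 1.0, while the right grouping loses the 1.0 when it is added to -1e20. Code that depends on one particular grouping is therefore sensitive to any optimization that reorders the arithmetic.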
You can enable only conservative optimizations at the second, third, or fourth optimization levels by using the +Ofltacc=strict and +Ofenvaccess options, as follows:

    aCC +O2 +Ofltacc=strict +Ofenvaccess sourcefile.C

or:

    aCC +O3 +Ofltacc=strict +Ofenvaccess sourcefile.C

or:

    aCC +O4 +Ofltacc=strict +Ofenvaccess sourcefile.C

These options disable all but the most conservative optimizations at each level. In most cases, conservative optimizations do not change the behavior of code, even if the code does not conform to standards. Use only the conservative optimizations provided with levels 2, 3, and 4 when your code is unstructured.

You can remove optimization time restrictions at the second, third, or fourth optimization levels by using the +Onolimit option as follows:

    aCC +O2 +Onolimit sourcefile.C

or:

    aCC +O3 +Onolimit sourcefile.C

or:

    aCC +O4 +Onolimit sourcefile.C

By default, the optimizer limits the amount of time spent optimizing large programs at levels 2, 3, and 4. Use this option if longer compile times are acceptable because you want additional optimizations to be performed.

You can disable optimizations that expand code size at the second, third, and fourth optimization levels by using the +Osize option, as follows:

    aCC +O2 +Osize sourcefile.C

or:

    aCC +O3 +Osize sourcefile.C

or:

    aCC +O4 +Osize sourcefile.C

Most optimizations improve execution speed and decrease executable code size. A few optimizations significantly increase code size to gain execution speed. The +Osize option disables these code-expanding optimizations. Use this option if you have limited main memory, swap space, or disk space.

Profile-based optimization (PBO) is a set of performance-improving code transformations based on the runtime characteristics of your application. When using profile-based optimization, please note the following:
There are three steps involved in performing profile-based optimization: instrumenting the program, collecting execution profile data, and optimizing the program based on that data.

To instrument your program, use the +Oprofile=collect option as follows:

    aCC +Oprofile=collect -O -c sample.C
    aCC +Oprofile=collect -O -o sample.exe sample.o

The first command line uses the -O option to perform level 2 optimization and the +Oprofile=collect option to prepare the code for instrumentation. (+Oprofile=collect generates intermediate code.) The -c option in the first command line suppresses linking and creates an intermediate object file called sample.o. The .o file can be used later in the optimization phase, avoiding a second compile.

The second command line uses the -o option to link sample.o into sample.exe. The +Oprofile=collect option instruments sample.exe with data collection code.
When optimizing at level 4 (where code generation is delayed until link time), use the +Oprofile=collect option as follows:

    aCC +Oprofile=collect +O4 -c x.C y.C
    aCC +Oprofile=collect +O4 x.o y.o

The first line creates an intermediate file for instrumentation. The second line creates optimized code with instrumentation.

To collect execution profile statistics, run your instrumented program with representative data as follows:

    sample.exe < input.file1
    sample.exe < input.file2

This step creates the profile statistics and logs them to a file, by default called flow.data. The data collection file is a structured file that may be used to store the statistics from multiple test runs of different programs that you may have instrumented.

Profile-based optimization stores execution profile data in a disk file. By default, this file is called flow.data and is located in your current working directory. You can override the default name of the profile data file. This is useful when working on large programs or on projects with many different program files.

The FLOW_DATA environment variable can be used to specify the name of the profile data file with either the +Oprofile=collect or +Oprofile=use options. For example, with FLOW_DATA set to /users/profiles/prog.data, the profile data is stored in that file instead of flow.data.
The +Oprofile=use:filename option can also be used to override the flow.data file name, for example with the name /users/profiles/prog.data.
To optimize the program based on the previously collected runtime profile statistics, relink the program as follows:

    aCC -o sample.exe +Oprofile=use sample.o

When optimizing at level 4 (where code generation is delayed until link time), use the +Oprofile=use option as follows:

    aCC +Oprofile=use +O4 x.o y.o

When +Oprofile=use is used, no recompilation is necessary. The .o file saved from the instrumentation phase can be used as input.

For more information on profile-based optimization, refer to the HP-UX Online Linker and Libraries User’s Guide.