HP Itanium-based Systems: HP aC++/HP C Programmer's Guide > Chapter 2 Command-Line Options

Code Optimizing Options
Optimization options can be used to improve the execution speed of programs compiled with the HP compiler. To use optimization, first specify the appropriate basic optimization level (+O1, +O2, +O3, or +O4) on the command line, followed by one or more finer-grained options when necessary. For more information and examples, refer to Chapter 7 “Optimizing HP aC++ Programs”.

The following options allow you to specify the basic level of optimization.

Compiling files at optimization level 2 (-O or +O2) and above increases the amount of virtual memory needed by the compiler. When very large functions or files are compiled at +O2, or when aggressive (+O3 and above) optimization is used, ensure that the maxdsiz kernel tunable is set appropriately on the machine where compilation takes place. HP recommends a maxdsiz setting of 0x80000000, or 2 GB, in such cases (the default for this parameter is 0x40000000, or 1 GB). Raising maxdsiz ensures that the compiler does not run out of virtual memory when compiling large files or functions.

The maxdsiz_64bit setting should be at least as large as the maxdsiz setting. In addition, maxssiz should be set to 128 MB for very large or complex input files. (Normally a maxssiz setting of 64 MB is sufficient.) HP recommends not reducing the maxfiles setting below the default value of 2048. See the kctune man page for more information on how to change kernel tunable parameters.

-O

The -O option invokes the optimizer to perform level 2 optimization. This option is equivalent to the +O2 option.

Example: This command compiles prog.C and optimizes at level 2:

aCC -O prog.C

+O0

Use +O0 for fastest compile time or with simple programs. No optimizations are performed.

Example: This command compiles prog.C at optimization level 0:

aCC +O0 prog.C

+O1

The +O1 option performs level 1 optimization only. This includes branch optimization, dead code elimination, faster register allocation, instruction scheduling, and peephole optimization. This is the default optimization level.

Example: This command compiles prog.C and optimizes at level 1:

aCC +O1 prog.C

+O2

The +O2 option performs level 2 optimization. This includes level 1 optimizations plus optimizations performed over entire functions in a single file.

Example: This command compiles prog.C and optimizes at level 2:

aCC +O2 prog.C

+O3

The +O3 option performs level 3 optimization. This includes level 2 optimizations plus full optimization across all subprograms within a single file.

Example: This command compiles prog.C and optimizes at level 3:

aCC +O3 prog.C

+O4

The +O4 option performs level 4 optimization. This includes level 3 optimizations plus full optimizations across the entire application program. In the absence of +Oprofile=use, the compiler emits a warning and the optimization level drops to +O3. The defaults that depend on the optimization level will also be the defaults for +O3.

When you link a program, the compiler brings all modules that were compiled at optimization level 4 into virtual memory at the same time. Depending on the size and number of the modules, compiling at +O4 can consume a large amount of virtual memory. If you are linking a large program that was compiled with the +O4 option, you may notice a system slowdown. In the worst case, you may see an error indicating that you have run out of memory.

Example: This command compiles prog.C and optimizes at level 4:

aCC +O4 prog.C

If you run out of memory when compiling at +O4 optimization, there are several things you can do:
Object files generated by the compiler with +O4 or -ipo, called intermediate object files, are intended to be temporary files. These object files contain an intermediate representation of the user code in a format that is designed for advanced optimizations. Intermediate object files are typically 3 to 10 times as large as normal object files.

Hewlett-Packard reserves the right to change the format of these files without prior notice. There is no guarantee that intermediate object files will be compatible from one revision of the compiler to the next. Use of intermediate files must be limited to the compiler that created them. For the same reason, intermediate object files should not be included in archive libraries that might be used by different versions of the compiler. The compiler will issue an error message and terminate when an incompatible intermediate file is generated.

Following are the additional optimization options for finer control:

+ES[no]lit

The +ES[no]lit option places [does not place] string literals and const-qualified variables that do not require load-time or runtime initialization in the read-only data section. This is the same as using the +Olit option. This option is deprecated and may not be supported in future releases. Instead, use +Olit=all for +ESlit and +Olit=none for +ESnolit.

-ipo

The -ipo option enables interprocedural optimizations across files. The object file produced using this option contains intermediate code (IELF file). At link time, ld automatically invokes the interprocedural optimizer (u2comp) if any of the input object files is an IELF file. For optimization levels +O0 and +O1, this option is silently ignored. The -ipo option is implicitly invoked with the +O4 and +Ofaster options to match current behavior (+O4 ==> +O3 -ipo). This option is incompatible with debugging options. This restriction will be removed in the future.
+[no]nrv
-nrv_optimization,[off|on]

The +[no]nrv option enables [disables] the named return value (NRV) optimization. By default it is disabled. The NRV optimization eliminates a copy-constructor call by allocating a local object of a function directly in the caller’s context if that object is always returned by the function.

Example:
This optimization will not be performed if the copy-constructor was not declared by the programmer. Note that although this optimization is allowed by the ISO/ANSI C++ standard, it may have noticeable side effects.

Example:

aCC -Wc,-nrv_optimization,on app.C

+O[no]clone

Cloning is controlled by a list-free option +O[no]clone, analogous to +O[no]inline. It is on by default with +O3 and +O4, and can be disabled. The +O[no]clone option influences cloning both into and out of the functions it governs.

Example: In the following commands, +Onoclone applies to the function foo, and directs that foo itself should not be cloned and that calls from foo (to bar) should not be redirected to clones.

$ cc -c +Oprofile=use +O4 foo.c +Onoclone
$ cc -c +Oprofile=use +O4 bar.c

+O[no]failsafe

The +O[no]failsafe option enables [disables] failsafe optimization. When a compilation fails at the current optimization level, +Ofailsafe automatically restarts the compilation at +O2 (for specific high level optimizer errors at +O3/+O4) or +O0. The default is +Ofailsafe.

+O[no]all

Use the +Oall option to obtain the best possible performance. This option should be used with stable, well-structured code. These optimizations give you the fastest code, but are riskier than the default optimizations. You can use +Oall at optimization levels 2, 3, and 4. The default is +Onoall.

This option is deprecated and may not be supported in future releases. Use +Ofaster instead. +O4 +Onolimit +Oaggressive is approximately equivalent to +Oall.

+O[no]aggressive

The +Oaggressive option enables aggressive optimizations. The +Onoaggressive option disables aggressive optimizations. By default, aggressive optimizations are turned off. The +Oaggressive option is approximately equivalent to +Osignedpointers +Olibcalls +Onoinitcheck +Ofltacc=relaxed.
+O[no]conservative

The +O[no]conservative option is deprecated and may not be supported in future releases. It is approximately equivalent to +Oparmsoverlap +Onomoveflops. The default is +Onoconservative.

+O[no]limit

The +Olimit option suppresses optimizations that significantly increase compile time or that consume a lot of memory. The +Onolimit option allows optimizations to be performed regardless of their effect on compile time or memory consumption. Use +Onolimit at all optimization levels.

Usage:

+O[no]limit=level

The defined values of level are:
Example: To remove optimization time restrictions at the second, third, or fourth optimization levels, use +Onolimit as follows:

aCC <opt level> +Onolimit sourcefile.C

+O[no]ptrs_ansi

+Optrs_ansi is synonymous with +Otype_safety=ansi. +Onoptrs_ansi is synonymous with +Otype_safety=off.

+O[no]ptrs_strongly_typed

The default is +Onoptrs_strongly_typed. +Optrs_strongly_typed is synonymous with +Otype_safety=strong. +Onoptrs_strongly_typed is synonymous with +Otype_safety=off.

+O[no]ptrs_to_globals[=list]

The +O[no]ptrs_to_globals option tells the optimizer whether global variables are accessed [are not accessed] through pointers. If +Onoptrs_to_globals is specified, it is assumed that statically-allocated data (including file-scoped globals, file-scoped statics, and function-scoped statics) will not be read or written through pointers. The default is +Onoptrs_to_globals.

Advanced optimization options provide additional control for special situations.

+O[no]cross_region_addressing

The +O[no]cross_region_addressing option enables [disables] the use of cross-region addressing. Cross-region addressing is required if a pointer, such as an array base, points to a different region than the data being addressed because an offset results in a cross-over into another region. Standard-conforming applications do not require the use of cross-region addressing. The default is +Onocross_region_addressing.
+O[no]datalayout

The +O[no]datalayout option enables [disables] profile-driven layout of global and static data items to improve cache memory utilization. This option is currently enabled if +Oprofile=use (dynamic profile feedback) is specified. The default, in the absence of +Oprofile=use, is +Onodatalayout.

+O[no]dataprefetch

When +Odataprefetch is enabled, the optimizer inserts instructions within innermost loops to explicitly prefetch data from memory into the data cache. Data prefetch instructions are inserted only for data structures referenced within innermost loops using simple loop-varying addresses (that is, in a simple arithmetic progression). Use this option for applications that have high data cache miss overhead. The default is +Onodataprefetch.

+Odataprefetch is equivalent to +Odataprefetch=indirect. +Onodataprefetch is equivalent to +Odataprefetch=none.

Usage:

+Odataprefetch=kind

The defined values for kind are:
+O[no]extern

Use the +O[no]extern option at optimization levels 0, 1, 2, 3, or 4. The default is +Oextern with no name list. +Oextern is equivalent to -Bextern. +Onoextern is equivalent to -Bprotected.

+O[no]fltacc=level

The +O[no]fltacc option disables [enables] floating-point optimizations that can result in numerical differences. Any option other than +Ofltacc=strict also generates Fused Multiply-Add (FMA) instructions. FMA instructions can improve performance of floating-point applications.

If you specify neither +Ofltacc nor +Onofltacc, less optimization is performed than for +Onofltacc: the optimizer generates FMA instructions but does not perform any expression-reordering optimizations. Specifying +Ofltacc ensures the same result as in unoptimized code (+O0).

Usage:

+Ofltacc=level

The defined values for level are:

All options except +Ofltacc=strict allow the compiler to make transformations that are algebraically correct, but that may slightly affect the result of computations because of the inherent imperfection of computer floating-point arithmetic. For many programs, the results obtained with these options are adequately similar to those obtained without the optimization. For applications in which round-off error has been carefully studied, and the order of computation carefully crafted to control error, these options may be unsatisfactory. To ensure the same result as in unoptimized code, use +Ofltacc.

Example: All the options except +Ofltacc=strict allow the compiler to replace a division by a multiplication using the reciprocal. For example, the following code:
is transformed as follows (note that x is invariant in the loop):
Since multiplication is considerably faster than division, the optimized program runs faster.

+Ofrequently_called=function1[,function2...]

The named functions are assumed to be frequently called. This option overrides any information in a profile database.

+Ofrequently_called:filename

The file indicated by filename contains a list of functions, separated by spaces or newlines. These functions are assumed to be frequently called. This option overrides any information in a profile database.

+O[no]initcheck

The initialization checking feature of the optimizer can be on or off. When on (+Oinitcheck), the optimizer issues warning messages when it discovers uninitialized variables. When off (+Onoinitcheck), the optimizer does not issue warning messages. Use +Oinitcheck at optimization level 2 or above. If this option is used together with +check=uninit, uninitialized variables remain uninitialized so that an error is reported at runtime and triggers a program abort if the variables are accessed.

+O[no]inline

The +Oinline option indicates that any function can be inlined by the optimizer. +Onoinline disables inlining of functions by the optimizer. This option does not affect functions inlined at the source code level. Use +Onoinline at optimization levels 2, 3, and 4. The default is +Oinline at optimization levels 3 and 4.

Usage:

+O[no]inline=function1[,function2...]

Enables [disables] optimizer inlining for the named functions.

+O[no]inline:filename

The file indicated by filename should contain a list of function names, separated by commas or newlines. Optimization is enabled [disabled] for the named functions.

+Oinlinebudget=n

The +Oinlinebudget option controls the compile time budget for the inliner. A lower number causes the inliner to consider fewer candidates for inlining, while a higher number leads it to consider more candidates. The inlining candidates are ordered by priority based on the inliner’s heuristics, so this does not affect the most important candidates.

The +Oinlinebudget option controls the aggressiveness of inlining according to the value you specify for n, where n is an integer in the range 1 to 1000000 that specifies the level of aggressiveness, as follows:

The +Onolimit and +Osize options also affect inlining. Specifying the +Onolimit option has the same effect as specifying +Oinlinebudget=200. The +Osize option has the same effect as +Oinlinebudget=1.

Use this option at optimization level 2 or higher. The default is +Oinlinebudget=100.

+Olit=kind

The +Olit option places the data items that do not require load-time or runtime initialization in a read-only data section. +Olit=all is the default. The defined values for kind are:
+Ointeger_overflow=kind

To provide the best runtime performance, the compiler assumes that runtime integer arithmetic expressions that arise in certain contexts do not overflow (that is, do not produce values that are too high or too low to represent). This applies both to expressions that are present in user code and to expressions that the compiler constructs itself. Note that if an integer arithmetic overflow assumption is violated, runtime behavior is undefined.

+Ointeger_overflow=moderate is the default for all optimization levels. Previously, +Ointeger_overflow=aggressive was the default at +O2 and above. This was changed to enable a wider class of applications to be compiled with optimization and run correctly. The defined values of kind are:
+Olevel=name1[,name2,...,nameN]

The +Olevel option lowers optimization to the specified level for one or more named functions. level can be 0, 1, 2, 3, or 4. The name parameters are names of functions in the module being compiled. Use this option when one or more functions do not optimize well or properly. This option must be used with a basic +Olevel or -O option. Note that currently only the C++ mangled name of the function is allowed for name.

This option works like the OPT_LEVEL pragma. The option overrides the pragma for the specified functions. As with the pragma, you can only lower the level of optimization; you cannot raise it above the level specified by a basic +Olevel or -O option. To avoid confusion, it is best to use either this option or the OPT_LEVEL pragma rather than both.

You can use this option at optimization levels 1, 2, 3, and 4. The default is to optimize all functions at the level specified by the basic +Olevel or -O option.

Examples:
+O[no]libcalls

The +O[no]libcalls option is deprecated and may not be supported in future releases. On Itanium®-based systems, including a system header file causes the functions declared therein to be eligible for libcalls transformations, regardless of the state of +O[no]libcalls. The default is +Onolibcalls. Use +O[no]libcalls at any optimization level.

+O[no]loop_transform

This option transforms [does not transform] eligible loops for improved cache and other performance. This option can be used at optimization levels 2, 3, and 4. The default is +Oloop_transform.

+O[no]loop_unroll[=unroll_factor]

The +O[no]loop_unroll option enables [disables] loop unrolling. This optimization can occur at optimization levels 2, 3, and 4. The default is +Oloop_unroll. The default unroll factor is 4, that is, four copies of the loop body. The unroll_factor controls code expansion.

+O[no]loop_unroll_jam

The +O[no]loop_unroll_jam option enables [disables] loop unrolling and jamming. Loop unrolling and jamming increases register exploitation. The default is +Onoloop_unroll_jam at optimization levels 3 and 4 only.

+O[no]moveflops

The +Onomoveflops option is approximately equivalent to +Ofltacc=strict +Ofenvaccess. The default is +Omoveflops. This option is deprecated and may not be supported in future releases.

+O[no]openmp

The +Oopenmp option causes the OpenMP directives to be honored. This option is effective at any optimization level. Non-OpenMP parallelization directives are ignored with warnings. +Onoopenmp requests that OpenMP directives be silently ignored. If neither +Oopenmp nor +Onoopenmp is specified, OpenMP directives are ignored with warnings.

The OpenMP specification is available at http://www.openmp.org/specs.

OpenMP programs require the libomp and libcps runtime support libraries to be present on both the compilation and runtime systems. The compiler driver automatically includes them when linking.

If you use +Oopenmp in an application, you must use -mt with any files that are not compiled with +Oopenmp. For additional information and restrictions, see “-mt”. It is recommended that you use the -N option when linking OpenMP programs to avoid exhausting memory when running with large numbers of threads.
+opts filename

The file indicated by filename contains a list of options that are processed as if they had been specified on the command line at the point of the +opts option. The options must be delimited by a blank character. You can add comments to the option file by using a "#" character in the first column of a line. The "#" causes the entire line to be ignored by the compiler.

Example:
Where GNUOptions contains:
+O[no]parminit

The +O[no]parminit option enables [disables] automatic initialization to non-NaT of unspecified function parameters at call sites. This is useful in preventing NaT values in parameter registers. The default is +Onoparminit.

+O[no]parmsoverlap

The +Onoparmsoverlap option optimizes with the assumption that, on entry to a function, each of that function’s pointer-typed formals points to memory that is accessed only through that formal or through copies of that formal made within the function. For example, that memory must not be accessed through a different formal, and that formal must not point to a global that is accessed by name within the function or any of its calls. Use +Onoparmsoverlap if C/C++ programs have been literally translated from FORTRAN programs. The default is +Oparmsoverlap.

+O[no]procelim

The +O[no]procelim option enables [disables] the elimination of dead procedure code and, in some cases, unreferenced data. Use this option when linking an executable file to remove functions not referenced by the application. You can also use this option when building a shared library to remove functions not exported and not referenced from within the shared library. This may be especially useful when functions have been inlined.

The default is +Onoprocelim at optimization levels 0 and 1; at levels 2, 3, and 4, the default is +Oprocelim.

+O[no]promote_indirect_calls

The +O[no]promote_indirect_calls option uses profile data from profile-based optimization and other information to determine the most likely target of indirect calls and promotes them to direct calls. Indirect calls occur with pointers to functions and virtual calls. In all cases the optimized code tests to make sure the direct call is being taken and, if not, executes the indirect call. If +Oinline is in effect, the optimizer may also inline the promoted calls. +Opromote_indirect_calls is only effective with profile-based optimization.
This option can be used at optimization levels 3 and 4. At +O3, it is only effective if indirect calls from functions within a file are mostly to target functions within the same file. This is because +O3 optimizes only within a file, whereas +O4 optimizes across files. The default is +Opromote_indirect_calls at optimization level 3 and above, and +Onopromote_indirect_calls at optimization level 2 and below.

+Orarely_called=function1[,function2...]

The +Orarely_called option overrides any information in a profile database. The named functions are assumed to be rarely called.

+Orarely_called:filename

The file indicated by filename contains a list of functions, separated by spaces or newlines. These functions are assumed to be rarely called. This option overrides any information in a profile database.

+O[no]recovery

The +O[no]recovery option generates [does not generate] recovery code for control speculation. The default is +Orecovery. For code that writes to uncacheable memory that may not be properly identified as volatile, the +Orecovery option reduces the risk of incorrect behavior.
+O[no]signedpointers

The +Osignedpointers option treats pointers in Boolean comparisons (for example, <, <=, >, >=) as signed quantities. Applications that allocate shared memory and that compare a pointer to shared memory with a pointer to private memory may run incorrectly if this optimization is enabled. The default is +Onosignedpointers.
+Oshortdata[=size]

All objects of size bytes or smaller are placed in the short data area, and references to such data assume it resides in the short data area. Valid values of size are decimal numbers between 8 and 4,194,304 (4 MB). If no size is specified, all data is placed in the short data area. The default is +Oshortdata=8.

+O[no]store_ordering

The +O[no]store_ordering option preserves [does not preserve] the original program order for stores to memory that is visible to multiple threads. This does not imply strong ordering. The default is +Onostore_ordering.

+Otype_safety=kind

The +Otype_safety option controls type-based aliasing assumptions. The defined values for kind are:
The default is +Otype_safety=off.

+Ounroll_factor=n

The +Ounroll_factor option applies the unroll factor n to all loops in the current translation unit. You can specify the unroll factor that you think is best for the loops, or specify that no unrolling be applied. If this option is not specified, the compiler uses its own heuristics to determine the best unroll factor for the inner loop. A user-specified unroll factor overrides the default unroll factor applied by the compiler. Specifying n=1 prevents the compiler from unrolling the loop. Specifying n=0 allows the compiler to use its own heuristics to apply the unroll factor.
+O[no]volatile

The +Ovolatile option implies that memory references to global variables are volatile and cannot be removed during optimization. The +Onovolatile option implies that globals are not of volatile class, which means that references to global variables can be removed during optimization. Use this option to control the volatile semantics for all global variables. You can use +Ovolatile at all optimization levels. The default is +Onovolatile.
+O[no]whole_program_mode

The +O[no]whole_program_mode option enables [disables] the assertion that only those files that are compiled with this option directly reference any global variables and procedures that are defined in these files. In other words, this option asserts that there are no unseen accesses to the globals. When this assertion is in effect, the optimizer can hold global variables in registers longer and delete inlined or cloned global procedures.

This option is in effect only at the +O4 level of optimization. All files compiled with +Owhole_program_mode must also be compiled with +O4. If any of the files were compiled with +O4 but were not compiled with +Owhole_program_mode, the linker disables the assertion for all files in the program. The default is +Onowhole_program_mode, which disables the assertion.

Use this option to improve performance, but only when you are certain that only the files compiled with +Owhole_program_mode directly access any globals that are defined in these files.

Profile-based optimization is a set of performance-improving code transformations based on the runtime characteristics of your application.

+Oprofile=[use|collect]

+Oprofile=collect instructs the compiler to instrument the object code for collecting runtime profile data. The profiling information can then be used by the linker to perform profile-based optimization. When an application finishes execution, it writes profile data to the file flow.data or to the file/path in the environment variable FLOW_DATA (if set).

+Oprofile=use[:filename] causes the compiler to look for a profile database file. If a filename is not specified, the compiler looks for a file named flow.data or the file/path specified in the FLOW_DATA environment variable. If a filename is specified, it overrides the FLOW_DATA environment variable.

After compiling and linking with +Oprofile=collect, run the resultant program using representative input data to collect execution profile data. Profile data is stored in flow.data by default. The name is generated as flow.<suffix> if there is already a flow.data file present in the current directory. Finally, recompile with the +Oprofile=use option (passing it the appropriate filename if necessary) to perform profile-based optimization.

Example:

aCC +Oprofile=collect -O -o prog.pbo prog.C

The above command compiles prog.C with optimization, prepares the object code for data collection, and creates the executable file prog.pbo. Running prog.pbo collects runtime information in the file flow.data in preparation for optimization with +Oprofile=use.

+Oprofile=collect[:<qualifiers>]

<qualifiers> is a comma-separated list of profile collection qualifiers. Supported profile collection qualifiers:

This option merely enables the application for collection of the various forms of profiling data. The environment variable PBO_DATA_TYPE controls the type of data collected at runtime. It may be set to one of the following values, which must be consistent with the +Oprofile=collect qualifiers used to create the application:
The +O[no]info option displays informational messages about the optimization process.