Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers > Chapter 3 Optimization levels

HP optimization levels and features


This section provides an overview of optimization features, which can be invoked through either command-line optimization options or manual specification using pragmas or directives.

Five optimization levels are available for use with the HP compiler: +O0 (the default), +O1, +O2, +O3, and +O4. These options have identical names and perform identical optimizations, regardless of which compiler you are using. They can also be specified on the compiler command line in conjunction with other options you may want to use. HP compiler optimization levels are described in Table 3-2 “Optimization levels and features”.

Table 3-2 Optimization levels and features

Each entry below gives the optimization level, its features, and its benefits.

+O0 (the default)

Features (occur at the machine-instruction level):

  • Constant folding
  • Data alignment on natural boundaries
  • Partial evaluation of test conditions
  • Registers (simple allocation)

Benefits: Compiles fastest.

+O1 (includes all of +O0)

Features (occur at the block level):

  • Branch optimization
  • Dead code elimination
  • Instruction scheduler
  • Peephole optimizations
  • Registers (faster allocation)

Benefits: Produces faster programs than +O0, and compiles faster than level +O2.

+O2 (-O) (includes all of +O0, +O1)

Features (occur at the routine level):

  • Common subexpression elimination
  • Constant folding (advanced) and propagation
  • Loop-invariant code motion
  • Loop unrolling
  • Registers (global allocation)
  • Register reassociation
  • Software pipelining
  • Store/copy optimization
  • Strength reduction of induction variables and constants
  • Unused definition elimination

Benefits: Can produce faster run-time code than +O1 if loops are used extensively. Run times for loop-oriented, floating-point-intensive applications may be reduced by up to 90 percent. Operating system and interactive applications that use the optimized system libraries may achieve an additional improvement of 30 percent to 50 percent.

+O3 (includes all of +O0, +O1, +O2)

Features (occur at the file level):

  • Cloning within a single source file
  • Data localization
  • Automatic and directive-specified loop parallelization
  • Directive-specified region parallelization
  • Directive-specified task parallelization
  • Inlining within a single source file
  • Loop blocking
  • Loop distribution
  • Loop fusion
  • Loop interchange
  • Loop reordering, preventing
  • Loop unroll and jam
  • Parallelization
  • Parallelization, preventing
  • Reductions
  • Test promotion

All of the directives and pragmas of the HP parallel programming model are available in the Fortran 90 and C compilers:

  • prefer_parallel requests parallelization of the following loop.
  • loop_parallel forces parallelization of the following loop.
  • parallel, end_parallel parallelize a single code region to run on multiple threads.
  • begin_tasks, next_task, end_tasks force parallelization of the following code sections.

Benefits: Can produce faster run-time code than +O2 on code that frequently calls small functions, or if loops are extensively used. Links faster than +O4.

+O4 (includes all of +O0, +O1, +O2, +O3)

Not available in Fortran 90.

Features (occur at the cross-module level and are performed at link time):

  • Cloning across multiple source files
  • Global/static variable optimizations
  • Inlining across multiple source files

Benefits: Produces faster run-time code than +O3 when global variables are used or when procedure calls are inlined across modules.
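
Two of the +O2 features listed in Table 3-2, loop-invariant code motion and strength reduction of induction variables, can be illustrated by applying them by hand. The following C sketch is only an illustration: the function names are hypothetical, and the code the HP compilers actually generate is not shown in this section and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* As a programmer might write it: scale * offset is recomputed on every
   iteration, and each element index is derived from i * stride. */
void fill_naive(double *a, size_t n, size_t stride, double scale, double offset)
{
    for (size_t i = 0; i < n; i++) {
        a[i * stride] = scale * offset + (double)i;
    }
}

/* Hand-applied version of two +O2 features:
   - loop-invariant code motion: scale * offset is computed once, before the loop
   - strength reduction: the multiplication i * stride becomes a running addition */
void fill_transformed(double *a, size_t n, size_t stride, double scale, double offset)
{
    const double base = scale * offset;  /* hoisted loop invariant */
    size_t idx = 0;                      /* replaces i * stride */

    for (size_t i = 0; i < n; i++) {
        a[idx] = base + (double)i;
        idx += stride;
    }
}

/* Both versions must store identical values in identical positions. */
void check_fills_agree(void)
{
    double x[12] = {0};
    double y[12] = {0};

    fill_naive(x, 4, 3, 2.0, 1.5);
    fill_transformed(y, 4, 3, 2.0, 1.5);

    for (size_t i = 0; i < 12; i++) {
        assert(x[i] == y[i]);
    }
}
```

The transformed version does the same work with one multiplication outside the loop instead of two inside it, which is the kind of saving that matters when loops dominate run time.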

 

Cumulative Options

The optimization options that control an optimization level are cumulative so that each option retains the optimizations of the previous option. For example, entering the following command line compiles the Fortran program foo.f with all +O2, +O1, and +O0 optimizations shown in Table 3-2 “Optimization levels and features”:

% f90 +O2 foo.f

In addition to these options, the +Oparallel option is available for use at +O3 and above; +Onoparallel is the default. When the +Oparallel option is specified, the compiler:

  • Looks for opportunities for parallel execution in loops

  • Honors the parallelism-related directives and pragmas of the HP parallel programming model.
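
As a rough illustration of what an opportunity for parallel execution in a loop can look like, the C sketch below contrasts three loop shapes. The function names are hypothetical, and whether the HP compilers parallelize any particular loop depends on their own analysis, which this section does not describe.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: each c[i] depends only on a[i] and b[i], so the
   iterations could in principle run in any order or concurrently. */
void vector_add(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i - 1, so the iterations cannot simply be split across threads. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i] + a[i - 1];
    }
}

/* Reduction: every iteration updates the same scalar. Table 3-2 lists
   "Reductions" among the +O3 features. */
double sum(const double *a, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Small usage check with hand-computed values. */
void check_loops(void)
{
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    double b[4] = {10.0, 20.0, 30.0, 40.0};
    double c[4];

    vector_add(a, b, c, 4);
    assert(c[0] == 11.0 && c[3] == 44.0);

    assert(sum(a, 4) == 10.0);

    prefix_sum(a, 4);  /* a becomes {1, 3, 6, 10} */
    assert(a[3] == 10.0);
}
```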

The +Onoautopar (no automatic parallelization) option is available for use with +Oparallel at +O3 and above. +Oautopar is the default. +Onoautopar causes the compiler to parallelize only those loops that are immediately preceded by loop_parallel or prefer_parallel directives or pragmas. For more information, refer to Chapter 9 “Parallel programming techniques”.
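
The effect of +Onoautopar described above can be sketched in C as follows. This is an illustration under stated assumptions: the `#pragma _CNX` spelling is assumed to be the C form of the HP parallel programming model pragmas (this section names the directives but does not show their syntax), and the function name is hypothetical. The `__hpux` guard hides the pragma from other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: "#pragma _CNX prefer_parallel" is taken to be the C spelling
   of the prefer_parallel pragma; check the compiler documentation before
   relying on it. */
void scale_and_shift(double *a, double *b, size_t n, double k)
{
    size_t i;

    /* Immediately preceded by prefer_parallel: per the text above, this loop
       stays a parallelization candidate even under +Onoautopar. */
#if defined(__hpux)
#pragma _CNX prefer_parallel
#endif
    for (i = 0; i < n; i++) {
        a[i] = a[i] * k;
    }

    /* No pragma: with +Oparallel +Onoautopar this loop is described as being
       left serial; with the default +Oautopar the compiler may still
       consider it automatically. */
    for (i = 0; i < n; i++) {
        b[i] = b[i] + k;
    }
}

/* The pragma does not change the results, only how the loop may be run. */
void check_scale_and_shift(void)
{
    double a[3] = {1.0, 2.0, 3.0};
    double b[3] = {1.0, 2.0, 3.0};

    scale_and_shift(a, b, 3, 2.0);
    assert(a[2] == 6.0);
    assert(b[2] == 5.0);
}
```

Assuming the C compiler driver is invoked as cc and the source file is named loops.c (both hypothetical here), a command line such as `% cc +O3 +Oparallel +Onoautopar loops.c` would combine the options discussed in this section.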

© Hewlett-Packard Development Company, L.P.