Jump to content United States-English
HP.com Home Products and Services Support and Drivers Solutions How to Buy
» Contact HP
More options
HP.com home
Parallel Programming Guide for HP-UX Systems: K-Class and V-Class Servers > Chapter 1 Introduction

Overview of HP optimizations

» 

Technical documentation

Complete book in PDF
» Feedback
Content starts here

 » Table of Contents

 » Glossary

HP compilers perform a range of user-selectable optimizations. These new and standard optimizations, specified using compiler command-line options, are briefly introduced here. A more thorough discussion, including the features associated with each, is provided in Chapter 3 “Optimization levels”.

Basic scalar optimizations

Basic scalar optimizations improve performance at the basic block and program unit level.

A basic block is a sequence of statements that has a single entry point and a single exit. Branches do not exist within the body of a basic block. A program unit is a subroutine, function, or main program in Fortran or a function (including main) in C and C++. Program units are also generically referred to as procedures. Basic blocks are contained within program units. Optimizations at the program unit level span basic blocks.

To improve performance, basic optimizations perform the following activities:

  • Exploit the processor's functional units and registers

  • Reduce the number of times memory is accessed

  • Simplify expressions

  • Eliminate redundant operations

  • Replace variables with constants

  • Replace slow operations with faster equivalents

Advanced scalar optimizations

Advanced scalar optimizations are primarily intended to maximize data cache usage. This is referred to as data localization. Concentrating on loops, these optimizations strive to encache the data most frequently used by the loop and keep it encached so as to avoid costly memory accesses.

Advanced scalar optimizations include several loop transformations. Many of these optimizations either facilitate more efficient strip mining or are performed on strip-mined loops to optimize processor data cache usage. All of these optimizations are covered in Chapter 7 “Controlling optimization”.

Advanced scalar optimizations implicitly include all basic scalar optimizations.

Parallelization

HP compilers automatically locate and exploit loop-level parallelism in most programs. Using the techniques described in Chapter 9 “Parallel programming techniques”, you can help the compilers find even more parallelism in your programs.

Loops that have been data-localized are prime candidates for parallelization. Individual iterations of loops that contain strips of localizable data are parcelled out among several processors and run simultaneously. For example, the maximum number of processors that can be used is limited by the number of iterations of the loop and by processor availability.

While most parallelization is done on nested, data-localized loops, other code can also be parallelized. For example, through the use of manually inserted compiler directives, sections of code outside of loops can also be parallelized.

Parallelization optimizations implicitly include both basic and advanced scalar optimizations.

Printable version
Privacy statement Using this site means you accept its terms Feedback to webmaster
© Hewlett-Packard Development Company, L.P.