HP Capacity Advisor Version 4.1 User's Guide > Chapter 3 Key Capacity Advisor Concepts

Determining Trends in Capacity Advisor

Determining trends from collected utilization data can be a challenging task. Accurate trend analysis requires adequate historical data and an understanding of the cyclic nature of the data being analyzed as well as any special events that might be found in the historical data.

  • Trends are frequently small values, on the order of percents or fractions of a percent per month.

  • The cyclic data can easily be orders of magnitude greater than the trend (heavy calculations the day before payroll distribution, floods of users logging on after work on the East coast, and so on).

  • Special events can also be orders of magnitude greater than the trend (seasonal promotions, once per year calculations such as taxes).

Any algorithmic analysis must be able to deal with these problems. HP Capacity Advisor uses two techniques to prepare data for a linear regression: it aggregates points based on known business cycles to deal with cyclic patterns, and it excludes points to deal with special events.

Aggregation of Points in Business Interval Bins

To reduce the impact of cyclic changes in the historical data, a user-specified business period is used to break the data into time-interval based “bins,” and each bin is then represented by a single point. The point can be the average, the peak, or the 90th percentile of the data (the value that 90% of the points fall below). A bin is used only if the percentage of valid points within it exceeds the threshold you have specified.

IMPORTANT: A trend will not be calculated unless at least two bins with an adequate percentage of valid points exist within the range of data being analyzed.
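The binning described above can be sketched in a few lines of Python. This is an illustration only, not Capacity Advisor's implementation: the function names, the (timestamp, value, is_valid) sample layout, the nearest-rank percentile definition, and the assumption that invalid samples are present and flagged (rather than missing) are all hypothetical.

```python
import math
from statistics import mean

def percentile_90(values):
    """Nearest-rank 90th percentile (an assumed definition)."""
    ordered = sorted(values)
    rank = math.ceil(9 * len(ordered) / 10)  # 1-based rank
    return ordered[rank - 1]

AGGREGATES = {"average": mean, "peak": max, "90th percentile": percentile_90}

def aggregate_bins(samples, bin_seconds, method="average", validity_threshold=0.5):
    """Reduce (timestamp_seconds, value, is_valid) samples to one point per bin.

    A bin is kept only if its fraction of valid samples exceeds
    validity_threshold; only valid samples contribute to the aggregate.
    Returns a list of (bin_index, aggregate) sorted by bin_index.
    """
    bins = {}
    for timestamp, value, is_valid in samples:
        bins.setdefault(int(timestamp // bin_seconds), []).append((value, is_valid))

    points = []
    for index in sorted(bins):
        valid = [value for value, ok in bins[index] if ok]
        if len(valid) / len(bins[index]) > validity_threshold:
            points.append((index, AGGREGATES[method](valid)))
    return points

# Hypothetical data: hourly samples for 3 days, with most of day 1 marked invalid.
DAY = 86400
samples = [
    (day * DAY + hour * 3600,
     10.0 + day + (5.0 if hour == 12 else 0.0),
     not (day == 1 and hour < 18))
    for day in range(3)
    for hour in range(24)
]

points = aggregate_bins(samples, bin_seconds=DAY, method="peak")
print(points)            # [(0, 15.0), (2, 17.0)]
print(len(points) >= 2)  # True: enough usable bins for a trend
```

Day 1 is dropped because only 6 of its 24 samples are valid, which does not exceed the 50% threshold chosen for this example; the two remaining bins are the minimum needed for a trend.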

Choosing an Appropriate Business Interval

It is crucial to have a significant amount of data for analysis. Choosing an appropriate business interval with a data collection period that is long enough helps to ensure that you have enough data for a useful analysis. For example, a business interval of 1 week and data collection period of 1 month provides only four aggregate data points. This is insufficient to provide meaningful results.

To improve results for this example, use a business interval of 1 day with a data collection period of 1 month to provide about 30 data points, or use a business interval of 1 week with a data collection period of 6 months to provide 26 data points. Modifying the business interval, the data collection period, or both gives you more flexibility in arriving at a significant amount of data for analysis.
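The arithmetic behind these examples is a simple division of the collection period by the business interval. The sketch below assumes a 30-day month and a 182-day half year, and the function name is hypothetical.

```python
from datetime import timedelta

def max_aggregate_points(collection_period, business_interval):
    """Upper bound on the number of bins: whole intervals that fit in the period."""
    return collection_period // business_interval

# 1 month of data, weekly interval (assuming a 30-day month)
print(max_aggregate_points(timedelta(days=30), timedelta(weeks=1)))   # 4
# 1 month of data, daily interval
print(max_aggregate_points(timedelta(days=30), timedelta(days=1)))    # 30
# 6 months of data, weekly interval (assuming 182 days)
print(max_aggregate_points(timedelta(days=182), timedelta(weeks=1)))  # 26
```

The result is an upper bound; bins that fail the validity check reduce the number of points actually available.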

Exclusion of Points

To keep a special event out of a trend analysis, you can either set the report period so that it does not include the event, or mark the event's time period as invalid so that points collected during it are excluded.

Factors That Affect Data Validity

Within any data collection period, events can occur in the polled systems that affect the quality of data available during that time period. Capacity Advisor identifies data points that could adversely affect the quality and validity of report results.

The following are examples of events that Capacity Advisor can recognize (and disregard) as potential sources of invalid points:

  • System downtime during the collection period.

  • Out of the ordinary activity designated by you. You can manually designate time periods as invalid when you know resource usage has been outside the norm that you want to consider in your capacity planning. (See The Graph section for hints on how to do this.)

  • Partial collection from a virtual machine or a VM host. When Capacity Advisor is unable to apply a correction that accounts for all activity on a VM host, it marks any partial data collection as invalid.

How This Relates to Setting a Validity Threshold

The Validity Threshold that you set should reflect your tolerance for obtaining a sufficient amount of valid data in the collection period that you designate. If the reports that you run show that the given threshold is not obtainable for the designated time period, this may indicate that many of the data points in the designated collection period are invalid.

In this case, you can choose a lower Validity Threshold with the understanding that the report outcome may be a less reliable indicator of probable resource usage, or you can select a different or longer data collection period to improve the likelihood of obtaining a sufficient percentage of valid points for a good report.

Linear Regression

The linear regression is based on a least squares fit that minimizes the sum of the squares of the vertical offsets between each of the aggregate points and the trend line that describes them.
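As an illustration of a least squares fit of this kind, the following sketch uses the standard closed-form solution for a straight line. It is a textbook formulation, not Capacity Advisor's own code, and the function name and the (x, y) point layout are assumptions.

```python
def least_squares_line(points):
    """Fit y = slope * x + intercept to (x, y) pairs.

    Minimizes the sum of squared vertical offsets between the points
    and the line (ordinary least squares).
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two aggregate points are required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Three aggregate points that lie exactly on y = 2x + 1
print(least_squares_line([(0, 1.0), (1, 3.0), (2, 5.0)]))  # (2.0, 1.0)
```

Here x would be the bin index (or bin time) and y the aggregate value for that bin, so the slope is the change in utilization per business interval.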

TIP: Regressions performed over small data sets are not always meaningful and can be misleading. Carefully compare any trend analysis based on fewer than a dozen aggregate points with the historical data to see whether it "makes sense." The total time for the report divided by the business interval gives only the maximum number of data points for the trend analysis; the actual number can be smaller, because business intervals are excluded if they do not meet the validity criteria.

Because the trend is reported as an annual growth rate, it is best to have more than a year of historical data before trying to analyze trends.

Error Analysis

You can choose to include error analysis in the report. The following error value is available:

r-squared: r² is the square of the correlation coefficient (r), and is used in the 'goodness of fit' analysis of trend estimations. r is a value between -1 and +1, so r² lies between 0 and 1. Values of r approaching +/- 1 (r² approaching 1) indicate that the trend line represents the data increasingly well.
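A minimal sketch of how r and r² can be computed from aggregate points follows. It uses the standard Pearson correlation formula; the function names are hypothetical, and the source does not state that Capacity Advisor computes the value exactly this way.

```python
from math import sqrt

def correlation_coefficient(points):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / sqrt(sxx * syy)

def r_squared(points):
    """Square of the correlation coefficient, between 0 and 1."""
    return correlation_coefficient(points) ** 2

rising = [(0, 1.0), (1, 3.0), (2, 5.0)]   # exactly linear, increasing
falling = [(0, 5.0), (1, 3.0), (2, 1.0)]  # exactly linear, decreasing
print(correlation_coefficient(rising), r_squared(rising))    # 1.0 1.0
print(correlation_coefficient(falling), r_squared(falling))  # -1.0 1.0
```

The sign of r shows the direction of the trend, while r² discards the sign and reports only how closely the points follow the fitted line.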

© 2006-2009 Hewlett-Packard Development Company, L.P.