
Release Notes for EMS Hardware Monitors (September 2000)

These release notes cover the September 2000 (IPR 0009) release of Support Plus for HP-UX 11.00/10.20 running on S800/S700 systems.

NOTE: As of the September 1999 release, the name of the Diagnostic/IPR Media has been changed to Support Plus. In addition, the format has changed so that there is a separate CD-ROM for each version of the operating system (HP-UX 10.20 and HP-UX 11.0).

Overview

Included on the Support Plus CD-ROM are the EMS Hardware Monitors, an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 10.20 or 11.X (IPR 9902 and later).

Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.

Configuring Hardware Monitoring

The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.

By default, messages regarding major warning, serious and critical events that occur on hardware being monitored will be:

All events will be stored in /var/opt/resmon/log/event.log.

To configure, enable, or disable hardware event monitoring, run the monitoring request manager:

          /etc/opt/resmon/lbin/monconfig

The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. For details, see http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm

Documentation

For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics":

          http://docs.hp.com/hpux/diag/

At this site, you will find overviews, tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and other material.

For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.

Changes

Changes in the EMS Hardware Monitors for the September 2000 (IPR 0009) release include:

Customer-Visible Interface Changes

This section reports changes to the customer-visible interface in this release. This information is provided for the benefit of customers who use scripts to drive hardware support tools or to parse the output of those tools.


CHANGE: In the IPR 0009 and HP-UX 11i release, the header displayed when "monconfig" is executed has been changed to include the STM and EMS version numbers.

BEFORE:

============================================================================
===================       Event Monitoring Service       ===================
===================      Monitoring Request Manager      ===================
============================================================================

  EVENT MONITORING IS CURRENTLY ENABLED.

AFTER:

============================================================================
===================       Event Monitoring Service       ===================
===================      Monitoring Request Manager      ===================
============================================================================

  EVENT MONITORING IS CURRENTLY ENABLED.
  EMS Version : A.03.10
  STM Version : A.22.10
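Scripts that scrape the monconfig header may want to pick up the new version lines. The following is a minimal Python sketch, assuming the header text has already been captured (how you capture it from the menu-driven monconfig is outside this sketch) and that the lines keep exactly the "EMS Version : ..." layout shown in the AFTER sample. The function name header_versions is hypothetical.

```python
import re

# Matches lines such as "  EMS Version : A.03.10" from the AFTER header.
VERSION_LINE = re.compile(
    r"^[ \t]*(EMS|STM) Version[ \t]*:[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def header_versions(header_text):
    """Return {"EMS": ..., "STM": ...} parsed from captured header text.

    Headers from releases before IPR 0009 carry no version lines,
    so an empty dict is returned for them.
    """
    return {name: version for name, version in VERSION_LINE.findall(header_text)}
```

Applied to the AFTER header above, header_versions returns {"EMS": "A.03.10", "STM": "A.22.10"}; applied to the BEFORE header, it returns an empty dict, which a script can treat as "pre-IPR 0009 output".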

CHANGE: Added more possible exit messages for psmmon. These messages may be logged to the /etc/opt/resmon/log/api.log file when the monitor exits, either abnormally (Log Level: Error) or normally (Log Level: Info):

------------------------Start Event--------------------------------
User event occurred at Tue May 23 14:14:38.789544 2000
Process ID: 1246 (/usr/sbin/stm/uut/bin/tools/.../psmmon)   Log Level: Error
/usr/sbin/stm/uut/bin/tools/monitor/psmmon: Exiting due to receipt of signal
11.
------------------------Start Event--------------------------------

------------------------Start Event--------------------------------
User event occurred at Tue May 23 14:14:38.789544 2000
Process ID: 1246 (/usr/sbin/stm/uut/bin/tools/.../psmmon)   Log Level: Error
/usr/sbin/stm/uut/bin/tools/monitor/psmmon: Exiting due to SIGINT signal.
------------------------Start Event--------------------------------

------------------------Start Event--------------------------------
User event occurred at Tue May 23 14:14:38.789544 2000
Process ID: 1246 (/usr/sbin/stm/uut/bin/tools/.../psmmon)   Log Level: Error
/usr/sbin/stm/uut/bin/tools/monitor/psmmon: Exiting due to error with exit
value 0xXX.
------------------------Start Event--------------------------------

------------------------Start Event--------------------------------
User event occurred at Tue May 23 14:14:38.789544 2000
Process ID: 1246 (/usr/sbin/stm/uut/bin/tools/.../psmmon)   Log Level: Info
/usr/sbin/stm/uut/bin/tools/monitor/psmmon: Exiting normally.
------------------------Start Event--------------------------------
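For scripts that watch api.log, the entries above can be picked out by their layout. This is a minimal Python sketch, assuming the layout shown in these samples: a marker line of dashes containing "Event", a timestamp line, a "Process ID: ... Log Level: <level>" line, and a message that may wrap onto a continuation line. The function names are hypothetical, and the real file may contain other entry formats that this sketch ignores.

```python
import re

# Marker lines around each entry, e.g. "-----...Start Event-----...".
# The sample entries in these notes show the same marker before and after.
SEPARATOR = re.compile(r"^-{5,}.*Event-*$")
LOG_LEVEL = re.compile(r"Log Level:\s*(\w+)")


def _parse_entry(lines):
    """Return [(level, message)] if this entry is a psmmon exit message."""
    for i, line in enumerate(lines):
        level = LOG_LEVEL.search(line)
        if level and "psmmon" in line:
            # Messages can wrap (e.g. "...signal" / "11."), so rejoin them.
            message = " ".join(lines[i + 1:])
            if "psmmon: Exiting" in message:
                return [(level.group(1), message.split("psmmon: ", 1)[1])]
    return []


def psmmon_exit_messages(log_text):
    """Collect (log level, exit message) pairs for psmmon from api.log text."""
    results, entry = [], []
    # A trailing sentinel marker flushes the last entry.
    for raw in log_text.splitlines() + ["-----End Event-----"]:
        line = raw.strip()
        if SEPARATOR.match(line):
            results.extend(_parse_entry(entry))
            entry = []
        elif line:
            entry.append(line)
    return results
```

Fed the first and last sample entries above, the function yields ("Error", "Exiting due to receipt of signal 11.") and ("Info", "Exiting normally."), so a script can alert on the Error level and ignore normal exits.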


In the IPR 0009 and HP-UX 11i release, about 17 changes were made to the text under the "Description", "Cause/Action", and "Details" headings of the default SCSI events generated and decoded by the SCSI device monitors and decoders. These events may be reported by any hardware monitor for SCSI devices.

-----------
For Events #100837, 100937, 101826, and 101726, the "Details" text did not display the description of the Additional Sense Code and Additional Sense Qualifier. The following text was added:

100837
The combination of Additional Sense Code and Sense Qualifier (0x110b) indicates: Unrecovered read error. Recommend reassignment.

100937
The combination of Additional Sense Code and Sense Qualifier (0x110c) indicates: Unrecovered read error. Recommend rewrite.

101726
The combination of Additional Sense Code and Sense Qualifier (0x1805) indicates: Recovered data. Recommend reassign.

101826
The combination of Additional Sense Code and Sense Qualifier (0x1806) indicates: Recovered data. Recommend rewrite.
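Each combined value above packs two one-byte fields: the Additional Sense Code in the high byte and the Additional Sense Qualifier in the low byte (so 0x110b is ASC 0x11 with ASCQ 0x0b). A minimal Python sketch of that split follows, with a lookup table holding only the four decodings quoted above. The names split_sense, describe, and ADDED_DECODINGS are hypothetical; this is not the monitors' own decoder.

```python
# Decodings quoted in these release notes, keyed by the combined value
# (Additional Sense Code << 8) | Additional Sense Qualifier.
ADDED_DECODINGS = {
    0x110B: "Unrecovered read error. Recommend reassignment.",
    0x110C: "Unrecovered read error. Recommend rewrite.",
    0x1805: "Recovered data. Recommend reassign.",
    0x1806: "Recovered data. Recommend rewrite.",
}


def split_sense(combined):
    """Split a combined 16-bit value such as 0x110b into (ASC, ASCQ)."""
    if not 0 <= combined <= 0xFFFF:
        raise ValueError(f"not a 16-bit sense value: {combined:#x}")
    return combined >> 8, combined & 0xFF


def describe(combined):
    """Format the split fields plus any decoding listed in these notes."""
    asc, ascq = split_sense(combined)
    text = ADDED_DECODINGS.get(combined, "no decoding listed in these notes")
    return f"ASC 0x{asc:02x}, ASCQ 0x{ascq:02x}: {text}"
```

For example, describe(0x110b) returns "ASC 0x11, ASCQ 0x0b: Unrecovered read error. Recommend reassignment."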

---------
For Event #100068, the "Details" text decoding of the Additional Sense Code and Additional Sense Qualifier was incorrect: it indicated "Ram Failure". The correct decoding is:

The combination of Additional Sense Code and Sense Qualifier (0x4000) indicates: Power-on or self-test failure for FRU indicated by sense code qualifier.

NOTE: Valid values for the Additional Sense Code and Sense Qualifier for event #100068 range from 0x4000 to 0x40FF, where 0x40 is the Additional Sense Code and the low byte (0x00 to 0xFF) is the Additional Sense Qualifier.
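Because the Additional Sense Qualifier is a single byte, event #100068 values occupy 0x4000 through 0x40FF, and the qualifier byte identifies the FRU. A short Python check along those lines; the function name event_100068_fru is hypothetical.

```python
def event_100068_fru(combined):
    """Return the qualifier byte (the FRU indicator) for an event #100068 value.

    Valid values run from 0x4000 to 0x40FF: Additional Sense Code 0x40 in the
    high byte and the Additional Sense Qualifier in the low byte.
    """
    asc, ascq = combined >> 8, combined & 0xFF
    if combined > 0xFFFF or asc != 0x40:
        raise ValueError(f"{combined:#x} is outside 0x4000-0x40FF")
    return ascq
```

A value such as 0x400FF has five hex digits and therefore cannot be a one-byte code plus a one-byte qualifier, which is why the check rejects anything above 0xFFFF.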

-------------
For Events #100837, 100937, 100208, 101126, 101026, 100271, and 100872, the "Description" text was changed:

100837
The device was unsuccessful in reading the data for the current I/O request. Reassignment to a spare area on the medium is recommended.

100937
The device was unsuccessful in reading the data for the current I/O request. Rewriting the data is recommended.

100208
The medium in the device is incompatible with the device.

101126
The device was unsuccessful in its first attempt at reading the data requested in an I/O request, but was able to recover it. The requested data was successfully returned. Rewriting the data is recommended.

101026
The device was unsuccessful in its first attempt at reading the data requested in an I/O request, but was able to recover it. The requested data was successfully returned. Reassignment to a spare area on the medium is recommended.

100271
The device aborted the command. The initiator may be able to recover by retrying the command.

100872
The device aborted the command. The initiator may be able to recover by retrying the command.

-----------
For Events #100208, 101126, 101826, and 100999 (formerly 100299), the "Cause/Action" text was changed:

100208
Replace the medium with one that is compatible with the device.

101126
Rewrite of the data on the medium is recommended.

101826
Rewrite of the data on the medium is recommended.

100999
The error most likely indicates that the device is not fully supported by the current driver. This may or may not cause a problem in the operation of the device.

------------
Event #100999 replaces Event #100299.


Known Problems


CAUTION: Monitoring Changes for disc30, sdisk and disk array devices

As of IPR 9902 (Feb 99 release), there has been a change to the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC.

Formerly, the "diaglogd exec" programs (pdisc30_exec, pharaymon_exec, and psdisk_exec) handled driver error entries for these devices.

As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.

If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg, sdisk_exec.cfg, and haraymon_exec.cfg) you may wish to re-configure the EMS Hardware Monitors to achieve the same results.


CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)

If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition, and High Availability Monitors. The only critical problems occur with the following versions:

MC/ServiceGuard            A.10.10, A.11.01, A.11.03
ServiceGuard OPS Edition   A.11.02, A.11.03

Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
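Before installing the OnlineDiag bundle, a site script could compare installed ServiceGuard versions against the critical list above. A minimal Python sketch, assuming the product name and version string are already known (how they are obtained is not covered here); the table holds only the versions listed in this caution, and the function name is hypothetical. It flags only the critical cases, not the lesser problems that other revisions may have.

```python
# Versions listed in these notes as having critical compatibility problems
# with the OnlineDiag bundle (Dec 99 or later).
CRITICAL_VERSIONS = {
    "MC/ServiceGuard": {"A.10.10", "A.11.01", "A.11.03"},
    "ServiceGuard OPS Edition": {"A.11.02", "A.11.03"},
}


def has_critical_conflict(product, version):
    """True if this product/version pair is on the critical list above."""
    return version in CRITICAL_VERSIONS.get(product, set())
```

For example, has_critical_conflict("MC/ServiceGuard", "A.11.03") is True, while any product not in the table returns False.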

Monitors Provided

Monitors are provided to support the following:

In addition, a Hardware status monitor is provided to monitor the current status of the products supported by the above list.

For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/ .

Monitor Dependencies

Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:

For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.

Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".

Defect Reporting

Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.

SD Product Structure

The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.

Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
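The dependency note above can be turned into a simple pre-installation check. A minimal Python sketch, assuming you already have the installed software as "Product" or "Product.Fileset" names (for example, transcribed from swlist output; that naming is an assumption here). It checks only the components this note names; the additional Sup-Tool-Mgr filesets it mentions are not enumerated and so are not covered. The function name missing_components is hypothetical.

```python
# Components named in this note as required by the EMS Hardware Monitors.
REQUIRED = {
    "Sup-Tool-Mgr.STM-UUT-RUN",  # fileset that contains the monitors
    "EMS-Core",                  # product-level dependency
    "EMS-Config",                # product-level dependency
}


def missing_components(installed):
    """Return the required names not satisfied by `installed`.

    A product requirement is met by the product name itself or by any
    "Product.Fileset" entry belonging to that product.
    """
    installed = set(installed)
    missing = []
    for req in sorted(REQUIRED):
        if req in installed:
            continue
        if "." not in req and any(name.startswith(req + ".") for name in installed):
            continue
        missing.append(req)
    return missing
```

For example, missing_components(["Sup-Tool-Mgr.STM-UUT-RUN", "EMS-Core.EMS-CORE"]) returns ["EMS-Config"].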

For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.

SD Bundle: OnlineDiag
   Description: On-line Diagnostic System (Series 800/700)

   SD PRODUCT: Sup-Tool-Mgr
      Description: Support Tools Manager for HP-UX Systems

      SD SUB-PRODUCT: Manuals
         Description: Support Tools Manager Manual Pages

         FILESET: RELEASE_NOTES
            Description: HPUX STM Release Notes

         FILESET: STM-MAN
            Description: HPUX STM Manual Pages

      SD SUB-PRODUCT: Runtime
         Description: STM Manual Runtime

         FILESET: STM-CATALOGS
            Description: HPUX STM Shared Libraries

         FILESET: STM-SHLIBS
            Description: HPUX STM Shared Libraries

         FILESET: STM-UI-RUN 
            Description: HPUX STM User Interface
         
         FILESET: STM-UUT-RUN
            Description: HPUX STM Unit Under Test Runtime 

   SD PRODUCT: EMS-Config
      Description: EMS Config

         FILESET: EMS-GUI
            Description: Event Monitoring Service Graphical 
                         User Interface

   SD PRODUCT: EMS-Core
      Description: EMS Core Product

         FILESET: EMS-CORE
            Description: Event Monitoring Service Core Files




URL: http://docs.hp.com/hpux/onlinedocs/diag/ems/emr_0009.htm
Last updated: Thu May 17 16:50:20 PDT 2001