These release notes cover the March 2000 (IPR 0003) release of Support Plus for HP-UX 11.00/10.20 running on S800/S700 systems.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
NOTE: As of the September 1999 release, the name of the Diagnostic/IPR Media has been changed to Support Plus. In addition, the format has changed so that there is a separate CD-ROM for each version of the operating system (HP-UX 10.20 and HP-UX 11.0).
Included on the Support Plus CD-ROM are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 10.20 or 11.X (IPR 9902 and later).
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious, and critical events that occur on monitored hardware will be:
- Written to /var/adm/syslog/syslog.log
- Sent to EMAIL address root
All events will be stored in /var/opt/resmon/log/event.log.
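The default routing described here can be pictured as a simple severity check. The Python sketch below is illustrative only: `default_targets` is a hypothetical helper, the `email:root` label is an invented placeholder rather than monconfig syntax, and the severity names are taken from these notes.

```python
# Destinations named in these release notes.
EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"  # placeholder label (assumption), not monconfig syntax

# Severities that trigger a notification by default, per the notes.
NOTIFY_SEVERITIES = {"MAJOR WARNING", "SERIOUS", "CRITICAL"}


def default_targets(severity):
    """Return the default destinations for an event of the given severity."""
    targets = [EVENT_LOG]  # every event is stored in the event log
    if severity.upper() in NOTIFY_SEVERITIES:
        targets += [SYSLOG, EMAIL_ROOT]
    return targets
```

Under these assumptions, a CRITICAL event would reach all three destinations, while an INFORMATIONAL event would only be stored in the event log.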
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig .
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
Documentation
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.
Changes in the EMS Hardware Monitors for the March 2000 (IPR 0003) release include:
- New monitor: Tachyon_TL Fibre Channel Monitor (dm_TL_adapter). This monitor supports the A5158A-A Tachyon-TL Fibre Channel Mass Storage Adapter. The monitor will decode error messages from the driver subsystem of the adapter and generate events if applicable.
- Fixes to Fibre Channel SCSI Multiplexer Monitor (dm_fc_scsi_mux). Formerly the monitor did not pass information to Predictive Support; now it does. Also, a minor problem was fixed, whereby the monitor would periodically terminate itself on large and busy systems and then be automatically re-started.
- Enhanced monconfig (the program used to configure EMS HW Monitors) so that the TCP and UDP port numbers can be up to 65535. In previous releases, they had to be less than 10000.
- Added support to SCSI Tape Devices Monitor (dm_stape) for the following devices: HP C5683A (DDS-4) tape drive, A5617A (model L180) tape autoloader, and HP C6280-8000 (Model 818) tape autoloader.
- Fixed a problem whereby monitors could use an increasing amount of memory when events are generated.
- Fixed a problem that only occurred on the Dec 1999 release (IPR 9912). On a reboot or when enabling monitoring or when new hardware is added to the system, if one of the monitors is not responding (like the HUB or Switch monitor without the C++ patch), it could take up to 2 hours for the psmctd daemon to complete processing. What this means is that if the customer goes into the EMS GUI during this time and tries to list the "status" resources, they will get a timeout or an error indicating there are no instances. The workaround for this problem is to go into the /var/stm/config/sys/psmctd.cfg file, uncomment the line with MAX_RETRIES and change it to 10 instead of 120.
- Fixed the diaglogd daemon to make it more resilient. Formerly, during system shutdowns, when the diaglogd daemon was cleaning up and exiting, there was a small chance that it would hang and need to be shut down manually (kill -9). Also, formerly there was a small chance that some events would be recorded in the Raw Log, but not sent to individual monitors.
- Modified dm_core_hw monitor event messages slightly to more accurately reflect the problems. Changed messages to say "repair or replace." Changed A-Class, L-Class, and future systems to say "Platform monitor" instead of "power monitor."
- Fixed the severity and description on several default SCSI events:
SCSI Event Number            Generated by SCSI Status Byte
-----------------            -----------------------------
100076 (changed to 100100)   BUSY (0x08)
100096                       INTERMEDIATE-CONDITION MET (0x14)
100097                       RESERVATION CONFLICT (0x18)
100099                       QUEUE FULL (0x28)
These events are generated when the OS thinks it detects a SCSI hardware error and logs a SCSI Status byte signifying a hardware error to the Raw Log. The relevant monitors then interpret these as one of the listed SCSI events. Previously, these were all considered CRITICAL events caused by software configuration problems between the driver and the device.
Event 100076 has many variations and can be generated by many different conditions. Previously, one way that it could be generated was by a SCSI Status byte of BUSY (0x08). Now a SCSI Status byte of BUSY (0x08) will generate a new event, 100100. Other conditions will continue to generate event 100076.
Event 100096, generated by a SCSI Status byte of INTERMEDIATE-CONDITION MET (0x14), has been changed to be a successful I/O with INFORMATIONAL severity and new Summary, Description and Cause/Action text as indicated below:
Summary: Disk at hardware path 56/52.4.0 : Successful completion of operation
Description of Error: An I/O request for the device was completed successfully. This I/O request was part of a linked sequence of requests.
Probable Cause / Recommended Action: No action is necessary.
Events 100097, 100099, and 100100 have been changed to be recoverable errors with MAJOR WARNING severity. They have new Summary and Cause/Action text as indicated below. (Event 100097 has slightly different text for its Cause; see below.)
Summary: Disk at hardware path 56/52.4.0 : Recoverable error
Probable Cause / Recommended Action: The device was unable to process the request at this time. The request should be retried by the system. No further action is necessary. However, if the condition persists, there may be hardware problems with the device or the connectivity to the device. Alternately, the device may be in the process of recovering from a powerfail cycle. Alternately, there may be a problem in the configuration of the application software which is using the device causing access conflicts or bottlenecks.
Event 100097 now has the following Cause text:
An attempt was made to access a device which was reserved by another initiator.
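For readers who post-process event data, the revised status-byte handling can be summarized as a lookup table. This Python sketch only restates the event numbers, status bytes, and severities given in these notes; the `ScsiEvent` type and `classify_status` function are hypothetical names and do not correspond to any interface of the monitors themselves.

```python
from typing import NamedTuple, Optional


class ScsiEvent(NamedTuple):
    number: int        # EMS event number
    severity: str      # severity as of this release
    status_name: str   # SCSI status name


# SCSI status byte -> event, as described in these release notes.
SCSI_STATUS_EVENTS = {
    0x08: ScsiEvent(100100, "MAJOR WARNING", "BUSY"),
    0x14: ScsiEvent(100096, "INFORMATIONAL", "INTERMEDIATE-CONDITION MET"),
    0x18: ScsiEvent(100097, "MAJOR WARNING", "RESERVATION CONFLICT"),
    0x28: ScsiEvent(100099, "MAJOR WARNING", "QUEUE FULL"),
}


def classify_status(status_byte: int) -> Optional[ScsiEvent]:
    """Return the event for a status byte, or None if it is not listed here."""
    return SCSI_STATUS_EVENTS.get(status_byte)
```

Status bytes outside these four return None in this sketch, since the notes do not describe how other values are handled.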
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), there has been a change to the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC.
Formerly, the "diaglogd exec" programs (pdisc30_exec, pharaymon_exec, and psdisk_exec) handled driver error entries for these devices.
As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg, sdisk_exec.cfg, and haraymon_exec.cfg), you may wish to re-configure the EMS Hardware Monitors to achieve the same results.
CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)
If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition, and High Availability Monitors. The only critical problems occur with the following versions:
- MC/ServiceGuard A.10.10, A.11.01, A.11.03
- ServiceGuard OPS Edition A.11.02, A.11.03
Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
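As a rough aid for pre-installation checks, the critical versions listed here could be encoded as a lookup. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, the product and version strings must be supplied by the caller exactly as written in these notes, and the sketch does not query the system for installed products.

```python
# Versions with critical compatibility problems, per these release notes.
CRITICAL_VERSIONS = {
    "MC/ServiceGuard": {"A.10.10", "A.11.01", "A.11.03"},
    "ServiceGuard OPS Edition": {"A.11.02", "A.11.03"},
}


def has_critical_incompatibility(product, version):
    """Return True if this product/version pair is listed as critical."""
    return version in CRITICAL_VERSIONS.get(product, set())
```

A False result only means the pair is not in the critical list above; other affected products, such as High Availability Monitors, may still have non-critical problems.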
Monitors are provided to support the following:
- HP Disk Arrays
- Fibre Channel Interconnect
- Fibre Channel Interface Cards
- High Availability Storage System Enclosures
- SCSI Tape Products
- HP SCSI Disk Products
- HP Fibre Channel Disk Products
- HP Fibre Channel Switch
- Memory
- LPMCs
- Core Hardware
- Kernel Resources
- HP Fibre Channel High Availability Disk Array (Model 60/FC)
- SCSI1, SCSI2, and SCSI3 Interface Cards
- System Status
- A5158A-A Tachyon-TL Fibre Channel Mass Storage Adapter
In addition, a Hardware status monitor is provided to monitor the current status of the products supported by the above list.
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/.
Several of the monitors have special requirements, such as patches or certain versions of firmware. Current requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag/. Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Note: The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6).
Note: A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For HP-UX 11.0 (S800 only): PHCO_19571: s700_800 11.00 HP Array Manager/60 cumulative patch
For HP-UX 10.20 (S800 only): PHCO_19485: s700_800 10.20 HP Array Manager/60 installation patch
Defect Reporting
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
    Description: On-line Diagnostic System (Series 800/700)

SD PRODUCT: Sup-Tool-Mgr
    Description: Support Tools Manager for HP-UX Systems
    SD SUB-PRODUCT: Manuals
        Description: Support Tools Manager Manual Pages
        FILESET: RELEASE_NOTES
            Description: HPUX STM Release Notes
        FILESET: STM-MAN
            Description: HPUX STM Manual Pages
    SD SUB-PRODUCT: Runtime
        Description: STM Manual Runtime
        FILESET: STM-CATALOGS
            Description: HPUX STM Shared Libraries
        FILESET: STM-SHLIBS
            Description: HPUX STM Shared Libraries
        FILESET: STM-UI-RUN
            Description: HPUX STM User Interface
        FILESET: STM-UUT-RUN
            Description: HPUX STM Unit Under Test Runtime

SD PRODUCT: EMS-Config
    Description: EMS Config
    FILESET: EMS-GUI
        Description: Event Monitoring Service Graphical User Interface

SD PRODUCT: EMS-Core
    Description: EMS Core Product
    FILESET: EMS-CORE
        Description: Event Monitoring Service Core Files