These release notes cover the June 2000 (IPR 0006) release of Support Plus for HP-UX 11.00/10.20 running on S800/S700 systems.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
NOTE: As of the September 1999 release, the name of the Diagnostic/IPR Media has been changed to Support Plus. In addition, the format has changed so that there is a separate CD-ROM for each version of the operating system (HP-UX 10.20 and HP-UX 11.0).
Included on the Support Plus CD-ROM are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 10.20 or 11.X (IPR 9902 and later).
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious, and critical events that occur on monitored hardware will be:
- Written to /var/adm/syslog/syslog.log
- Sent to EMAIL address root
All events will be stored in /var/opt/resmon/log/event.log.
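The default routing described above can be summarized in a few lines of Python. This is only an illustrative sketch, not how the monitors are implemented: the severity spellings (INFORMATION, MINOR_WARNING, MAJOR_WARNING, SERIOUS, CRITICAL) are assumed, and `default_destinations` and the `email:root` label are hypothetical names introduced for the example. Actual configuration is done with monconfig.

```python
from typing import List

# Severities that, by default, are also written to syslog and mailed to root.
# The spellings are assumptions made for this sketch.
NOTIFY_SEVERITIES = {"MAJOR_WARNING", "SERIOUS", "CRITICAL"}

EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"  # hypothetical label standing for "e-mail to root"


def default_destinations(severity: str) -> List[str]:
    """Return where an event of the given severity goes by default."""
    destinations = [EVENT_LOG]  # every event is stored in the event log
    if severity.upper() in NOTIFY_SEVERITIES:
        destinations += [SYSLOG, EMAIL_ROOT]
    return destinations


if __name__ == "__main__":
    for level in ("INFORMATION", "MAJOR_WARNING", "CRITICAL"):
        print(level, "->", default_destinations(level))
```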
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig .
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
Documentation
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.
Changes in the EMS Hardware Monitors for the June 2000 (IPR 0006) release include:
- Enhanced platform to support multiple-view (Predictive-enabled) EMS hardware monitors. The change will be transparent to most customers.
The immediate purpose of this change is to enable Predictive Support to work with hardware monitors. There will also be long-term benefits to all customers.
Following is a brief explanation. For more details, refer to the "EMS Hardware Monitors User's Guide", available at our Web site.
When a hardware monitor detects an event, it can send an event message to one or more targets ("clients").
                                         target 1
                                        /
                                       /
event ---> monitor ---> event message -----> target 2
                                       \
                                        \
                                         target 3
Previously, EMS hardware monitors generated events in the same way for all targets. The problem is that different targets, such as Predictive Support, may have different requirements for events.
As of the June 2000 release (IPR 0006), certain monitors will allow event reporting to be tailored for different targets. This "Multiple-View" ("Predictive-enabled") feature will be added to all hardware monitors in future releases.
With Multiple-View hardware monitors, you can create a different Client Configuration File (*.clcfg) for each target. In this file, you can specify:
- The text to be included in event messages.
- "Qualification requirements": the time or value thresholds a problem must meet in order to generate an event. For example, the default time threshold might be to send an event if the problem is seen 6 times in 24 hours. However, Predictive may want to see the event 3 times in 24 hours. Another example: the default value threshold might be to send the event when the value associated with the problem is >= 80, but Predictive may want to see the event when the value is >= 70.
- Events to be enabled/disabled for a given target. For example, event 1 may be enabled for target #1, but disabled for target #2.
- Severity level for an event sent to a given target. For example, event 3 may have a severity level of CRITICAL for target #1, but a severity level of MAJOR_WARNING for target #2.
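The per-target settings listed above can be pictured with a small Python model. This is a sketch of the idea only: it does not reproduce the real *.clcfg syntax or the monitors' internal logic, and `TargetView`, its field names, and the two helper functions are hypothetical. The numbers mirror the examples in the list (6 versus 3 occurrences in 24 hours, and value thresholds of 80 versus 70).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TargetView:
    """Settings one target might hold for one event (hypothetical fields)."""
    enabled: bool = True
    severity: str = "MAJOR_WARNING"
    min_occurrences: int = 6                 # times the problem must be seen...
    window_hours: float = 24.0               # ...within this many hours
    value_threshold: Optional[float] = None  # send when value >= threshold


def qualifies_by_time(view: TargetView, sightings: List[float], now: float) -> bool:
    """True if the problem was seen often enough inside the time window.

    `sightings` and `now` are times expressed in hours.
    """
    if not view.enabled:
        return False
    window_start = now - view.window_hours
    recent = [t for t in sightings if window_start < t <= now]
    return len(recent) >= view.min_occurrences


def qualifies_by_value(view: TargetView, value: float) -> bool:
    """True if the value reaches the target's value threshold."""
    if not view.enabled or view.value_threshold is None:
        return False
    return value >= view.value_threshold


if __name__ == "__main__":
    default_view = TargetView(min_occurrences=6, value_threshold=80)
    predictive_view = TargetView(min_occurrences=3, value_threshold=70)
    sightings = [1.0, 5.0, 20.0, 23.0]  # four sightings within the last day
    for name, view in (("default", default_view), ("predictive", predictive_view)):
        print(name,
              qualifies_by_time(view, sightings, now=24.0),
              qualifies_by_value(view, 75))
```

With four sightings in the window and a value of 75, only the view using the lower thresholds qualifies, which is the kind of difference between targets that the multiple-view feature is meant to allow.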
The default Client Configuration File (*.clcfg) is:
/var/stm/config/tools/monitor/default_MONITOR_NAME.clcfg
For example:
/var/stm/config/tools/monitor/default_disk_em.clcfg
The Client Configuration File for the Predictive Support client would be:
/var/stm/config/tools/monitor/predictive_MONITOR_NAME.clcfg
For example:
/var/stm/config/tools/monitor/predictive_disk_em.clcfg
- Added a standalone program to cause multiple-view (Predictive-enabled) EMS hardware monitors to generate a test event:
/opt/resmon/bin/send_test_event OR /etc/opt/resmon/lbin/send_test_event
The program was created for Predictive Support to ensure that the communication mechanism from the monitor to Predictive is working. However, it can be used by customers to ensure the same thing: that the communication mechanisms from the monitor to their notification method (e-mail, event log, SNMP trap, etc) are working.
The program will not work with monitors that have not been updated to be multiple-view. In the long term, all monitors are planned to be updated to be multiple-view.
Before the send_test_event program can be run, the monitors must be enabled and configured. (That is, when you run monconfig, it should say that monitoring is enabled and when you do a "Check", the requests show up.)
For more information on the command, see the man page for send_test_event.
- Changed the numbering of SCSI events, and the content of the event messages. These events may be reported by any hardware monitor (they are generated by the Default SCSI Decode/Monitor Library). They are listed at:
http://docs.hp.com/hpux/onlinedocs/diag/ems/scsi.htm
- Added functionality to moncheck, which is the program run when the user selects the C)heck monitor requests command from monconfig:
- Added support for monitors with multiple resource class levels.
- Enhanced to display each resource instance for a monitor, plus the status of the request to get the current monitoring requests, including a message reporting if there are no active monitoring requests for that instance. In addition, the monitor class or instance along with error information is displayed if there were problems retrieving the list of lower level classes or instances.
- Enhancements to the SCSI Tape Devices Monitor (dm_stape):
- Converted the monitor to be Multiple View (Predictive Enabled).
- Re-mapped events 101-164 to 201-264 to avoid conflicts with standard events.
- Added support for HPC7145-8000 autochanger.
- Converted the LPMC Monitor (lpmc_em) to be Multiple View (Predictive Enabled).
- Converted the Core Hardware Monitor (dm_core_hw) to be Multiple View (Predictive Enabled).
- Enhanced Disk Monitor (disk_em) to reduce the processing time required to filter out the hardware paths for XP256 disk arrays. Also, changed the monitor so it does not log an entry saying "NOT IMPLEMENTED" for SCSI command 0x4d.
- Fixed a problem with the SCSI Card Monitor (scsi123_em), whereby the monitor did not work with the Peripheral Status Monitor (PSM).
- Fixed a problem with the System Status Monitor (sysstat_em) whereby EMS would time out when a sub-class request of an instance is required. Also fixed a potential problem, whereby the monitor could run out of file descriptors when too many systems are being monitored.
- Enhanced memory logging daemon (memlogd) to support single-bit error (SBI) trending and reporting. This enhancement allows the memory monitor to generate single-bit error events and to perform trending. Memlogd was modified to have the ability to report all single-bit errors and pages deallocated.
- Added more messages that memory monitor (dm_memory) can report.
- Added support for new device types for monitors when generating I/O events. These appear in event headers of the form "Device_type at hardware path X.X.X: description". The new Device_type strings are:
Disk Array
Tape Library
Fibre Channel Hub
Fibre Channel Switch
Fibre Channel Scsi Bridge
- Fixed a problem with the Fast/Wide SCSI Disk Array monitor (fw_disk_array), whereby the monitor never processed any errors from C2430D SCSI drives (codename "Cascade"). Instead, the Disk Monitor (disk_em) would try to process these errors and would report that it was unable to do so. With this fix, the Fast/Wide SCSI Disk Array monitor will correctly process errors from C2430D SCSI drives. (JAGad00027, JAGad00301).
- Fixed a problem with the Disk Monitor (disk_em), whereby a dmesg entry indicating an Illegal Request was generated when the monitor issued a log-sense request with a CDB that is legal but not supported by the disk. After this fix, the dmesg entry is no longer generated.
- Fixed a problem with the Disk Monitor (disk_em), whereby incorrect events would be generated for LLIO errors with sense data. The code assumed that LLIO data would not have sense data.
- Enhancements to AutoRAID Disk Array Monitor (armmon):
- Fixed a problem creating a network socket to the ARMServer.
- Fixed a problem that occurred rarely during armmon startup, resulting in the following event being posted to the event log: "The computer cannot connect to the ARMServer". The monitor would run, but with continued difficulty connecting to the ARMServer.
- Modified the format of the event messages to be consistent with other monitors.
- Fixed a problem with the Disk Array FC60 Hardware Monitor (fc60mon). Previously, if fc60mon was installed on a system that hadn't had any devices attached, it would cause the startcfg_client daemon to run continuously. The user would see the system resources tied up, and the system would lose performance.
- Fixed a problem in the March 2000 release, whereby psmctd would constantly use CPU time updating the /var/stm/data/psm_data file when it wasn't necessary. The fix handles this corner case so that the update is done only when necessary. (The psmctd daemon is started by STM. It communicates with all the EMS Hardware Monitors to get events, converts these events into UP/DOWN state, and updates a file.)
- Fixed a problem that could occur if hardware was removed from the computer system and the system was rebooted: Restart Messages were generated from EMS indicating that the hardware was still being monitored. (The psmctd daemon was not removing monitor requests at shutdown.)
- Fixed the mapping and display of default SCSI status by SCSI-related monitors, so that the Sense Key Specific Data Valid bit in the SCSI Sense data is displayed with the correct text, TRUE or FALSE. Previously, FALSE was displayed if the bit was set and TRUE was displayed if the bit was clear.
- Added support for Fabric (new Fibre Channel topology) to the A5158A Fibre Channel Adapter monitor, dm_TL_adapter.
- The resource names for the System Status monitor (sysstat_em) changed from:
Event monitoring: /system/system_status/events/<target_system>
Status monitoring: /system/system_status/status/<target_system>
to:
Event monitoring: /system/events/system_status/<target_system>
Status monitoring: /system/status/system_status/<target_system>
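If scripts or saved monitoring requests still refer to the old System Status resource names, the renaming above amounts to a prefix substitution. The following Python sketch illustrates it; `migrate_resource_name` and the target name `hostA` are hypothetical, and nothing here is part of the EMS tooling.

```python
# Old resource-name prefixes mapped to their replacements, as listed above.
OLD_TO_NEW_PREFIX = {
    "/system/system_status/events/": "/system/events/system_status/",
    "/system/system_status/status/": "/system/status/system_status/",
}


def migrate_resource_name(name: str) -> str:
    """Rewrite an old sysstat_em resource name to the new layout.

    Names that do not start with an old prefix are returned unchanged.
    """
    for old, new in OLD_TO_NEW_PREFIX.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


if __name__ == "__main__":
    # "hostA" is a made-up target system name used only for illustration.
    print(migrate_resource_name("/system/system_status/events/hostA"))
    print(migrate_resource_name("/system/system_status/status/hostA"))
```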
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), there has been a change to the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC.
Formerly, the "diaglogd exec" programs (pdisc30_exec, pharaymon_exec, and psdisk_exec) handled driver error entries for these devices.
As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg, sdisk_exec.cfg, and haraymon_exec.cfg) you may wish to re-configure the EMS Hardware Monitors to achieve the same results.
CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)
If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition, and High Availability Monitors. The only critical problems occur with the following versions:
MC/ServiceGuard A.10.10, A.11.01, A.11.03
ServiceGuard OPS Edition A.11.02, A.11.03
Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
Monitors are provided to support the following:
- AutoRAID Disk Array Monitor
- High-Availability Disk Array Monitor
- Disk Monitor
- SCSI Tape Devices Monitor
- High-Availability Storage System Monitor
- Fast-Wide SCSI Disk Array Monitor
- Fibre Channel SCSI Multiplexer Monitor
- Fibre Channel Adapters Monitor
- Fibre Channel Adapter (Model A5158) Monitor
- Fibre Channel Arbitrated Loop Hub Monitor
- Fibre Channel Switch Monitor
- Memory Monitor
- Core Hardware Monitor
- LPMC Monitor
- Kernel Resource Monitor
- Disk Array FC60 Monitor
- SCSI Card Monitor
- System Status Monitor
In addition, a Hardware status monitor is provided to monitor the current status of the products supported by the above list.
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/.
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration, which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag/ . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
   Description: On-line Diagnostic System (Series 800/700)
   SD PRODUCT: Sup-Tool-Mgr
      Description: Support Tools Manager for HP-UX Systems
      SD SUB-PRODUCT: Manuals
         Description: Support Tools Manager Manual Pages
         FILESET: RELEASE_NOTES
            Description: HPUX STM Release Notes
         FILESET: STM-MAN
            Description: HPUX STM Manual Pages
      SD SUB-PRODUCT: Runtime
         Description: STM Manual Runtime
         FILESET: STM-CATALOGS
            Description: HPUX STM Shared Libraries
         FILESET: STM-SHLIBS
            Description: HPUX STM Shared Libraries
         FILESET: STM-UI-RUN
            Description: HPUX STM User Interface
         FILESET: STM-UUT-RUN
            Description: HPUX STM Unit Under Test Runtime
   SD PRODUCT: EMS-Config
      Description: EMS Config
      FILESET: EMS-GUI
         Description: Event Monitoring Service Graphical User Interface
   SD PRODUCT: EMS-Core
      Description: EMS Core Product
      FILESET: EMS-CORE
         Description: Event Monitoring Service Core Files