These release notes cover the December 2001 release of Support Plus for HP-UX 11i/11.00/10.20 running on S800/S700 systems.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
NOTE: As of the September 1999 release, the name of the Diagnostic/IPR Media has been changed to Support Plus. In addition, the format has changed so that there is a separate CD-ROM for each version of the operating system (HP-UX 11i, 11.00 and 10.20).
Included on the Support Plus CD-ROM are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 11i, 11.00, or 10.20 (IPR 9902 and later).
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious and critical events that occur on hardware being monitored will be:
- Written to /var/adm/syslog/syslog.log
- Sent to email address root
All events will be stored in /var/opt/resmon/log/event.log.
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.
Changes in the EMS Hardware Monitors for the December 2001 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- AutoRAID Disk Array (armmon).
N/A
- Chassis Code Monitor (dm_chassis).
- JAGad77204
IPR0109 required a patch due to an improperly formatted configuration file. This version fixes that problem. (11.11 only)
- JAGad91737
New chassis codes for SuperDome and Keystone are now supported by dm_chassis. Caribe and Matterhorn chassis codes are also supported by dm_chassis.
- CMC Monitor (cmc_em).
N/A
- Core Hardware Monitor (dm_core_hw)
- JAGad75598
A fix was made to the default client configuration file, default_dm_core_hw.clcfg, to include the OS revision in events generated by the dm_core_hw monitor.
- JAGad82489
A change was made so that a single occurrence of a "Support Processor Not Responding" status returned from the PDC_PAT_EVENTS routine does not generate a corresponding event. This is because this status can occasionally be reported in error (i.e., when no real problem exists).
- JAGad95891
A fix was made to correct one of the lines output in the details section of events 40, 41, 48, and 49 on N-Class and L-Class systems. The line incorrectly reported that all of the fans or power supplies were missing or had failed, except for the ones that actually were missing or had failed.
- JAGad95897
A fix was made for an issue where the endianness of registers reported in the details of events 79, 80, 81, 82, and 83 was wrong.
- Core Hardware for Itanium (ia64_corehw).
N/A
- Disk Array FC60 Monitor (fc60mon).
- JAGad89592
The FC60MON catalog used to refer to "SCSI_TEMPLATE." It has been changed to refer to "FC60MON."
- Disk Monitor (disk_em).
N/A
- Fibre Channel Adapters (dm_FCMS_adapter).
N/A
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- JAGad89725
Support was added for the 2 Gigabyte Fibre Channel Adapter.
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
N/A
- Fibre Channel Switch (dm_fc_sw).
- FC Switch monitor was updated for Predictive Enablement.
- Modified the message text of the device-added and device-removed events to eliminate references to ioscan. The FC Switch does not show in the I/O tree.
- High Availability Disk Array Monitor (ha_disk_array).
N/A
- High Availability Storage System (dm_ses_enclosure)
N/A
- Kernel Resource Monitor (krmond)
N/A
- LPMC Monitor (lpmc_em).
- JAGad81201
A fix was made to generate multiple instances of events 100 (devices added) and 101 (devices removed) from the list of devices to be monitored. Previously, only one instance of each event was generated when multiple devices were added to or removed from the system.
- Memory Monitor (dm_memory).
- Changed any "replace hardware" wording in the cause/action text in the message file of the memory monitor to "Contact your HP support representative...." Some of the event descriptions of the EMS memory hardware monitor have been changed:
*** Event 1300, changed the "Cause/Action" statement FROM:
"The Page Deallocation Table (PDT) is in danger of overflowing, it is strongly advisable to evaluate whether any memory replacement is warranted. Although the errors are being corrected, this condition indicates a potential problem."
TO:
"The Page Deallocation Table (PDT) is in danger of overflowing, it is strongly advisable to closely monitor the situation. Although the errors are being corrected, this condition indicates a potential problem. Contact your HP support representative to check the memory boards."
*** Event 4200, changed the "Cause/Action" statement FROM:
"Although the single bit errors are being corrected, it is strongly advisable to evaluate whether any memory replacement is warranted at this time. This condition indicates a potential problem."
TO:
"Although the single bit errors are being corrected, it is strongly advisable to closely monitor the situation. This condition indicates a potential problem. Contact your HP support representative to check the memory boards."
*** Event 4500, changed the "Cause/Action" statement FROM:
"Although the single bit errors are being corrected, it is strongly advisable to evaluate whether any memory replacement is warranted at this time. This condition indicates a potential problem."
TO:
"Although the single bit errors are being corrected, it is strongly advisable to closely monitor the situation. This condition indicates a potential problem. Contact your HP support representative to check the memory boards."
*** Event 6000, changed the "Problem Description" statement FROM:
"The monitor detected the file "memory.debug" on the system. The memory.debug file is created when the data written into memory has a single bit error. This is a critical problem since this indicates some hardware on the bus is generating single bit errors. Although the errors are being corrected, this condition indicates a potential problem."
TO:
"The monitor detected the file "memory.debug" on the system. The memory.debug file is created when the data written into memory has a single bit error. This is a critical problem since this indicates some hardware on the bus is generating single bit errors."
*** Event 6000, changed the "Cause/Action" statement FROM:
"It is strongly advisable to evaluate whether any hardware replacement is warranted at this time. Evaluate the file memory.debug located in the directory /var/stm/logs/os."
TO:
"Although the errors are being corrected, it is strongly advisable to evaluate the file memory.debug located in the directory /var/stm/logs/os. This condition indicates a potential problem. Contact your HP support representative to check the hardware."
*** Event 6100, changed the "Cause/Action" statement FROM:
"Although the single bit errors are being corrected, it is strongly advisable to evaluate whether any memory replacement is warranted at this time."
TO:
"Although the single bit errors are being corrected, it is strongly advisable to closely monitor the situation. Contact your HP support representative to check the memory boards."
- JAGad81201
Following the fix to JAGad81201 in the template monitor:
- Prior to the fix, multiple instances of events 100 and 101 were not generated when more than one device was added to or removed from the system.
- With the fix, the monitor generates events 100 and 101 for each device added to or removed from the system.
NOTE: The fix to JAGad81201 does not appear to affect the memory monitor, dm_memory. Unlike the disk monitor, which can monitor multiple devices and so must track any devices added to or removed from the system, dm_memory monitors the entire memory subsystem. However, to preserve consistency between the memory monitor and the template monitor, the fix was also made in the memory monitor.
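The per-device behavior described for JAGad81201 can be illustrated with a minimal Python sketch. It is not the monitors' actual code: the function name and the hardware paths in the usage example are hypothetical, and only the event numbers 100 and 101 come from these notes.

```python
EVENT_DEVICE_ADDED = 100    # event number taken from these release notes
EVENT_DEVICE_REMOVED = 101  # event number taken from these release notes


def device_change_events(previous, current):
    """Return one (event_number, device) pair per added or removed device.

    `previous` and `current` are iterables of device identifiers seen on
    two consecutive scans (a hypothetical representation).
    """
    prev, cur = set(previous), set(current)
    events = [(EVENT_DEVICE_ADDED, dev) for dev in sorted(cur - prev)]
    events += [(EVENT_DEVICE_REMOVED, dev) for dev in sorted(prev - cur)]
    return events


if __name__ == "__main__":
    # Two devices added and one removed yield three events, not two.
    print(device_change_events(["0/0/1", "0/0/2"], ["0/0/2", "0/0/3", "0/0/4"]))
    # prints [(100, '0/0/3'), (100, '0/0/4'), (101, '0/0/1')]
```

Before the fix, the equivalent of this list would have held at most one entry per event number, regardless of how many devices changed.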
- Enhanced the local port configuration functionality of STM to make ALL ports used for IPC communication between elements of the STM system LOCAL only.
To enable this functionality, both STM and the EMS HW Monitors MUST first be shut down. Then create the file /var/stm/config/sys/local_only and restart STM and the EMS HW Monitors, in that order.
To disable this functionality, both STM and the EMS HW Monitors MUST first be shut down. Then remove the file /var/stm/config/sys/local_only and restart STM and the EMS HW Monitors, in that order.
If this file is created without first shutting down both STM and the EMS HW Monitors, IPC messages will time out and fail. As a result, tools cannot be started from the UI, monitors cannot send events to PSM (so the state of devices will not be updated), and set_fixed cannot display or update the state of devices.
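The flag-file part of the procedure above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the sketch does not shut down or restart STM or the EMS HW Monitors, which must still be done as described above before and after the file is changed.

```python
from pathlib import Path

# Flag file named in these release notes.
LOCAL_ONLY_FLAG = Path("/var/stm/config/sys/local_only")


def local_only_enabled(flag=LOCAL_ONLY_FLAG):
    """Report whether the local_only flag file is present."""
    return Path(flag).exists()


def set_local_only(enable, flag=LOCAL_ONLY_FLAG):
    """Create or remove the flag file and return the resulting state.

    Assumption: the caller has already shut down STM and the EMS HW
    Monitors; this helper only touches the file.
    """
    flag = Path(flag)
    if enable:
        flag.touch()      # an empty file is enough to act as a flag here
    elif flag.exists():
        flag.unlink()
    return flag.exists()
```

Passing a different `flag` path makes it possible to try the helper in a scratch directory before pointing it at the real file.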
If this functionality is enabled, the STM product will no longer accept connections from remote systems. On new systems, the error message indicated below will be displayed. On older systems, the older message indicating diagmond may be down will be displayed. In addition, the STM product will not allow connections TO remote systems. In this case, the message below will be displayed:
MODIFIED for UI. This message will appear if the system is configured for LOCAL_ONLY and the user attempts to connect to a remote system:
An unexpected error was encountered while attempting to retrieve the host info for hostname (XX). This could be due to either of the following conditions:
1) The support tool daemon "diagmond" may not be running on that system. Use the STM Startup command (in the administration menu under the file menu.)
2) The support tool daemon "diagmond" on that system may be configured to only allow local connections. Check the value of the configuration parameter LOCAL_ONLY_ENABLE in the /var/stm/config/sys/diagmond.cfg file or for the existance of the /var/stm/config/sys/local_only file on that system.
3) The diagnostic system on this system may be configured to only allow local connections. Check for the existance of the /var/stm/config/sys/local_only file on this system.
The IP address for the system may be invalid or may not be associated with a valid host. Use a valid IP address.
4) Networking may be incorrectly configured on one of the systems involved. Verify networking by comparing 'nslookup `hostname`' with the output of ifconfig of the LANs identified by lanscan.
More details may be available in the System Activity Log and in the syslog on that system. It may be necessary to access these using the Local Unit Under Test (UUT) logs (in the administration menu under the file menu.)
OLD (displayed when a remote system attempts to connect to a local_only system):
An unexpected error was encountered while attempting to retrieve the host info for hostname (XX). This could be due to either of the following conditions:
1) The support tool daemon "diagmond" may not be running on that system. Use the STM Startup command (in the administration menu under the file menu.)
2) The support tool daemon "diagmond" on that system may be configured to only allow local connections. Check the value of the configuration parameter LOCAL_ONLY_ENABLE in the /var/stm/config/sys/diagmond.cfg file on that system.
3) The IP address for the system may be invalid or may not be associated with a valid host. Use a valid IP address.
4) Networking may be incorrectly configured on one of the systems involved. Verify networking by comparing 'nslookup `hostname`' with the output of ifconfig of the LANs identified by lanscan.
More details may be available in the System Activity Log and in the syslog on that system. It may be necessary to access these using the Local Unit Under Test (UUT) logs (in the administration menu under the file menu.)
NEW for tlpthread (memlogd) and pllib for send message:
An attempt to send an IPC message to a non-local port has been rejected. The port is on system name (XX) at IP address (XX) and has system port (XX).
Possible Causes/Recommended Action:
Diagnostic system is configured to only perform local IPC.
To enable remote communication:
Shutdown the the EMS HW Monitors and the diagnostics system
  Run /etc/opt/resmon/lbin/monconfig and select K)ill
  Run STM and select the SystemShutdown command
Remove the /var/stm/config/sys/local_only file
Restart the diagnostic system and the EMS HW Monitors
  Run STM and select the SystemStartUp command
  Run /etc/opt/resmon/lbin/monconfig and select E)nable
Internal Application error.
- Peripheral Status Monitor (PSM).
N/A
- Remote Monitor (RemoteMonitor).
N/A
- SCSI Card Monitor (scsi123_em).
- JAGad81201
A fix was made to generate multiple instances of events 100 (devices added) and 101 (devices removed) from the list of devices to be monitored. Previously, only one instance of each event was generated when multiple devices were added to or removed from the system.
- SCSI Disk (scsi_disk).
- JAGad81201
A fix was made to generate multiple instances of events 100 (devices added) and 101 (devices removed) from the list of devices to be monitored. Previously, only one instance of each event was generated when multiple devices were added to or removed from the system.
- SCSI Tape Monitor (dm_stape).
- JAGad91117
Changed the tape monitor so that, by default, it does not poll devices. This was requested by the Greeley team, specifically Dave Ruska, due to interaction problems between the tape monitor and backup software.
- System Status Monitor (sysstat_em)
- JAGad81201
A fix was made to generate multiple instances of events 100 (devices added) and 101 (devices removed) from the list of devices to be monitored. Previously, only one instance of each event was generated when multiple devices were added to or removed from the system.
- JAGad67905
Also, the .fmt file is now removed when done.
- UPS Monitor (dm_ups).
- JAGad86041
Fixes a problem that dm_ups encounters when ups_mond is not running. The monitor would repeatedly log an error message to the api.log when it thought it was only logging the message once. This could potentially fill up the file system. This was fixed by eliminating this error message.
- JAGad73816
Corrected a few text messages in the dm_ups message catalog that were leveraged from the template.
Changes to Platform and Interface
- Added support for the HP blade server bh3702 ("Powerbar 6U for HP-UX").
- Fixed NEC iStorage so that it is ignored by STM and the EMS HW Monitors.
- JAGad68713
Enhanced monconfig to perform error checking on the input of the client configuration file. In addition, monconfig will not allow a user to input a client configuration file if more than one monitor is selected in the Add or Modify command. This is because client configuration files are monitor specific and you cannot select one file and have it apply to multiple monitors.
If the file name input is not in the correct directory, the following error will be displayed:
ERROR: File name must be in the directory /var/stm/config/tools/monitor/. Please re-enter: []
If the file name input is not in the correct format, the following error message will be displayed:
ERROR: File name must be of the format *_sysstat_em.clcfg. Please re-enter: []
If the file doesn't exist, the following warning will be displayed:
WARNING: File doesn't exist. Events will not be sent for this monitoring request until the file exists.
- Increased number of hardware paths allowed in disabled_instances file to 1024.
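The monconfig file-name checks described above for JAGad68713 (directory, name format, existence) can be sketched in Python. This is an illustration of the checks, not monconfig's implementation: the function name, the return values, and the `exists` parameter are assumptions, and the pattern is generalized from the *_sysstat_em.clcfg example in the error text.

```python
import fnmatch
import os
import posixpath

# Directory named in the monconfig error message.
CLCFG_DIR = "/var/stm/config/tools/monitor/"


def check_clcfg(path, monitor, exists=os.path.exists):
    """Return (level, message) for a client configuration file name.

    `monitor` is the monitor name, for example "sysstat_em". `exists` is
    injectable so the check can be tried without touching the file system.
    """
    directory, name = posixpath.split(path)
    if posixpath.normpath(directory) != posixpath.normpath(CLCFG_DIR):
        return ("ERROR", "File name must be in the directory %s." % CLCFG_DIR)
    pattern = "*_%s.clcfg" % monitor
    if not fnmatch.fnmatchcase(name, pattern):
        return ("ERROR", "File name must be of the format %s." % pattern)
    if not exists(path):
        # A missing file is only a warning: events are held back until it exists.
        return ("WARNING", "File doesn't exist.")
    return ("OK", "")
```

For example, `check_clcfg("/tmp/default_sysstat_em.clcfg", "sysstat_em")` fails the directory check, while a correctly named file under /var/stm/config/tools/monitor/ that has not yet been created produces only the warning.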
Customer-Visible Interface Changes
CAUTION: UPS Monitor May Need a Patch
In some cases, the UPS monitor (dm_ups) will not function and will instead generate event 45 (formerly event 42) with the text:
Probable Cause / Recommended Action: The monitor was unable to locate the fifo pipe that should have been created by ups_mond. Therefore, information about the ups cannot be sent to the monitor. You need version (80.1.2.3) of ups_mond or greater. To update your system with the correct version of ups_mond, install one of the following patches:
HPUX 10.20/s800 : PHCO_24153 (supersedes PHCO_23830)
HPUX 11.00 : PHCO_24172 (supersedes PHCO_23831)
HPUX 11.11 : PHCO_23832
To fix the problem, load the indicated patch or load the HWE patch bundle which contains this patch. For HP-UX 11i, the ups_mond patch PHCO_23832 is also distributed on the Sept 01 OE.
This problem will affect most systems with a UPS when the September 2001 diagnostics are installed. The only systems not affected will be those which are being updated from certain versions of the diagnostics (September 2000 through March 2001) and which do not have patch PHCO_19031 (HP-UX 10.20) or PHCO_19040 (HP-UX 11.00) installed.
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), there has been a change to the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC.
Formerly, the "diaglogd exec" programs (pdisc30_exec, pharaymon_exec, and psdisk_exec) handled driver error entries for these devices.
As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg, sdisk_exec.cfg, and haraymon_exec.cfg) you may wish to re-configure the EMS Hardware Monitors to achieve the same results.
CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)
If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition and High Availability Monitors. The only critical problems occur with the following versions:
MC/ServiceGuard A.10.10, A.11.01, A.11.03
ServiceGuard OPS Edition A.11.02, A.11.03
Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
Monitors are provided to support the following:
- AutoRAID Disk Array (armmon)
- Chassis Code Monitor (dm_chassis)
- CMC Monitor (cmc_em)
- Core Hardware (dm_core_hw)
- Core Hardware for Itanium (ia64_corehw)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fast Wide SCSI Disk Array (fw_disk_array)
- Fibre Channel Adapters (dm_FCMS_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- Kernel Resource (krmond)
- LPMC (lpmc_em)
- Memory (dm_memory)
- Remote (RemoteMonitor)
- SCSI Card (scsi123_em)
- SCSI Disk (scsi_disk)
- SCSI Tape Devices (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
In addition, the Peripheral Status Monitor (PSM) is provided to monitor the current status of the products supported by the above list.
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/ .
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
Description: On-line Diagnostic System (Series 800/700)
SD PRODUCT: Sup-Tool-Mgr
Description: Support Tools Manager for HP-UX Systems
  SD SUB-PRODUCT: Manuals
  Description: Support Tools Manager Manual Pages
    FILESET: RELEASE_NOTES
    Description: HPUX STM Release Notes
    FILESET: STM-MAN
    Description: HPUX STM Manual Pages
  SD SUB-PRODUCT: Runtime
  Description: STM Manual Runtime
    FILESET: STM-CATALOGS
    Description: HPUX STM Shared Libraries
    FILESET: STM-SHLIBS
    Description: HPUX STM Shared Libraries
    FILESET: STM-UI-RUN
    Description: HPUX STM User Interface
    FILESET: STM-UUT-RUN
    Description: HPUX STM Unit Under Test Runtime
SD PRODUCT: EMS-Config
Description: EMS Config
  FILESET: EMS-GUI
  Description: Event Monitoring Service Graphical User Interface
SD PRODUCT: EMS-Core
Description: EMS Core Product
  FILESET: EMS-CORE
  Description: Event Monitoring Service Core Files