These release notes cover the December 2004 release of Support Plus for HP-UX 11i.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
Note: The HP StorageWorks SDLT 160/320 GB Tape Drive and HP Ultrium 460 External Tape Drive are supported by Online Diagnostics on HP-UX. Although some of the Support Tools Manager (STM) tools may function with tape drives, they are not supported. The diagnostic tools and utilities that support these devices are HP StorageWorks Library and Tape Tools (L&TT). These tools are available at http://www.hp.com/support/tapetools.
Overview
Included on the Support Plus CD-ROM are the EMS Hardware Monitors, an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 11i and 11.00 (IPR 9902 and later).
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious, and critical events that occur on monitored hardware will be:
- Written to /var/adm/syslog/syslog.log
- Sent to the e-mail address root
All events will be stored in /var/opt/resmon/log/event.log.
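The default routing described above can be pictured with a small model. This is only an illustrative Python sketch, not how the monitors are implemented: the function name default_destinations and the severity strings (including "information", which this document does not list) are assumptions made for the example.

```python
# Illustrative model of the default notification rules described above.
# The names and severity strings are assumptions for this sketch, not EMS APIs.

EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL = "root"

# Severities that the text says are forwarded by default.
FORWARDED = {"major warning", "serious", "critical"}


def default_destinations(severity):
    """Return the destinations an event of this severity reaches by default."""
    destinations = [EVENT_LOG]  # every event is stored in event.log
    if severity.lower() in FORWARDED:
        destinations.append(SYSLOG)
        destinations.append("email:" + EMAIL)
    return destinations


print(default_destinations("critical"))
print(default_destinations("information"))
```

Only the three forwarded severities reach syslog and e-mail in this model; every other event is recorded in event.log alone.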
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/ .
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
Documentation
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics": http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and much other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.
Changes
Changes in the EMS Hardware Monitors for the December 2004 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Changes to Multiple Monitors
- JAGaf40310
The monitor code caused memory leaks during monitor initialization. This was discovered as part of JAGaf37354.
These memory leaks existed in all monitors based on the monitor template. Changes were made to the dm_ses_enclosure, ha_disk_array, RemoteMonitor, msamon, and scsi_tape monitors to eliminate the leaks.
- JAGaf18401
Manpages for the following monitors now include the .clcfg (Client Configuration) files:
- dm_chassis
- dm_ups
- sysstat_em
- dm_memory
- lpmc_em
- dm_fc_sw
- dm_core_hw
- dm_fc_scsi_mux
- scsi123_em
- JAGaf34874
Manpages for the following monitors now include the .clcfg path:
- msamon
- dm_ses_enclosure
- dm_stape
- fc60mon
- disk_em
- JAGaf12089
aplsrv IDs were too long and, in some cases, ambiguous. These IDs have been modified.
- JAGaf18375
Manpages for the following monitors do not include the .clcfg files:
- dm_iscsi_adapter
- dm_ql_adapter
- JAGaf40312
The following monitors have memory leaks:
- dm_TL_adapter
- dm_ql_adapter
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- AutoRAID Disk Array (armmon).
- N/A
- Chassis Code Monitor (dm_chassis).
- JAGaf41849
The chassis code FABRIC_READ_ERROR is not reported by EMS.
- Core Hardware Monitor (dm_core_hw)
- JAGaf22356
Events 69, 70, and 71 of the dm_core_hw EMS monitor printed a warning in the event text even when everything was correct. This behaviour has been corrected.
- JAGaf14904
In events 37 and 38 of the dm_core_hw EMS monitor, "partition" was misspelled as "partiton". This has now been corrected.
- Chassis Event Monitor (ia64_corehw).
- JAGaf04964
Multiple child processes are no longer created. Functionality has been added to clean up child processes that do not terminate normally. A 'ps' command now displays only 2 ia64_corehw processes. System performance is no longer significantly impacted while the monitor is executing.
- JAGaf05703
Instead of Event 101001 and Event 101002, Event 101011 and Event 101012 are now generated on HP systems. The events have been rephrased to indicate that the action taken is identical to what is specified in the envd configuration file.
- JAGaf40433 JAGaf39091
For Event 101011, the current states reflected in the summary and in the probable cause/recommended action text do not match. It is clarified that the system has reached a non-critical temperature condition.
- JAGaf18393
The manpage now includes descriptions of the ia64_corehw.cfg and default_ia64_corehw.clcfg files.
- JAGaf25773
Duplicate events are no longer generated when Intelligent Platform Management Interface (IPMI) problems are encountered.
- JAGaf29389
The ia64_corehw.dict file specifies that the monitor runs on both IA-64 and PA systems, and that it gathers the input required for the fpl_em monitor.
- JAGaf42886
The monitor generates incorrect temperature events of high severity. These events are generated when the system temperature changes from a more severe state to a less severe state. For example, when the system changes from a non-critical to a normal state, an event for a non-critical temperature is generated.
- JAGae90398
The e-mail message from EMS said that the Baseboard Management Controller (BMC) clock was not initialized. This is fixed.
- JAGae95681
Information that must be preserved when the monitor is updated is no longer deleted. This prevents the occurrence of duplicate events.
- JAGae99136
Non-cellular systems are no longer identified as cellular systems and events are generated.
- CPU Monitor (lpmc_em).
Note: Starting with the June 2002 release, the LPMC Monitor (lpmc_em) was renamed to "CPU Monitor". The binary name is still lpmc_em. The name was changed to reflect the monitor's enhancement to check floating-point functionality in the CPU.
- JAGaf37954
lpmc_em logged the wrong Function Name in its error messages. This problem has been fixed.
- JAGaf40009
The PA CPU monitor was fixed to take the correct Dynamic Processor Resilience (DPR) action on vPar systems running VirtualPartition version A.03.* (where * means any version of A.03). DPR action is the action taken by a PA CPU monitor to replace the faulty processor with one of the spare processors (if available) on the system.
The command "/usr/sbin/swlist -l product VirtualPartition" can be used to find out the version of vPar running on the system.
- The floating point detection capability of the PA CPU monitor has been enhanced.
- JAGaf41451
On vPar systems, the PA CPU monitor could de-configure or de-activate the wrong processor. The events generated by the PA CPU monitor indicated that a bad processor had been de-activated, but in reality a processor with a different hpa was de-activated instead of the faulty one. This behaviour was more pronounced, and consistently reproducible, when the system had a few de-configured processors. This problem has now been fixed. The hpa of the processor that is actually de-activated can be obtained using the CPU Expert Tool. The events generated by the PA CPU monitor can be found in /var/opt/resmon/log/event.log.
Note: The lpmc_em.dat file must be deleted for this fix to work properly. Complete the following steps:
- Bring down the EMS monitors using the "monconfig" utility.
- Remove the file "/var/stm/logs/monitor/lpmc_em.dat" using the following command: rm /var/stm/logs/monitor/lpmc_em.dat
- Re-enable monitoring using the monconfig utility.
- JAGae99200
The incorrect description of the Floating Point (FP) error event was corrected. Before this fix, the event Summary of FP error event 100701 said "Cache Error(s) detected" instead of "Floating point test failed".
Events 100801-100813 have been replaced by Events 100901-100913 respectively.
The new events have the following attributes by default:
Threshold = 1
time_window = ANY (no time window)
Suppression time = NOT_USED (no suppression time)
- JAGaf38304
The monitor incorrectly reports that PA8800 processors failed.
- Disk Array FC60 Monitor (fc60mon).
- The severity of Event 3 has been modified from informational to critical, and the suppression time has been increased from 1 day to 7 days.
- JAGaf35652
Event 4 was generated incorrectly.
- Fast Wide SCSI Disk Array (fw_disk_array)
- N/A
- Disk Monitor (disk_em).
- JAGaf37354
A memory leak in disk_em has been fixed.
- JAGaf34507
The disk_em monitor now logs the hardware path information, along with the device file name, in the /etc/opt/resmon/log/api.log file if the open system call fails on that device file.
- JAGaf34106
disk_em has been fixed
- JAGaf30446
In some cases, SCSI sense data reporting is more specific for sense key 0x06.
- Fibre Channel Adapters (dm_FCMS_adapter).
- N/A
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- N/A
- Fibre Channel Adapter Model A6826A Monitor (dm_QL_adapter)
- JAGaf42057
An event for the HP-UX driver for a Fibre Channel dual port HBA (fcd driver) must be added.
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
- N/A
- Fibre Channel Switch (dm_fc_sw).
- N/A
- Forward Progress Log Monitor (fpl_em)
- JAGaf20670
The monitor was incorrectly logging a set of error messages to the /var/opt/resmon/log/api.log file, although it continued to function properly after logging them. This problem has been fixed: the monitor no longer logs these erroneous messages to the api.log file.
- JAGaf32324
The fpl_em monitor has been modified to interface with the new tlcomev library, but it will ignore all type 0x02 IPMI events.
- JAGaf40434
The fpl_em monitor has a minor memory leak when it is initialized.
- High Availability Disk Array Monitor (ha_disk_array).
- N/A
- High Availability Storage System (dm_ses_enclosure)
- JAGaf50955
dm_ses_enclosure aborts if the DS2300 is set to SAF-TE mode. SAF-TE mode is not supported on HP-UX for the DS2300. Always set the mode to SES when connecting it to a system running HP-UX.
- iSCSI Device Adapter (dm_iscsi_adapter)
- JAGaf22601
Event messages generated by the monitor need to be modified.
- JAGaf41866
Messages added to the iscsi driver are also added to dm_iscsi_adapter.
- Kernel Resource Monitor (krmond)
- N/A
- LPMC Monitor (lpmc_em).
Starting with the June 2002 release, the LPMC Monitor (lpmc_em) was renamed to "CPU Monitor". The binary name is still lpmc_em. The name was changed to reflect the monitor's enhancement to check floating-point functionality in the CPU. For more information, see CPU Monitor (lpmc_em).
- Memory Monitor (dm_memory).
- JAGaf23521
Event 1400 is generated by the memory monitor when the Page De-allocation Table (PDT) is 100% full. The severity of the event is 'Critical'. This event is generated once in 24 hours.
- JAGaf30728
- Event 3100 to Event 3300 are disabled by default, but can be enabled by the user. These events are generated when the number of single-bit errors at the same address exceeds the specified limit. There is no time frame for exceeding this limit.
- Event 4000 to Event 4200 are generated when the number of "unique" memory addresses on the same DIMM meets the specified threshold within the specified time frame.
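The rule for Event 4000 to Event 4200 (a count of unique addresses on one DIMM reaching a threshold within a time frame) can be sketched as a sliding-window check. The Python below is only an illustration: the function name, the threshold of 3, and the 24-hour window are hypothetical values chosen for the example, not the monitor's actual defaults or code.

```python
# Sliding-window sketch: flag a DIMM when the number of unique error
# addresses seen within `window` seconds reaches `threshold`.
# All names and default values here are hypothetical.

def dimm_exceeds_threshold(errors, threshold=3, window=24 * 3600):
    """errors: list of (timestamp_seconds, address) pairs for a single DIMM."""
    ordered = sorted(errors)
    for i, (start, _) in enumerate(ordered):
        # Collect the unique addresses reported within the window starting here.
        unique = {addr for ts, addr in ordered[i:] if ts - start <= window}
        if len(unique) >= threshold:
            return True
    return False


same_address = [(0, 0x1000), (60, 0x1000), (120, 0x1000)]
spread_out = [(0, 0x1000), (100000, 0x2000), (200000, 0x3000)]
clustered = [(0, 0x1000), (60, 0x2000), (120, 0x3000)]

print(dimm_exceeds_threshold(same_address))  # one unique address only
print(dimm_exceeds_threshold(spread_out))    # unique, but outside the window
print(dimm_exceeds_threshold(clustered))     # three unique inside the window
```

Repeated errors at a single address do not count toward this rule in the sketch; they correspond instead to the same-address limit described for Event 3100 to Event 3300.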
- Peripheral Status Monitor (PSM/psmmon).
- N/A
- MSA 1000 Storage Disk Array (msamon).
- N/A
- Remote Monitor (RemoteMonitor).
- N/A
- SCSI Card Monitor (scsi123_em).
- N/A
- SCSI Cascade Monitor (scsi_cascade)
- N/A
- SCSI Disk (scsi_disk)
- N/A
- SCSI Tape Monitor (dm_stape)
- N/A
- System Status Monitor (sysstat_em).
- JAGaf05354
When sysstat_em was started, it did not log a message in the api.log file (/var/opt/resmon/log/api.log) indicating that the monitor had started.
- JAGae57005
sysstat_em was not displaying the IP address of the machine in the Component Data portion of the event text. This has been fixed. Also, if the IP address was specified in the cfg file, no attempt was made to retrieve the host name and display it in the Component Data portion. This, too, has been fixed.
- JAGae59420
sysstat_em was not deleting a temporary file (/var/tmp/sysstat_em.fmt) in the /var/tmp directory. sysstat_em now deletes this file after event generation.
- UPS Monitor (dm_ups).
- N/A
Changes to Platform and Interface
- JAGae92115
This pertains to monconfig. The disabled_instances file is now delivered to /usr/newconfig/var/stm/data/tools/monitor, and overwrites /var/stm/data/tools/monitor/disabled_instances only if no changes have been made to that file. (The disabled_instances file specifies the list of resource names that should not be monitored.)
- JAGaf32486
The monconfig utility displayed the following error when monitoring was disabled or killed:
____ERROR__________
usage: grep [-E|-F] [-c|-l|-q] [-bhinsvx] -e pattern_list... [-f pattern_file...] [file...]
usage: grep [-E|-F] [-c|-l|-q] [-bhinsvx] [-e pattern_list...] -f pattern_file... [file...]
usage: grep [-E|-F] [-c|-l|-q] [-bhinsvx] pattern [file...]
____________ERROR______________
This problem has been fixed, and the error is no longer displayed.
- JAGaf30394
The send_test_event test utility could not find the monitor name in the .sapcfg file associated with the monitor when more than 256 entries were found across all sapcfg files taken together. The following error was displayed:
send_test_event: Failed to find monitor name in sapcfg files.
The send_test_event test utility has been fixed and now supports up to 8192 entries across all sapcfg files taken together.
- JAGaf23077
monconfig was corrupting some Monitoring Requests while adding or modifying Monitoring Requests when more than 256 entries were found across all sapcfg files taken together. monconfig has been fixed so that Monitoring Requests can be added or modified with up to 4092 entries across all sapcfg files taken together.
The following error is displayed when there is no more space available to accommodate the monitoring entries:
ERROR: Monconfig cannot add this Monitoring Request. Buffer holding Monitoring Entries has no more space. Delete or modify existing Monitoring Request(s) in order to add this Monitoring Request successfully.
- JAGaf22321
While updating the monitor's .cfg files for configuration verbs, users would see the following error in the api.log file:
-------------------Start Event--------------------
User event occurred at Mon May 3 17:37:55.709202 2004
Process ID: 5520 (/usr/sbin/stm/uut/bin/.../memory_ia64)
Log Level: Error
The event (100140) specified on DEFINE EVENT verb in the monitor's configuration file (/var/stm/config/tools/monitor/memory_ia64.cfg) or the Global configuration file could not be configured because of an internal memory error.
Possible Causes/Recommended Action: A maximum of 1000 events can be configured for a monitor. Internal Application error.
-------------------End Event----------------------
When this problem occurs, changes made to configuration verbs do NOT take effect. The error was displayed even if the user had fewer than 1000 events configured for a monitor. The problem has now been corrected.
Customer-Visible Interface Changes
- N/A
Known Problems
CAUTION: UPS Monitor May Need a Patch
In some cases, the UPS monitor (dm_ups) will not function and will instead generate event 45 (formerly event 42) with the text:
Probable Cause / Recommended Action: The monitor was unable to locate the fifo pipe that should have been created by ups_mond. Therefore, information about the ups cannot be sent to the monitor. You need version (80.1.2.3) of ups_mond or greater. To update your system with the correct version of ups_mond, install the following patch: HPUX 11.11 : PHCO_23832
To fix the problem, load the indicated patch or load the HWE patch bundle which contains this patch. For HP-UX 11i, the ups_mond patch PHCO_23832 is also distributed on the Sept 01 OE.
This problem will affect most systems with a UPS when the September 2001 diagnostics are installed. The only systems not affected will be those which are being updated from certain versions of the diagnostics (September 2000 through March 2001) and which do not have patch PHCO_19040 (HP-UX 11.00) installed.
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC has changed.
Formerly, the "diaglogd exec" programs (pdisc30_exec and psdisk_exec) handled driver error entries for these devices.
As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg and sdisk_exec.cfg) you may wish to re-configure the EMS Hardware Monitors to achieve the same results.
CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)
If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition and High Availability Monitors. The only critical problems occur with the following versions:
- MC/ServiceGuard A.10.10, A.11.01, A.11.03
- ServiceGuard OPS Edition A.11.02, A.11.03
Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
If the maxssiz_64bit kernel parameter is set below the default value of 0x800000, it can cause the lpmc_em monitor to abort.
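As a quick sanity check, a configured value can be compared with the default quoted above. The following Python is a minimal sketch under stated assumptions: the helper name maxssiz_is_safe is hypothetical, and the value is passed in by hand, because this document does not describe how to query the parameter.

```python
# Compare a maxssiz_64bit value with the default quoted in the text above.
# The helper name is hypothetical; the value must be supplied by the caller.

DEFAULT_MAXSSIZ_64BIT = 0x800000  # 8388608 bytes (8 MB)


def maxssiz_is_safe(value):
    """Accept an int, or text such as '0x800000' or '8388608'."""
    if isinstance(value, str):
        value = int(value, 0)  # base 0 accepts both hex and decimal text
    return value >= DEFAULT_MAXSSIZ_64BIT


print(maxssiz_is_safe("0x800000"))  # equal to the default
print(maxssiz_is_safe(0x400000))    # below the default
```

A value below the default is the condition the text warns can cause lpmc_em to abort.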
Monitors Provided
Monitors are provided to support the following:
- AutoRAID Disk Array (armmon)
- Chassis Code Monitor (dm_chassis)
- Core Hardware (dm_core_hw)
- Chassis Event Monitor (ia64_corehw)
- CPU (lpmc_em)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fast Wide SCSI Disk Array (fw_disk_array)
- Fibre Channel Adapters (dm_FCMS_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Adapter Model A6826A Monitor (dm_QL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- Forward Progress Log Monitor (fpl_em)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- iSCSI Device Adapter (dm_iscsi_adapter)
- Kernel Resource (krmond)
- LPMC (lpmc_em) renamed to "CPU Monitor" as of the June 02 release
- Memory (dm_memory)
- MSA 1000 Storage Disk Array (msamon)
- Remote (RemoteMonitor)
- SCSI Card (scsi123_em)
- SCSI Cascade (scsi_cascade)
- SCSI Disk (scsi_disk)
- SCSI Tape Monitor (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
In addition, the Peripheral Status Monitor (PSM) is provided to monitor the current status of the products supported by the monitors listed above.
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/ .
Monitor Dependencies
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration, which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Defect Reporting
Use CHART to report defects in the EMS Hardware monitors. The project name is diag. If you don't have access to CHART, contact an HP representative to enter a defect for you.
SD Product Structure
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
Description: On-line Diagnostic System (Series 800/700)
  SD PRODUCT: Sup-Tool-Mgr
  Description: Support Tools Manager for HP-UX Systems
    SD SUB-PRODUCT: Manuals
    Description: Support Tools Manager Manual Pages
      FILESET: RELEASE_NOTES
      Description: HPUX STM Release Notes
      FILESET: STM-MAN
      Description: HPUX STM Manual Pages
    SD SUB-PRODUCT: Runtime
    Description: STM Manual Runtime
      FILESET: STM-CATALOGS
      Description: HPUX STM Shared Libraries
      FILESET: STM-SHLIBS
      Description: HPUX STM Shared Libraries
      FILESET: STM-UI-RUN
      Description: HPUX STM User Interface
      FILESET: STM-UUT-RUN
      Description: HPUX STM Unit Under Test Runtime
  SD PRODUCT: EMS-Config
  Description: EMS Config
      FILESET: EMS-GUI
      Description: Event Monitoring Service Graphical User Interface
  SD PRODUCT: EMS-Core
  Description: EMS Core Product
      FILESET: EMS-CORE
      Description: Event Monitoring Service Core Files