These release notes cover the May 2005 release of the Support Tools (diagnostics) for HP-UX 11.23.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
Note: Online Diagnostics on HP-UX does not support any tape drives. Although some Support Tools Manager (STM) tools may work with tape drives, they are not supported. The diagnostic tools and utilities that support these devices are the HP StorageWorks Library and Tape Tools (L&TT), available at http://www.hp.com/support/tapetools.
Overview
The EMS Hardware Monitors, included with the OnlineDiag bundle of support tools, are an important tool for maintaining system availability. They let you monitor the operation of a wide variety of hardware products and alert you immediately if a failure or other unusual event occurs.
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can eliminate most of the undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager (STM). Once the monitoring software is installed, monitoring is automatically enabled.
By default, events generated by the monitors with severity Major Warning, Serious, or Critical are reported in the following ways:
- Written to /var/adm/syslog/syslog.log
- Sent by email to root
All events are also stored in the /var/opt/resmon/log/event.log file.
To configure, enable, or disable hardware event monitoring, run the monitoring request manager, /etc/opt/resmon/lbin/monconfig.
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See http://docs.hp.com/en/diag/ems/ems_gui.htm
Documentation
For the latest and most complete information on the EMS Hardware Monitors and the Support Tools Manager (STM), see the Diagnostics section of Hewlett-Packard's online documentation Web site:
http://docs.hp.com/en/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and other material. For complete information on installing and using the EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the same site.
For the most current information on HP-UX 11.23 diagnostics, see the following Web pages at the Diagnostics site:
- "DIAGNOSTICS.readme for HP-UX 11.23 (May 2005)" at:
http://docs.hp.com/en/diag/st/str_1123pi_readme.htm
- "Release Notes for STM on HP-UX 11.23 (May 2005)" at:
http://docs.hp.com/en/diag/stm/str_0505_1123pi.htm
- "Release Notes for EMS Hardware Monitors on HP-UX 11.23 (May 2005)" at:
http://docs.hp.com/en/diag/ems/emr_0505_1123pi.htm
For 11.23, the EMS hardware monitors use version A.03.30 of the EMS platform. HP-UX 11.23 does not support the full functionality of the EMS platform; however, all EMS functionality required by the hardware monitors is provided.
The SNMP notification method, which could be configured for the EMS Hardware Monitors in previous releases, will probably NOT be available to monitors running on HP-UX 11.23. (Check the latest Web version of the EMS Release Notes for the most current information.)
Memory Page Deallocation (MPD) and the memlogd daemon are not implemented on RX 4610 systems.
Changes
Changes in the EMS Hardware Monitors for the May 2005 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Note: For the HP-UX 11i v2 May 2005 release, Online Diagnostics on PA vPar systems requires vPars version A.04.01 or later. Earlier versions are not supported.
Changes to Multiple Monitors
Unless specifically indicated, the following changes apply both to HP Integrity servers and Intel® Itanium®-based workstations (IA systems) and to HP 9000 servers and RISC architecture-based workstations (PA systems):
- JAGaf45718
The manpages for the following monitors did not include the .clcfg file. This has been fixed.
- dm_TL_adapter
- dm_raid_adapter
- JAGaf36083
Some errors in the message file of the following monitors have been corrected:
- scsi123_em
- dm_ses_enclosure
The following change applies to the Itanium platform only:
- JAGaf40435
The following monitors had memory leaks. This has been fixed.
- dm_memory
- ia64_corehw
- dm_memory_azusa
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- Chassis Code Monitor (dm_chassis).
- N/A
- CMC Monitor (cmc_em).
The following changes apply to the Itanium platform only:
- Monitor is updated for the current release.
- JAGaf43255
When a Processor Timeout error or a Hard Fail error occurs, the monitor receives incorrect cache errors and, as a result, deactivates or deconfigures the CPU on which the error is detected. This has been fixed.
- JAGaf30381
Event 100643 does not provide processor-related information, so the symbol (!) is displayed in the event description instead of the value of the processor LID. This has been fixed.
- JAGaf56043
When you mark a CPU for deconfiguration or reconfiguration, the CMC monitor may unmark the CPU. This has been fixed.
- Core Hardware (dm_core_hw).
- N/A
- Core Hardware for Itanium (ia64_corehw).
- N/A
- Core Hardware Monitor -- Asama (ipfcorehw_asama).
- N/A
- Core Hardware Monitor -- Hitachi (ipfcorehw_hitachi).
The following changes apply to systems running on the Itanium platform only:
- Monitor is updated for the current release.
- CPE Monitor (cpe_em).
The following changes apply to the Itanium platform only:
- Starting with the IPF system firmware rel_6.0 upgrade files (which contain IPF system firmware version 3.66) and the firmware version 4.0 upgrade files for rx8620 and rx7620, a new Corrected Platform Error (CPE) is added for the Page Deallocation Table (PDT) status. This new CPE applies only to the memory_ia64 monitor; however, the cpe_em monitor generated Event 100299 for it. The cpe_em monitor has been modified to ignore this new CPE. To get this fix, you must install the May 2005 version of the OnlineDiag bundle.
- JAGaf36335
The monitor decodes the 'Error Type' field even when its bit values are not valid. This has been fixed.
- JAGaf48407
The monitor leaks memory while generating an event. The monitor has been enhanced to resolve this issue.
- JAGaf37955
Event 100102 does not provide enough information about the error when the firmware data is processed. An additional field, Error Recovery Info, has been added to explain the error in detail.
- CPU Monitor (lpmc_em)
The following changes apply to the PA-RISC platform only:
- The floating-point detection capability of the PA CPU monitor has been enhanced.
- JAGaf40009
The PA CPU monitor has been fixed to take the correct Dynamic Processor Resilience (DPR) action on vPar systems running VirtualPartition version A.03.* (* indicates any A.03 version). DPR action is the action the PA CPU monitor takes to replace a faulty processor with one of the spare processors (if available) on the system.
- JAGaf37954
The monitor logs the wrong function name in its error messages. This has been fixed.
- JAGaf38304, JAGaf41451
On vPar systems, the PA CPU monitor could deconfigure or deactivate the wrong processor. The events generated by the PA CPU monitor identify a bad processor, which is therefore deactivated, but in reality a processor with a different hard physical address is deactivated instead of the faulty one. This behavior is more pronounced, and consistently reproducible, when the system has a few deconfigured processors. This problem has been fixed. You can obtain the hard physical address of the processor that is actually deactivated with the CPU Expert Tool. The events generated by the PA CPU monitor are logged in /var/opt/resmon/log/event.log.
Note: The lpmc_em.dat file must be deleted for this fix to work properly. Complete the following steps:
1. Bring down the EMS monitors using the monconfig utility.
2. Remove the file /var/stm/logs/monitor/lpmc_em.dat using the following command: rm /var/stm/logs/monitor/lpmc_em.dat
3. Enable monitoring using the monconfig utility.
Events 100801-100813 have been replaced by Events 100901-100913, respectively. The new events have the following attributes by default:
Threshold = 1
time_window = ANY (no time frame)
Suppression time = NOT_USED (no suppression time)
- Monitor has been enhanced to support A.04.01 vPars.
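The file-removal step of the lpmc_em.dat procedure for JAGaf38304/JAGaf41451 can be wrapped in a small POSIX sh function. This is an illustrative sketch, not an HP-supplied tool: it assumes the EMS monitors have already been brought down with monconfig (an interactive utility that is not scripted here), and LPMC_DAT is a hypothetical override variable that defaults to the path given in these notes.

```shell
#!/bin/sh
# Illustrative sketch: automates only the "remove lpmc_em.dat" step.
# Bringing the EMS monitors down and re-enabling monitoring are done
# interactively with /etc/opt/resmon/lbin/monconfig and are NOT scripted here.

# LPMC_DAT is a hypothetical override; it defaults to the path in these notes.
LPMC_DAT=${LPMC_DAT:-/var/stm/logs/monitor/lpmc_em.dat}

# Remove the stale lpmc_em.dat file if it exists. Returns non-zero only
# if the file exists but could not be removed.
remove_lpmc_dat() {
    if [ -f "$LPMC_DAT" ]; then
        rm -f "$LPMC_DAT" || return 1
        echo "removed $LPMC_DAT"
    else
        echo "no $LPMC_DAT found; nothing to remove"
    fi
}
```

Run remove_lpmc_dat after bringing down the monitors and before re-enabling monitoring in monconfig. Because it only removes the file when it is present, running it a second time is harmless.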
- CPU Monitor -- Hitachi (cmc_em_hitachi).
- N/A
- Disk Array FC60 Monitor (fc60mon).
- N/A
- Disk Monitor (disk_em).
The following changes apply to both Itanium and PA-RISC platforms:
- JAGaf37354
The monitor had a memory leak. This has been fixed.
- Monitor is updated for the current release.
- Fibre Channel Adapter (ql_adapter)
- N/A
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- N/A
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
- N/A
- Fibre Channel Switch (dm_fc_sw).
- N/A
- Forward Progress Log (FPL) Monitor (fpl_em)
The following changes apply to both Itanium and PA-RISC platforms:
- Monitor is enhanced to handle 02 events.
- JAGae61292, JAGaf44894
The monitor's performance has been improved.
- JAGaf48742
The monitor does not select the init log file correctly. The monitor has been enhanced to handle the missing status file.
- JAGaf40434
The monitor has been enhanced to resolve a memory leak.
- High Availability Disk Array Monitor (ha_disk_array).
- N/A
- High Availability Storage System (dm_ses_enclosure)
The following changes apply to both Itanium and PA-RISC platforms:
- JAGaf48688
EMS reports a DS2405 firmware mismatch as a hardware failure.
- iSCSI Driver Subsystem Monitor (dm_iscsi_adapter)
- N/A
- Kernel Resource Monitor (krmond)
- N/A
- Memory (dm_memory)
The following changes apply to the PA-RISC platform only:
- JAGaf23521
Event 1400 is generated by the memory monitor when the Page De-allocation Table (PDT) is 100% full. The severity of the event is 'Critical'. This event is generated once every 24 hours.
- JAGaf30728
Events 3100 through 3300 are disabled by default, but you can enable them. These events are generated when the number of single-bit errors at the same address exceeds the specified limit; there is no time frame within which the limit must be exceeded. Events 4000 through 4200 are generated when the number of "unique" memory addresses on the same DIMM meets the specified threshold within the specified time frame.
- The severity level of Events 3100, 3200, and 3300 (the events that cover correctable memory errors at the same address and the same DIMM) has been changed to Information, and these events are DISABLED by default in the default_dm_memory.clcfg file.
- JAGaf48614
The MC/EXT field is blank in events generated by the monitor on rp4440, rp3440, and c8000 class machines. This has been fixed.
- JAGaf53831
The dm_memory monitor does not generate Events 4000 through 4200 after the PDT entries are cleared if those events were already generated before the PDT was cleared. Clearing the PDT deletes the memory error log, but it does not clear the monitor's history information, so the suppression time specified for the event prevents another event from being generated within 24 hours. The memlogd daemon has been fixed to clear the monitor's event logs every time the memory error log is cleared.
- Memory IA64 (memory_ia64)
The following changes apply to the Itanium platform only:
- Based on the memory error handling suggested by HP, the severity level of Event 3100, Event 3200 and Event 3300 in the IPF Memory Monitor has been changed to Information. These events are disabled by default in the default_memory_ia64.clcfg file. In addition, the default threshold has been changed from 40 to 20 for Event 4000 and from 60 to 50 for Event 4100 in the default_memory_ia64.clcfg file.
- Monitor has been enhanced to support Virtual Partitions (vPars). In any active vPar which has OnlineDiag installed, the monitor will monitor the memory allocated to that vPar and the memory allocated to the monitor space of the vPar.
The following changes apply to both Itanium and PA-RISC platforms:
- apiLog content has been cleaned up and enhanced to allow control of content level.
- JAGaf51587
Event 102000 is generated even though no such event exists. This has been fixed.
- JAGaf51588
The monitor incorrectly suppresses Event 103011 after a short period because the power supply is not detected. This has been fixed.
- Monitor has been enhanced to generate an FPL event indicating that FPL entries are missing.
- JAGaf25773
The monitor incorrectly generates events for errors that are already resolved, due to problems related to the power supply. This problem has been rectified.
- JAGaf39091
For Event 101011, the current states reflected in the summary and in the probable cause/recommended action text do not match. The text now clarifies that the system has reached a non-critical temperature condition.
- JAGaf40433
While processing entries from the os_decode_xref file, the monitor has a memory leak. This has been fixed.
- JAGaf42886
The monitor generates incorrect high-severity temperature events when the system temperature changes from a more severe state to a less severe state. For example, when the system changes from a non-critical to a normal state, a non-critical temperature event is generated. The monitor now generates Events 101001, 101002, 101003, 101011, 101012, and 101013 under the correct conditions.
- JAGaf55382
The ia64_corehw monitor has been enhanced to improve FPL processing performance.
- JAGaf55932
A memory leak occurs in the ia64_corehw monitor while retrieving the monitor configuration values. This has been fixed.
- Memory Monitor -- Hitachi (ipfmemory_hitachi).
- N/A
- MSA1000 Storage Disk Array Monitor (msamon)
The following changes apply to both Itanium and PA-RISC platforms:
- JAGaf48231
Monitoring support has been added for the MSA1500 under the existing HP StorageWorks Modular SAN Array 1000/30 Monitor (msamon).
- STM identifies the MSA1500 Modular Storage Array and the Enterprise Virtual Array - XL.
- Peripheral Status Monitor (psmmon).
- N/A
- RAID Adapter (dm_raid_adapter)
- N/A
- Remote Monitor (RemoteMonitor).
- N/A
- SCSI Disk Monitor (scsi_disk).
- N/A
- System Status Monitor (sysstat_em)
The following changes apply to both Itanium and PA-RISC platforms:
- JAGaf05354
When sysstat_em starts, the monitor does not log a message in api.log (/var/opt/resmon/log/api.log) indicating that the monitor has started. This has been fixed.
- JAGae57005
The monitor does not display the IP address of the machine in the Component Data section of the event. Also, if the IP address is specified in the configuration file, the host name is not retrieved and displayed in the Component Data section. This has been fixed.
- UPS Monitor (dm_ups).
- N/A
Changes to Platform and Interface
The following changes apply to the Itanium platform only:
- CPE monitor (Hitachi) is included in the current release.
Customer-Visible Interface Changes
- N/A
Known Problems
- If the SysFaultMgmt product is installed on your system, you must take the following steps before and after updating the OnlineDiag product:
1. Shut down the SysFaultMgmt subsystem from the command line:
/sbin/init.d/cimserver stop
2. Perform the OnlineDiag update.
3. After the update is completed, restart the SysFaultMgmt subsystem from the command line:
/sbin/init.d/cimserver start
NOTE: To disable monitoring during normal system operation, you must use steps 1 and 3 above to avoid an automatic restart of monitoring; step 2 is the action you need to perform while monitoring is turned off.
Future updates to SysFaultMgmt and OnlineDiag will resolve this problem.
- Memory Page Deallocation (MPD), which runs on most current HP-UX computer systems, does not work on RX 4610 systems. The activity log for memlogd includes a message that reads "unsupported device". MPD cannot be implemented on the RX 4610 system because the system's design does not allow the memlogd daemon to run on it.
- In the HP-UX 11.23 June 2003 release, the Fibre Channel Arbitrated Loop Hub monitor (dm_fc_hub) and the Fibre Channel Switch monitor (dm_fc_sw) are probably not functional, because these monitors depend on SNMP functionality that may not be included in that release. Check the latest version of the EMS Release Notes on the Web for the latest information.
- If the maxssiz_64bit kernel parameter is set below the default value of 0x800000, it can cause the lpmc_em monitor to abort.
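Two of the known problems above lend themselves to small shell helpers. The sketch below is illustrative, not an HP-supplied script: update_with_sfm_stopped brackets an arbitrary update command with the cimserver stop and start commands given above, and maxssiz_64bit_ok compares a maxssiz_64bit value against the 0x800000 default. Both function names and the CIMSERVER_RC override variable are hypothetical, and obtaining the current kernel parameter value (for example, with kctune on HP-UX 11.23) is left to the administrator.

```shell
#!/bin/sh
# Illustrative sketch of two Known Problems workarounds; not an HP-supplied tool.

# CIMSERVER_RC is a hypothetical override; it defaults to the rc script in these notes.
CIMSERVER_RC=${CIMSERVER_RC:-/sbin/init.d/cimserver}

# Stop SysFaultMgmt (cimserver), run the update command given as arguments,
# then restart cimserver even if the update failed. Returns the update's exit
# status, or 1 without running the update if cimserver could not be stopped.
update_with_sfm_stopped() {
    "$CIMSERVER_RC" stop || return 1
    "$@"
    _update_rc=$?
    "$CIMSERVER_RC" start
    return "$_update_rc"
}

# Succeed if a maxssiz_64bit value (decimal or 0x-prefixed hex) is at least
# the 0x800000 default; the notes say lower values can make lpmc_em abort.
maxssiz_64bit_ok() {
    [ "$(($1))" -ge "$((0x800000))" ]
}
```

For example, update_with_sfm_stopped swinstall -s DEPOT OnlineDiag, where DEPOT stands for your own software depot, and maxssiz_64bit_ok 0x400000 || echo "maxssiz_64bit is below the default". Restarting cimserver even after a failed update avoids leaving SysFaultMgmt down, while skipping the update when the stop fails avoids updating with the subsystem still running.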
Monitors Provided
For the May 2005 release of HP-UX 11.23, the following monitors are scheduled to be available:
- CMC Monitor (cmc_em)
- Core Hardware (dm_core_hw)
- Core Hardware for Itanium (ia64_corehw)
- Core Hardware Monitor -- Asama (ipfcorehw_asama)
- Core Hardware Monitor -- Hitachi (ipfcorehw_hitachi)
- CPE Monitor (cpe_em)
- CPU Monitor (lpmc_em)
- CPU Monitor -- Hitachi (cmc_em_hitachi)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fibre Channel Adapter (ql_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- Forward Progress Log (FPL) Monitor (fpl_em)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- iSCSI Driver Subsystem Monitor (dm_iscsi_adapter)
- Kernel Resource Monitor (krmond)
- Memory (dm_memory)
- Memory IA64 (memory_ia64)
- Memory Monitor -- Hitachi (ipfmemory_hitachi)
- MSA1000 Storage Disk Array Monitor (msamon)
- Peripheral Status Monitor (psmmon)
- RAID Adapter (dm_raid_adapter)
- Remote Monitor (RemoteMonitor)
- SCSI Disk Monitor (scsi_disk)
- System Status (sysstat_em)
- UPS (dm_ups)
The following monitors are NOT provided:
- dm_FCMS_adapter
- fw_disk_array: hardware not supported on the system
- scsi123_em: hardware not supported on the system
For detailed information about the products and the monitors that support them, and about additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation Web site: http://docs.hp.com/hpux/diag/
Monitor Dependencies
Several of the monitors have special requirements, such as patches or certain versions of firmware. They are as follows:
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
- The Fibre Channel Arbitrated Loop Hub (dm_fc_hub) monitor and the Fibre Channel Switch (dm_fc_sw) monitor require a special configuration which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the Disk Array FC60 (fc60mon) monitor or the STM tools for this device.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag. Requirements are also listed in Chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Defect Reporting
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
SD Product Structure
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they use the EMS framework, product number B7609BA.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag - On-line Diagnostic System (Series 800/700)
  SD PRODUCT: Sup-Tool-Mgr - Support Tools Manager for HP-UX Systems
    SD SUB-PRODUCT: Manuals - Support Tools Manager Manual Pages
      FILESET: RELEASE_NOTES - HPUX STM Release Notes
      FILESET: STM-MAN - HPUX STM Manual Pages
    SD SUB-PRODUCT: Runtime - STM Manual Runtime
      FILESET: STM-CATALOGS - HPUX STM Shared Libraries
      FILESET: STM-SHLIBS - HPUX STM Shared Libraries
      FILESET: STM-UI-RUN - HPUX STM User Interface
      FILESET: STM-UUT-RUN - HPUX STM Unit Under Test Runtime
  SD PRODUCT: EMS-Config - EMS Config
    FILESET: EMS-GUI - Event Monitoring Service Graphical User Interface
  SD PRODUCT: EMS-Core - EMS Core Product
    FILESET: EMS-CORE - Event Monitoring Service Core Files