These release notes cover the June 2002 release of Support Plus for HP-UX 11i/11.00/10.20 running on S800/S700 systems.
- Overview
- Configuring Hardware Monitoring
- Documentation
- Changes
- Known Problems
- Monitors Provided
- Monitor Dependencies
- Defect Reporting
- SD Product Structure
NOTE: As of the September 1999 release, the name of the Diagnostic/IPR Media has been changed to Support Plus. In addition, the format has changed so that there is a separate CD-ROM for each version of the operating system (HP-UX 11i, 11.00 and 10.20).
Included on the Support Plus CD-ROM are the EMS Hardware Monitors - an important tool for maintaining system availability. The EMS hardware monitors allow you to monitor the operation of a wide variety of hardware products and be alerted immediately if any failure or other unusual event occurs. Hardware event monitoring is available to users running HP-UX 11i, 11.00, or 10.20 (IPR 9902 and later).
Hardware event monitoring provides a high level of protection against system hardware failure. By using hardware event monitoring, you can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.
Configuring Hardware Monitoring
The EMS Hardware Monitors are installed at the same time as the Support Tools Manager. Once the monitoring software is installed, monitoring is automatically enabled.
By default, messages regarding major warning, serious and critical events that occur on hardware being monitored will be:
- Written to /var/adm/syslog/syslog.log
- Sent to EMAIL address root
All events will be stored in /var/opt/resmon/log/event.log.
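The default routing described above can be pictured as a simple severity filter. The following is a minimal Python sketch, not monconfig's implementation: the five-level severity ordering (Information up to Critical) is an assumption based on the EMS severity names, and the "email:root" destination string is a hypothetical placeholder for mail delivery to root.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering of EMS hardware-monitor severities, lowest to highest.
    INFORMATION = 1
    MINOR_WARNING = 2
    MAJOR_WARNING = 3
    SERIOUS = 4
    CRITICAL = 5


EVENT_LOG = "/var/opt/resmon/log/event.log"
SYSLOG = "/var/adm/syslog/syslog.log"
EMAIL_ROOT = "email:root"  # hypothetical placeholder for mail sent to root


def default_destinations(severity: Severity) -> list[str]:
    """Return where an event goes under the default configuration."""
    destinations = [EVENT_LOG]  # all events are stored in event.log
    if severity >= Severity.MAJOR_WARNING:
        # Major warning, serious and critical events are also reported.
        destinations += [SYSLOG, EMAIL_ROOT]
    return destinations


if __name__ == "__main__":
    for sev in Severity:
        print(sev.name, default_destinations(sev))
```

Changing the monitoring requests with monconfig amounts to changing the threshold and the destination list in a rule like this one.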
To configure, enable, or disable hardware event monitoring, run the monitoring request manager: /etc/opt/resmon/lbin/monconfig
The Peripheral Status Monitor (PSM) and the Kernel Resource Monitor (krmond) are configured differently: they use the EMS GUI. See: http://docs.hp.com/hpux/onlinedocs/diag/ems/ems_gui.htm
For the latest and most complete information on EMS Hardware Monitors and the Support Tools Manager (STM), see the Web page "Diagnostics":
http://docs.hp.com/hpux/diag/
At this site, you will find Overviews, Tutorials, Quick Reference Cards, Frequently Asked Questions (FAQs), and other material.
For complete information on installing and using EMS hardware monitors, as well as a list of supported hardware, refer to the "EMS Hardware Monitors User's Guide" available at the above site. An electronic copy of this book is also included on the Support Plus CD-ROM in the <mount_point>/DIAGNOSTICS directory.
Changes in the EMS Hardware Monitors for the June 2002 release include:
- Changes to Multiple Monitors
- Changes to Individual Monitors
- Changes to Platform and Interface
- Customer-Visible Interface Changes
Changes to Multiple Monitors
- There is a new evlib routine that removes all iscsi-related devices from a list of device paths. The SCSI-related monitors that use this routine are:
- armmon
- disk_em
- dm_fc_scsi_mux
- dm_ses_enclosure
- dm_stape
- fc60mon
- fw_disk_array
- ha_disk_array
If any of these monitors is monitoring a SCSI device that is a descendant of an iscsi virtual node, that device will be removed from the monitor's pathlist of resources and will no longer be monitored.
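These notes do not give the evlib routine's name or signature, so the following is only a hypothetical Python sketch of the filtering it describes. It assumes HP-UX-style hardware path strings, in which a device descends from a node when its path begins with the node's path followed by a "/" or "." separator; the function names and example paths are illustrative and are not taken from evlib.

```python
# Hypothetical sketch of the evlib filtering step; the real routine's name
# and signature are not documented here. Paths are assumed to be HP-UX-style
# hardware path strings such as "0/4/0/0.1.0".


def is_descendant(path: str, node: str) -> bool:
    """Return True if hardware path `path` lies strictly below `node`."""
    if len(path) <= len(node) or not path.startswith(node):
        return False
    # Require a separator right after the node prefix, so that "8/10.1.0"
    # is NOT treated as a descendant of node "8/1".
    return path[len(node)] in "/."


def remove_iscsi_paths(paths: list[str], iscsi_nodes: list[str]) -> list[str]:
    """Return `paths` without any device that descends from an iscsi node."""
    return [p for p in paths
            if not any(is_descendant(p, node) for node in iscsi_nodes)]


if __name__ == "__main__":
    # Illustrative paths only; "8/1" plays the role of an iscsi virtual node.
    pathlist = ["0/4/0/0.1.0", "8/1.0.0", "8/1/2.3.0", "8/10.1.0"]
    print(remove_iscsi_paths(pathlist, ["8/1"]))
```

The separator check is the design point: a plain prefix test would wrongly drop "8/10.1.0" along with the real descendants of "8/1".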
Changes to Individual Monitors
Changes to each monitor are described below. (Monitors are listed in alphabetical order.)
- AutoRAID Disk Array (armmon).
- JAGad51518; JAGad75425; JAGad73813; JAGad58875
The following JAGs have been fixed in this armmon release:
- JAGad51518 -- armmoncfg.clcfg file contains wrong information
- JAGad75425 -- Template versions of monitor should eventually be used on all OS
- JAGad73813 -- The catalog for armmon and fc60mon refer to SCSI-TEMPLATE
- JAGad58875 -- EMS monitor fc60mon and armmon are started even if no array is connected
- Chassis Code Monitor (dm_chassis).
- JAGae11962
1. Fixed JAGae11962 (chassis code event flooding).
2. Exported the new chassis code library and imported it into UNIX. This means tlchassis.sl and tlchassis.msg will have the same information as the new chassis code database on NT.
- JAGae05501
The problem with specifically listing the default_dm_ups.clcfg default client configuration file, especially for the "standard" monitoring requests used by ALL monitors, is that monconfig treats these requests as different from the ones for other monitors that do not list the default client configuration file. This makes the listing of monitoring requests in the monconfig output messy AND makes it difficult for users to modify the standard requests, because there is now one set for the monitors that do not list the default client configuration file, plus one for each monitor that lists its specific client configuration file. In addition, the "standard" requests can no longer simply apply to ALL monitors, but must list each monitor separately. JAGad05363 is related to this problem and covers the fix that will be made in monconfig itself to work around it.
- CMC Monitor (cmc_em).
- N/A
- Core Hardware Monitor (dm_core_hw)
- JAGae09320
Added support for the missing MARCATO_W_DC_MINUS hversion for EMS tools.
- Core Hardware for Itanium (ia64_corehw).
- N/A
- CPU Monitor (lpmc_em).
Note: As of the June 2002 release, the LPMC Monitor (lpmc_em) was renamed to "CPU Monitor". The binary name is still lpmc_em. The name was changed to reflect the monitor's enhancement to check floating-point functionality in the CPU.
- The CPU monitor has been enhanced for this release:
- Triggering of the Dynamic Processor Resilience (DPR) action has been improved. For detailed information, see the white paper "Dynamic Processor Deallocation and Dynamic Processor Resilience," available on this web site.
There are four types of cache errors: ICache Data, ICache Tag, DCache Data, and DCache Tag. Before this release, the monitor took the DPR action when the total number of these errors, in any combination, reached the Threshold. Starting with the HWE0206 release, the monitor buckets these errors by type, and DPR is triggered only when the Threshold number of errors of the SAME type occurs. For example, if the Threshold is 3:
Pre-HWE0206 version: 2 ICacheData + 1 DCacheData will trigger DPR
HWE0206 version: 3 ICacheData (or 3 errors of any one type) are needed to trigger DPR
To accommodate these new cases, the events have been renumbered, since the event text had to be modified to indicate which of the four types of cache error occurred. Events 100501-100599 are now obsolete; the new event numbers range from 100601-100699.
- Support has been added to perform tests on the Floating-Point registers on all CPUs at each POLL interval.
At each POLL_INTERVAL, the monitor runs about 20K of the 1M floating-point test vectors on each processor, one processor at a time. The number of test vectors run each time is configurable by changing the value of FP_TEST_ITERATIONS in the lpmc_em.cfg file; a value of zero (0) means the monitor does not run the FP tests. If the test fails on any processor, the monitor takes Dynamic Processor Resilience action on that processor and generates the appropriate events.
Since there can now be two causes of DPR, the Event Description of events 100621-100632 has been changed slightly so that it does NOT refer specifically to cache errors or FP errors. The event notification for either DPR action now consists of two events:
- First event: reports that an error occurred on the processor. For cache errors, an event in the range 100611-100614 is generated. For FP test failures, event 100701 is generated.
- Second event: informs the user of the DPR action taken by the monitor. This is one of the events in the range 100621-100632.
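The difference between the old and new DPR trigger rules for cache errors can be sketched in Python as follows. This is an illustration, not lpmc_em code: it assumes the errors counted toward the Threshold are those recorded for a single processor, and the function names and error-type strings are hypothetical.

```python
from collections import Counter

# The four cache error types named in these notes.
CACHE_ERROR_TYPES = ("ICacheData", "ICacheTag", "DCacheData", "DCacheTag")


def dpr_triggered_pre_hwe0206(errors: list[str], threshold: int) -> bool:
    """Old rule: every cache error counts toward one shared total."""
    cache_errors = [e for e in errors if e in CACHE_ERROR_TYPES]
    return len(cache_errors) >= threshold


def dpr_triggered_hwe0206(errors: list[str], threshold: int) -> bool:
    """New rule: errors are bucketed by type, and DPR is triggered only
    when a single bucket reaches the threshold."""
    counts = Counter(e for e in errors if e in CACHE_ERROR_TYPES)
    return any(count >= threshold for count in counts.values())


if __name__ == "__main__":
    mixed = ["ICacheData", "ICacheData", "DCacheData"]
    same = ["ICacheData"] * 3
    print(dpr_triggered_pre_hwe0206(mixed, 3))  # True
    print(dpr_triggered_hwe0206(mixed, 3))      # False
    print(dpr_triggered_hwe0206(same, 3))       # True
```

With a Threshold of 3, the two functions reproduce the example given for the CPU monitor: the old rule fires on 2 ICacheData + 1 DCacheData, while the new rule needs 3 errors of one type.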
- Disk Array FC60 Monitor (fc60mon).
- JAGad51518; JAGad75425; JAGad73813; JAGad58875
The following JAGs have been fixed in this release:
- JAGad51518 -- armmoncfg.clcfg file contains wrong information
- JAGad75425 -- Template versions of monitor should eventually be used on all OS
- JAGad73813 -- The catalog for armmon and fc60mon refer to SCSI-TEMPLATE
- JAGad58875 -- EMS monitor fc60mon and armmon are started even if no array is connected
- Fast Wide SCSI Disk Array (fw_disk_array)
- JAGae05960
This is a fix for JAGae05960 (fw_disk_array monitor could fill up file system).
- Disk Monitor (disk_em).
- JAGae11549; JAGad96956
The monitor now supports PCI suspend/resume functionality; that is, the monitor does not generate events for devices connected to a PCI card while that card is in suspended mode. The monitor has also been stopped from getting information from raid 4si disks.
- Fibre Channel Adapters (dm_FCMS_adapter).
- N/A
- Fibre Channel Adapter Model A5158 Monitor (dm_TL_adapter).
- N/A
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux).
- N/A
- Fibre Channel Switch (dm_fc_sw).
- N/A
- High Availability Disk Array Monitor (ha_disk_array).
- JAGae11549
Fixed so that whenever a PCI card is suspended, the ha_disk_array monitor does not generate events for the devices connected to that card.
- JAGae05958
This is a fix for JAGae05958 (ha_disk_array can potentially fill up the file system).
- High Availability Storage System (dm_ses_enclosure)
- JAGae11212
Changes were made to support DS2300 and DS2405 devices.
- JAGae11549
Fixed so that whenever a PCI card is suspended, the dm_ses_enclosure monitor does not generate events for the devices connected to that card.
- JAGae15588
The info tool for Apex was exiting with SIGSEGV because of a bug in the tlses library. This defect is fixed in this release.
- JAGae05962
This is a fix for JAGae05962 (dm_ses_enclosure monitor could fill up file system).
- Kernel Resource Monitor (krmond)
- N/A
- LPMC Monitor (lpmc_em).
As of the June 2002 release, the LPMC Monitor (lpmc_em) was renamed to "CPU Monitor". The binary name is still lpmc_em. The name was changed to reflect the monitor's enhancement to check floating-point functionality in the CPU. For more information, see CPU Monitor (lpmc_em).
- Memory Monitor (dm_memory).
- Following a fix in the SCSI template monitor, code in the memory monitor was changed to close the file before removing it (this change results from the fix to JAGab67905).
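Closing a file before removing it, and removing a temporary format file after every event (as dm_ups now does with /var/tmp/dm_ups.fmt), can be sketched in Python as follows. The function name and the in-memory log list are hypothetical stand-ins; the real monitors are not written in Python, and the temporary file here is created by the tempfile module rather than at the monitors' own paths.

```python
import os
import tempfile


def format_and_log(event_text: str, log: list[str]) -> str:
    """Write an event to a temporary format file, log it, then delete the file.

    Returns the (now removed) file path so callers can confirm cleanup.
    """
    fd, fmt_path = tempfile.mkstemp(suffix=".fmt")
    try:
        # Leaving each with-block closes the file before it is removed below.
        with os.fdopen(fd, "w") as fmt_file:
            fmt_file.write(event_text)
        with open(fmt_path) as fmt_file:
            log.append(fmt_file.read())
    finally:
        # Remove the format file after every event so it cannot accumulate
        # and fill the file system.
        os.remove(fmt_path)
    return fmt_path


if __name__ == "__main__":
    event_log: list[str] = []
    path = format_and_log("example event text", event_log)
    print(event_log, os.path.exists(path))
```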
- Peripheral Status Monitor (PSM).
- N/A
- Remote Monitor (RemoteMonitor).
- Added event numbers 215 and 258 to the default_RemoteMonitor.clcfg file.
- SCSI Card Monitor (scsi123_em).
- N/A
- SCSI Cascade Monitor (scsi_cascade).
- JAGae11552
Because iscsi devices are not currently supported, the hardware paths of devices connected to the iscsi card have been removed from the hardware path list.
- JAGae11549; JAGae11552
The monitor now supports PCI suspend/resume functionality; that is, the monitor does not generate events for devices connected to a PCI card while the card is in suspended mode. A fix has also been implemented to remove the hardware paths of devices connected to the iscsi card from the hardware path list.
- SCSI Disk (scsi_disk).
- JAGad96956
A fix was made to stop the monitor from getting information from raid 4si disks.
- SCSI Tape Monitor (dm_stape).
- It has been discovered that some monitors may not need any code modifications to remove iscsi devices from the monitor path list. To determine whether changes to your monitor are necessary, you must know whether your monitor is predictive enabled. To find out:
- vi main_prog.c
- search for "init_reg_and_paths()" (if found, the monitor IS predictive enabled; if not found, the monitor is NOT predictive enabled).
If your monitor IS predictive enabled, no changes to remove iscsi devices are necessary: these devices are ignored in the uut_status and therefore are not entered in the pathlist by diagmond. Remove any changes you have already made to ignore iscsi devices.
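The manual vi check for dm_stape can also be scripted. This minimal Python sketch assumes only what those steps state: a monitor is predictive enabled when the string init_reg_and_paths() appears in its main_prog.c. Like the vi search, it is a plain text match, so a mention inside a comment would also count; the function name and script name are hypothetical.

```python
import sys
from pathlib import Path


def is_predictive_enabled(source_text: str) -> bool:
    """Return True if the search string from the release notes appears
    in the monitor's main_prog.c text (plain text match, like vi)."""
    return "init_reg_and_paths()" in source_text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_predictive.py path/to/main_prog.c
    text = Path(sys.argv[1]).read_text(errors="replace")
    if is_predictive_enabled(text):
        print("predictive enabled: no iscsi code changes needed")
    else:
        print("NOT predictive enabled: iscsi code changes may be needed")
```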
- System Status Monitor (sysstat_em)
- N/A
- UPS Monitor (dm_ups).
- JAGae16583
In previous releases, when a customer wanted to use UPS monitoring but the UPS monitor (dm_ups) failed to start because of an incompatible version of ups_mond (the HP-UX UPS monitoring daemon), it was unclear which version of ups_mond to install. In this situation, the UPS monitor generates event #42, which informs the user that the system must be updated with a more recent version of ups_mond. The previous message referenced versions of ups_mond that are valid on some operating systems and invalid on others, along with old patch numbers, which left customers confused. This version of dm_ups contains new cause/action text for event #42 that tells the user which ups_mond patch is the most current one to install for each supported OS. It makes no reference to a ups_mond version number, since the version number itself does not matter and only creates confusion.
- JAGae14473
Previously, if the dm_ups EMS monitor was running and the /etc/ups_conf file contained a "upstty" entry that did not end in ":SOLA", dm_ups erroneously generated event #43. This bug is now fixed.
- JAGae05500
The problem with specifically listing the default_dm_ups.clcfg default client configuration file, especially for the "standard" monitoring requests used by ALL monitors, is that monconfig treats these requests as different from the ones for other monitors that do not list the default client configuration file. This makes the listing of monitoring requests in the monconfig output messy AND makes it difficult for users to modify the standard requests, because there is now one set for the monitors that do not list the default client configuration file, plus one for each monitor that lists its specific client configuration file. In addition, the "standard" requests can no longer simply apply to ALL monitors, but must list each monitor separately. JAGad05363 is related to this problem and covers the fix that will be made in monconfig itself to work around it.
- JAGae05950; JAGab67905
Customers have complained about the temporary format file /var/tmp/dm_ups.fmt filling up their disks. This version of dm_ups removes the dm_ups.fmt file after each event is formatted and logged.
Changes to Platform and Interface
- N/A
Customer-Visible Interface Changes
- JAGad99498
Modified moncheck, toggle_switch and startmon_client (used by monconfig to check monitoring requests, disable monitoring and enable monitoring, respectively) and psmctd to use the new rm_service_up routine in the EMS library, which checks whether the EMS services are available before an attempt is made to connect to EMS. If the services are still unavailable when the 5-minute wait expires, the code displays an error message indicating that the registrar service has not been started: "EMS Registrar inetd service not started. Start registrar and retry".
This message is displayed in monconfig when the user selects the K)ill or C)heck command. It is not displayed for an E)nable command, because there is no communication between monconfig and the program that actually enables the monitors. However, if the service is not available and the user selects E)nable, the command will complete but the monitors will NOT be enabled, and the state displayed in monconfig will indicate that the monitors are NOT enabled. No errors can be logged into the EMS error logs, because logging requires a connection to the registrar, which has not been started.
For psmctd, an error like the following is logged into the System Activity log, indicating that psmctd exited because of an initialization error:
Wed Feb 27 15:29:12 2002: Daemon process (psmctd) with process identifier (26984) exited.
Wed Feb 27 15:29:12 2002: Daemon process completed with exit_status SYS_INIT_FAILED_EXIT (100) indicating the process exited because it could not perform basic initialization. Possible Causes/Recommended Action: Process internal error. For a remap hardware process, check the map log for more information.
Wed Feb 27 15:29:12 2002: Daemon process (psmctd) will not be restarted as restart attempts have exceeded the maximum allowed (5). Start daemon process manually using User Interface.
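The registrar availability check amounts to polling until the service answers or a time limit passes. The following Python sketch assumes a 300-second (5-minute) limit and a hypothetical 5-second poll interval; service_up stands in for the EMS library's rm_service_up routine, whose actual signature is not given in these notes. The clock and sleep parameters exist only so the logic can be tested without waiting.

```python
import time
from typing import Callable

REGISTRAR_ERROR = ("EMS Registrar inetd service not started. "
                   "Start registrar and retry")


def wait_for_ems(service_up: Callable[[], bool],
                 timeout_s: float = 300.0,
                 poll_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll service_up() until it returns True or timeout_s seconds pass.

    service_up is a stand-in for the EMS library's rm_service_up routine,
    whose real signature is not documented here.
    """
    deadline = clock() + timeout_s
    while True:
        if service_up():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Replace the lambda with a real availability check.
    if not wait_for_ems(lambda: True):
        print(REGISTRAR_ERROR)
```

A caller such as moncheck would print REGISTRAR_ERROR when wait_for_ems returns False and skip the connection attempt.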
CAUTION: UPS Monitor May Need a Patch
In some cases, the UPS monitor (dm_ups) will not function and will instead generate event 45 (formerly event 42) with the text:
Probable Cause / Recommended Action: The monitor was unable to locate the fifo pipe that should have been created by ups_mond. Therefore, information about the ups cannot be sent to the monitor. You need version (80.1.2.3) of ups_mond or greater. To update your system with the correct version of ups_mond, install one of the following patches:
HPUX 10.20/s800 : PHCO_24153 (supersedes PHCO_23830)
HPUX 11.00 : PHCO_24172 (supersedes PHCO_23831)
HPUX 11.11 : PHCO_23832
To fix the problem, load the indicated patch or load the HWE patch bundle that contains it. For HP-UX 11i, the ups_mond patch PHCO_23832 is also distributed on the Sept 01 OE.
This problem will affect most systems with a UPS when the September 2001 diagnostics are installed. The only systems not affected will be those being updated from certain versions of the diagnostics (September 2000 through March 2001) that do not have patch PHCO_19031 (HP-UX 10.20) or PHCO_19040 (HP-UX 11.00) installed.
CAUTION: Monitoring Changes for disc30, sdisk and disk array devices
As of IPR 9902 (Feb 99 release), there has been a change to the way that monitoring is done for disc30, sdisk and the HA Disk Array Models 10, 20, and 30FC.
Formerly, the "diaglogd exec" programs (pdisc30_exec, pharaymon_exec, and psdisk_exec) handled driver error entries for these devices.
As of IPR 9902, these programs have been deleted and their functionality is now provided by the EMS Hardware Monitors.
If you had customized the configuration files for the diaglogd exec programs (disk30_exec.cfg, sdisk_exec.cfg, and haraymon_exec.cfg) you may wish to re-configure the EMS Hardware Monitors to achieve the same results.
CAUTION: Compatibility Problem with EMS-Related Products (ServiceGuard, HA Monitors, etc.)
If you install the OnlineDiag bundle (Dec 99 or later) onto a computer running older revisions of EMS-related products, these products may experience compatibility problems. Affected products include MC/ServiceGuard, ServiceGuard OPS Edition and High Availability Monitors. The only critical problems occur with the following versions:
MC/ServiceGuard A.10.10, A.11.01, A.11.03
ServiceGuard OPS Edition A.11.02, A.11.03
Support Tools and the EMS hardware monitors are not affected. For complete information, see EMS Incompatibility Problem.
Monitors are provided to support the following:
- AutoRAID Disk Array (armmon)
- Chassis Code Monitor (dm_chassis)
- CMC Monitor (cmc_em)
- Core Hardware (dm_core_hw)
- Core Hardware for Itanium (ia64_corehw)
- CPU (lpmc_em)
- Disk (disk_em)
- Disk Array FC60 (fc60mon)
- Fast Wide SCSI Disk Array (fw_disk_array)
- Fibre Channel Adapters (dm_FCMS_adapter)
- Fibre Channel Adapter Model A5158 (dm_TL_adapter)
- Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
- Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
- Fibre Channel Switch (dm_fc_sw)
- High Availability Disk Array (ha_disk_array)
- High Availability Storage System (dm_ses_enclosure)
- Kernel Resource (krmond)
- LPMC (lpmc_em) renamed to "CPU Monitor" as of the June 02 release
- Memory (dm_memory)
- Remote (RemoteMonitor)
- SCSI Card (scsi123_em)
- SCSI Cascade (scsi_cascade)
- SCSI Disk (scsi_disk)
- SCSI Tape Devices (dm_stape)
- System Status (sysstat_em)
- UPS (dm_ups)
In addition, the Peripheral Status Monitor (PSM) is provided to monitor the current status of the products supported by the above list.
For detailed information concerning which products are supported by which monitors and additional dependencies, check the "Diagnostics" section of Hewlett-Packard's online documentation web site: http://docs.hp.com/hpux/diag/ .
Several of the monitors have special requirements, such as patches or certain versions of firmware. In particular:
- The Fibre Channel Arbitrated Loop Hub Monitor and the Fibre Channel Switch Monitor require special configuration which is described in their data sheets in the "EMS Hardware Monitors User's Guide" (chapter 6). A patch is also required.
- A patch is required if your system includes an HP SureStore E Disk Array FC60. This patch is required to run the EMS hardware monitor (fc60mon) or STM tools for this device.
For a list of the current required patches, see the DIAGNOSTIC.readme file for this release.
Current monitor requirements are described in the "Supported Products" page under "EMS Hardware Monitors" at http://docs.hp.com/hpux/diag . Requirements are also listed in chapter 2 of the manual "EMS Hardware Monitors User's Guide".
Use CHART to report defects in the EMS Hardware monitors. The project name is diag.hw_mon.hpux. If you don't have access to CHART, contact an HP representative to enter a defect for you.
The EMS hardware monitors are installed as part of the OnlineDiag bundle (product number B4708AA). In addition, they utilize the EMS framework, product number B7609BA.
Note: EMS Hardware Monitors are installed as part of the STM-UUT-RUN Fileset. However, the EMS Hardware Monitors are dependent on the EMS-Core and EMS-Config products and additional filesets in the Sup-Tool-Mgr Product.
For information on the STM product, refer to the STM release notes file /usr/sbin/stm/Rel_NOTES.STM.
SD Bundle: OnlineDiag
  Description: On-line Diagnostic System (Series 800/700)
  SD PRODUCT: Sup-Tool-Mgr
    Description: Support Tools Manager for HP-UX Systems
    SD SUB-PRODUCT: Manuals
      Description: Support Tools Manager Manual Pages
      FILESET: RELEASE_NOTES
        Description: HPUX STM Release Notes
      FILESET: STM-MAN
        Description: HPUX STM Manual Pages
    SD SUB-PRODUCT: Runtime
      Description: STM Manual Runtime
      FILESET: STM-CATALOGS
        Description: HPUX STM Shared Libraries
      FILESET: STM-SHLIBS
        Description: HPUX STM Shared Libraries
      FILESET: STM-UI-RUN
        Description: HPUX STM User Interface
      FILESET: STM-UUT-RUN
        Description: HPUX STM Unit Under Test Runtime
  SD PRODUCT: EMS-Config
    Description: EMS Config
    FILESET: EMS-GUI
      Description: Event Monitoring Service Graphical User Interface
  SD PRODUCT: EMS-Core
    Description: EMS Core Product
    FILESET: EMS-CORE
      Description: Event Monitoring Service Core Files