Designing Disaster Tolerant High Availability Clusters: > Chapter 4 Designing a Continental Cluster

Building the Continentalclusters Configuration


If necessary, use the swinstall command to install the Continentalclusters product on all nodes in both clusters. Then create the Continentalclusters configuration using the following steps (each step is described in detail in the sections that follow):

  • Prepare the security files.

  • Create the monitor package on each cluster containing a recovery package. Clusters not containing a recovery package may also monitor the other cluster by creating a monitor package on that cluster.

  • Edit the Continentalclusters configuration file on a node of your choice in either cluster.

  • Check and apply the Continentalclusters configuration.

  • Start each Continentalclusters monitor package on its cluster.

  • Validate the configuration.

  • Document the recovery procedure and distribute the documentation to both sites. Make sure all personnel are familiar with these procedures.

  • Test recovery procedures.

Preparing Security Files

Running a Continentalclusters command requires root access to cluster information on all the nodes in both clusters in the configuration. Before doing the Continentalclusters configuration, edit the /etc/cmcluster/cmclnodelist file on all the nodes of both clusters to include entries that will allow access by all nodes in the Continentalclusters. Here is a sample entry in the /etc/cmcluster/cmclnodelist file for a continental cluster configured with two, two-node Serviceguard clusters:

lanode1.myco.com    root
lanode2.myco.com    root
nynode1.myco.com    root
nynode2.myco.com    root

You must also create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor packages and Continentalclusters commands to obtain information from other nodes about the health of each cluster. The file must contain entries that allow access to all nodes in the continental cluster by the nodes where monitors and Continentalclusters commands are running.

You define the order of security checking by creating entries of the following types:

order deny,allow

If deny is first, the deny list is checked first to see if the node is there, then the allow list is checked.

deny from

lists all the nodes that are denied access. Permissible entries are:

all

All hosts are denied access.

domain

Hosts whose names match, or end in, this string are denied access, e.g. hp.com.

hostname

The named host (for example, kitcat.myco.com) is denied access.

IP address

Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet restriction is allowed.

network/netmask

This pair of addresses allows more precise restriction of hosts (e.g. 10.163.121.23/255.255.0.0).

network/nnnCIDR

This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for classless interdomain routing, a type of routing supported by the Border Gateway protocol (BGP).

allow from

lists all the nodes that are allowed access. Permissible entries are:

all

All hosts are allowed access.

domain

Hosts whose names match, or end in, this string are allowed access, e.g. hp.com.

hostname

The named host (for example, kitcat.myco.com) is allowed access.

IP address

Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet inclusion is allowed.

network/netmask

This pair of addresses allows more precise inclusion of hosts (e.g. 10.163.121.23/255.255.0.0).

network/nnnCIDR

This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for classless interdomain routing, a type of routing supported by the Border Gateway protocol (BGP).

The most typical entry is hostname. The following entries are from a typical /etc/opt/cmom/cmomhosts file:

order allow,deny
allow from lanode1.myco.com
allow from lanode2.myco.com
allow from nynode1.myco.com
allow from nynode2.myco.com
allow from 10.177.242.12

If the file is installed on all nodes in the continental cluster, these entries will allow Continentalclusters commands and monitors running on lanode1, lanode2, nynode1, and nynode2 to obtain information about the clusters in the configuration.
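
As a rough illustration of how the entry types described above select hosts, the following Python sketch tests one client against a single allow from or deny from entry. It is not the Continentalclusters implementation: the function name entry_matches is hypothetical, and treating a partial IP address as a prefix of leading octets is an assumption made for this example.

```python
import ipaddress

def entry_matches(entry, hostname, address):
    """Return True if one 'allow from' / 'deny from' entry covers the host.

    entry    -- the text that follows 'allow from' or 'deny from'
    hostname -- fully qualified client name, e.g. 'lanode1.myco.com'
    address  -- dotted-quad IPv4 address of the client
    """
    if entry == "all":
        return True
    if "/" in entry:
        # network/netmask or network/nnn (CIDR). strict=False tolerates
        # host bits in the network part, e.g. 10.163.121.23/255.255.0.0
        network = ipaddress.ip_network(entry, strict=False)
        return ipaddress.ip_address(address) in network
    octets = entry.rstrip(".").split(".")
    if all(part.isdigit() for part in octets):
        # full (4 octets) or partial (1 to 3 octets) IP address
        return address.split(".")[:len(octets)] == octets
    # hostname or domain: the name matches, or ends in, the string
    return hostname.lower().endswith(entry.lower())
```

For example, entry_matches("myco.com", "lanode1.myco.com", "10.177.242.12") is true because the host name ends in the domain string, while a partial address entry of 10.17 does not cover 10.177.242.12 because whole octets are compared.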

Creating the Monitor Package

The Continentalclusters monitoring software is configured as a Serviceguard package so that it remains highly available. The following steps should be carried out on the recovery cluster and repeated on the primary cluster if you want to monitor the recovery site from the primary site:

  1. On the node where you are doing the configuration, create a directory for the monitor package:

    # mkdir /etc/cmcluster/ccmonpkg

  2. Copy the template files from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory:

    # cp /opt/cmconcl/scripts/ccmonpkg.* /etc/cmcluster/ccmonpkg

    • ccmonpkg.config is the ASCII package configuration file template for the Continentalclusters monitoring application.

    • ccmonpkg.cntl is the control script file for the Continentalclusters monitoring application.

      NOTE: It is not recommended that you edit the ccmonpkg.cntl file. However, you may change the default SERVICE_RESTART value “-r 3” to a value that fits your environment.

  3. Edit the package configuration file (suggested name of /etc/cmcluster/ccmonpkg/ccmonpkg.config) to match the cluster configuration:

    1. Add the names of all nodes in the cluster on which the monitor may run.

    2. AUTO_RUN (PKG_SWITCHING_ENABLED was used prior to Serviceguard 11.12) should be set to YES so that the monitor package will fail over between local nodes. (Note, however, that for all primary and recovery packages, AUTO_RUN is always set to NO.)

  4. Use the cmcheckconf command to validate the package:

    # cmcheckconf -P ccmonpkg.config

  5. Copy the package configuration file ccmonpkg.config and control script ccmonpkg.cntl to the monitor package directory (default name /etc/cmcluster/ccmonpkg) on all the other nodes in the cluster. Make sure the control script is executable.

  6. Use the cmapplyconf command to add the package to the Serviceguard configuration:

    # cmapplyconf -P ccmonpkg.config

The following sample package configuration file (comments have been left out) shows a typical package configuration for a Continentalclusters monitor package:

PACKAGE_NAME                ccmonpkg
FAILOVER_POLICY             CONFIGURED_NODE
FAILBACK_POLICY             MANUAL
NODE_NAME                   LAnode1
NODE_NAME                   LAnode2 
RUN_SCRIPT                  /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
RUN_SCRIPT_TIMEOUT          NO_TIMEOUT
HALT_SCRIPT                 /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
HALT_SCRIPT_TIMEOUT         NO_TIMEOUT
SERVICE_NAME                ccmonpkg.srv
SERVICE_FAIL_FAST_ENABLED   NO
SERVICE_HALT_TIMEOUT        300
AUTO_RUN                    YES
NET_SWITCHING_ENABLED       YES
NODE_FAIL_FAST_ENABLED      NO
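
Before running cmcheckconf, a quick scripted look at the edited file can catch the two settings called out in step 3. The following Python sketch is an illustration only and does not replace cmcheckconf. The helper names parse_package_config and check_monitor_package are hypothetical, and the parser assumes the simple KEYWORD VALUE layout of the sample shown above.

```python
def parse_package_config(text):
    """Collect KEYWORD VALUE pairs; repeated keywords accumulate in a list."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        params.setdefault(parts[0], []).append(value)
    return params

def check_monitor_package(params):
    """Return a list of problems found in a monitor package configuration."""
    problems = []
    if params.get("AUTO_RUN") != ["YES"]:
        problems.append("AUTO_RUN should be YES for the monitor package")
    if not params.get("NODE_NAME"):
        problems.append("no NODE_NAME entries")
    return problems

# Abbreviated version of the sample configuration
sample = """\
PACKAGE_NAME    ccmonpkg
NODE_NAME       LAnode1
NODE_NAME       LAnode2
AUTO_RUN        YES
"""
print(check_monitor_package(parse_package_config(sample)))   # prints []
```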


WARNING! Do not run a monitor package until you complete the steps for “Checking and Applying the Continentalclusters Configuration”.

Editing the Continentalclusters Configuration File

First, on one cluster, generate an ASCII configuration template file using the cmqueryconcl command. The recommended name and location for this file is /etc/cmcluster/cmconcl.config. (You can choose a different name if you wish.) Example:

# cd /etc/cmcluster

# cmqueryconcl -C cmconcl.config

This file has three editable sections:

  • Cluster information

  • Recovery groups

  • Monitoring definitions

Customize each section according to your needs. The following are some guidelines for editing each section.

Editing Section 1—Cluster Information

Enter cluster-level information as follows in this section of the file:

  1. Enter a name for the continental cluster on the line that contains the CONTINENTAL_CLUSTER_NAME keyword. This can be any name you choose, but it cannot be changed after the configuration is applied. To change the name, you must first delete the existing configuration as described in “Renaming a Continental Cluster”.

  2. Enter the name of the first cluster after the first CLUSTER_NAME keyword followed by the names of all the nodes within the first cluster. Use a separate NODE_NAME keyword and HP-UX host name for each node.

  3. Enter the domain name of the cluster’s nodes following the CLUSTER_DOMAIN keyword.

  4. Optionally, enter the name of the monitor package on the first cluster after the MONITOR_PACKAGE_NAME keyword and the interval at which monitoring by this package will take place (minutes and/or seconds) following the MONITOR_INTERVAL keyword.

    The monitor interval defines how long it can take for Continentalclusters to detect that a cluster is in a certain state. The default interval is 60 seconds, but the optimal setting depends on your system’s performance. Setting this interval too low can cause the monitor to falsely report an Unreachable or Error state. If you observe this during testing, use a larger value.

    It is suggested that you use the name “ccmonpkg” for all Continentalclusters monitors. Create this package on each cluster containing a recovery package. If you do not wish to monitor a cluster not containing a recovery package, you must delete or comment out the MONITOR_PACKAGE_NAME line and the MONITOR_INTERVAL line. For mutual recovery, create the monitor package on both the first and second clusters.

    NOTE: Monitoring of a cluster not containing recovery packages is optional. For example, you might set up monitoring of such a cluster so you can check the status of the data replication technology being used.
  5. Repeat steps 2 through 4 for the alternate cluster.

NOTE: The monitor package is sensitive to system time and date. If you change the system time or date either backwards or forwards on the node where the monitor is running, notifications of alerts and alarms may be sent at incorrect times.

A printout of Section 1 of the Continentalclusters ASCII configuration file follows.

###############################################################################
#### ####
#### CONTINENTAL CLUSTER CONFIGURATION FILE ####
#### ####
#### ####
#### This file contains Continentalclusters configuration data. ####
#### The file is divided into three sections, as follows: ####
#### ####
#### 1. Cluster Information ####
#### 2. Recovery Groups ####
#### 3. Events, Alerts, Alarms, and Notifications ####
#### ####
#### For complete details about how to set the parameters in ####
#### this file, consult the cmqueryconcl(1m) manpage or your manual. ####
#### ####
###############################################################################
#### ####
#### Section 1. Cluster Information ####
#### ####
#### This section contains the name of the continental cluster ####
#### followed by the names of member clusters and all their nodes. ####
#### The continental cluster name can be any string you choose, up ####
#### to 40 characters in length. Each member cluster name must be ####
#### the same as it appears in the Serviceguard cluster config- ####
#### uration ASCII file for that cluster. In addition to the cluster  ####
#### name, include a domain name for the nodes in the cluster. Node    ####
####     names must be the same as those that appear in the cluster        ####
####     configuration ASCII file.                                         ####
#### ####
#### In the space below, enter the continental cluster name, ####
#### then enter a cluster name and domain for each member cluster,     ####
#### and the names of all the nodes in that cluster. Following        ####
#### the node names, enter the name of a monitor package ####
#### that will run the continental cluster monitoring software ####
#### on that cluster. It is strongly recommended that you use the ####
#### same name for the monitoring package on all clusters; ####
#### "ccmonpkg" is suggested.  Monitoring of the recovery cluster      ####
#### by the primary cluster is optional. If you do not wish to ####
#### monitor the recovery cluster, you must delete or comment out the  ####
####     MONITOR_PACKAGE_NAME and MONITOR_INTERVAL lines that follow the   ####
####     name of the primary cluster.                                      ####
#### ####
#### After the monitor package name, enter a monitor interval, ####
#### specifying a number of minutes and/or seconds. The default is 60 ####
#### seconds, the minimum is 30 seconds, and the maximum is 5 minutes. ####
#### ####
#### Example: ####
#### ####
#### CONTINENTAL_CLUSTER_NAME ccluster1 ####
#### ####
#### CLUSTER_NAME westcoast ####
####            CLUSTER_DOMAIN          westnet.myco.com                   ####
#### NODE_NAME system1 ####
#### NODE_NAME system2 ####
#### MONITOR_PACKAGE_NAME ccmonpkg ####
#### MONITOR_INTERVAL 1 MINUTE 30 SECONDS                ####
#### ####
#### CLUSTER_NAME eastcoast ####
####            CLUSTER_DOMAIN          eastnet.myco.com                   ####
#### NODE_NAME system3 ####
#### NODE_NAME system4 ####
#### MONITOR_PACKAGE_NAME ccmonpkg ####
#### MONITOR_INTERVAL 1 MINUTE 30 SECONDS                ####
#### ####

CONTINENTAL_CLUSTER_NAME ccluster1

CLUSTER_NAME
        CLUSTER_DOMAIN
        NODE_NAME
        NODE_NAME
        MONITOR_PACKAGE_NAME    ccmonpkg
        MONITOR_INTERVAL        60 SECONDS

CLUSTER_NAME
        CLUSTER_DOMAIN
        NODE_NAME
        NODE_NAME
        MONITOR_PACKAGE_NAME    ccmonpkg
        MONITOR_INTERVAL        60 SECONDS
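
The template comments give the MONITOR_INTERVAL limits as a minimum of 30 seconds and a maximum of 5 minutes. The following Python sketch shows one way to convert an interval such as 1 MINUTE 30 SECONDS to seconds and test it against those limits. The function names are hypothetical, and the accepted spellings (MINUTE, MINUTES, SECOND, SECONDS) are an assumption based on the examples in the file.

```python
import re

MIN_SECONDS, MAX_SECONDS = 30, 5 * 60    # limits quoted in the template comments

def interval_seconds(text):
    """Convert e.g. '1 MINUTE 30 SECONDS' to a number of seconds."""
    parts = re.findall(r"(\d+)\s+(MINUTES?|SECONDS?)", text.upper())
    if not parts:
        raise ValueError(f"unrecognised interval: {text!r}")
    return sum(int(count) * (60 if unit.startswith("MINUTE") else 1)
               for count, unit in parts)

def check_monitor_interval(text):
    """Return the interval in seconds, or raise ValueError if out of range."""
    seconds = interval_seconds(text)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"{seconds}s is outside {MIN_SECONDS}-{MAX_SECONDS}s")
    return seconds
```

With these definitions, check_monitor_interval("1 MINUTE 30 SECONDS") returns 90, and an interval of 10 SECONDS raises ValueError.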


Editing Section 2—Recovery Groups

In this section of the file, you define recovery groups, which are sets of Serviceguard packages that are ready to recover applications in case of cluster failure. You create a separate recovery group for each package that will be started on a cluster when the cmrecovercl(1m) command is issued on that cluster.

Examples of recovery groups are shown graphically in Figure 4-6 “Sample Continentalclusters Recovery Groups” and Figure 4-7 “Sample Bi-directional Recovery Groups”.

Figure 4-6 Sample Continentalclusters Recovery Groups

Figure 4-7 Sample Bi-directional Recovery Groups

Enter data in Section 2 as follows:

  1. Enter a name for the recovery group following the RECOVERY_GROUP_NAME keyword. This can be any name you choose.

  2. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

    PRIMARY_PACKAGE LAcluster/custpkg

  3. Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data sender package.

  4. After the RECOVERY_PACKAGE keyword, enter a recovery package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

    RECOVERY_PACKAGE NYcluster/custpkg_bak

  5. Optionally, enter a data receiver package definition consisting of the cluster name, a slash (/), and the data receiver package name after the DATA_RECEIVER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data receiver package.

  6. Repeat these steps for each package that will be recovered. Each package must be configured in a separate recovery group.

A printout of Section 2 of the Continentalclusters ASCII configuration file follows.

###############################################################################
#### ####
#### Section 2. Recovery Groups ####
#### ####
#### This section defines recovery groups--sets of Serviceguard ####
#### packages that are ready to recover applications in case of ####
#### cluster failure. Recovery groups allow one cluster in the ####
#### continental cluster configuration to back up another member ####
#### cluster’s packages. You create a separate recovery group          ####
#### for each Serviceguard package that will be started on the ####
#### recovery cluster when the cmrecovercl(1m) command is issued. ####
#### ####
#### A recovery group consists of a primary package running on ####
#### one cluster, and a recovery package that is ready to run on a     ####
#### different cluster. In some cases, a data receiver package runs ####
#### on the same cluster as the recovery package, and in some cases, ####
#### a data sender package runs on the same cluster as the primary ####
#### package. ####
#### ####
#### During normal operation, the primary package is running an ####
#### application program on the primary cluster, and the recovery ####
#### package, which is configured to run the same application, is ####
#### idle on the recovery cluster. If the primary package performs ####
#### disk I/O, the data that is written to disk is replicated ####
#### and made available for possible use on the recovery cluster. ####
#### For some data replication techniques, this involves the use of ####
#### a data receiver package running on the recovery cluster. ####
#### In the event of a major failure on the primary cluster, the ####
#### user issues the cmrecovercl(1m) command to halt any data ####
#### receiver packages and start up all the recovery packages ####
#### that exist on the recovery cluster. ####
#### ####
#### Enter the name of each package recovery group together with ####
#### the fully qualified names of the primary and recovery ####
#### packages. If appropriate, enter the fully qualified name ####
#### of a data receiver package. Note that the data receiver ####
#### package must be on the same cluster as the recovery package. ####
#### ####
#### The primary package name includes the primary cluster name ####
#### followed by a slash ("/") followed by the package name on ####
#### the primary cluster. The recovery package name includes ####
#### the recovery cluster name, followed by a slash ("/") ####
#### followed by the package name on the recovery cluster. ####
#### The data receiver package name includes the recovery cluster ####
#### name, followed by a slash ("/") followed by the name of ####
#### the data receiver package on the recovery cluster. ####
#### ####
#### Up to 29 recovery groups can be entered. ####
#### ####
#### Example: ####
#### ####
#### RECOVERY_GROUP_NAME nfsgroup ####
#### PRIMARY_PACKAGE westcoast/nfspkg ####
#### DATA_SENDER_PACKAGE westcoast/nfssenderpkg ####
#### RECOVERY_PACKAGE eastcoast/nfsbackuppkg ####
#### DATA_RECEIVER_PACKAGE eastcoast/nfsreceiverpkg ####
#### ####
#### RECOVERY_GROUP_NAME hpgroup ####
#### PRIMARY_PACKAGE westcoast/hppkg ####
#### DATA_SENDER_PACKAGE westcoast/hpsenderpkg ####
#### RECOVERY_PACKAGE eastcoast/hpbackuppkg ####
#### DATA_RECEIVER_PACKAGE eastcoast/hpreceiverpkg ####
#### ####

RECOVERY_GROUP_NAME
PRIMARY_PACKAGE
# DATA_SENDER_PACKAGE
RECOVERY_PACKAGE
# DATA_RECEIVER_PACKAGE
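
The rules for Section 2 (each package name written as cluster/package, primary and recovery packages on different clusters, and at most 29 recovery groups) can be checked mechanically before cmcheckconcl is run. The following Python sketch is one possible pre-check under those stated rules. The function names are hypothetical and the parser assumes one KEYWORD VALUE pair per line.

```python
MAX_GROUPS = 29    # limit stated in the Section 2 template comments

def parse_recovery_groups(text):
    """Return a list of dicts, one per RECOVERY_GROUP_NAME block."""
    groups = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"keyword without a value: {line}")
        keyword, value = parts
        if keyword == "RECOVERY_GROUP_NAME":
            groups.append({keyword: value})
        elif groups:
            groups[-1][keyword] = value
    return groups

def check_recovery_groups(groups):
    """Return a list of problems found in the parsed recovery groups."""
    problems = []
    if len(groups) > MAX_GROUPS:
        problems.append(f"more than {MAX_GROUPS} recovery groups")
    for group in groups:
        name = group["RECOVERY_GROUP_NAME"]
        clusters = {}
        for key in ("PRIMARY_PACKAGE", "RECOVERY_PACKAGE"):
            cluster, slash, package = group.get(key, "").partition("/")
            if not (cluster and slash and package):
                problems.append(f"{name}: {key} must be cluster/package")
            clusters[key] = cluster
        primary = clusters["PRIMARY_PACKAGE"]
        if primary and primary == clusters["RECOVERY_PACKAGE"]:
            problems.append(f"{name}: primary and recovery clusters are the same")
    return problems
```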

Editing Section 3—Monitoring Definitions

Finally, you enter monitoring definitions, which define cluster events and set times at which alert and alarm notifications are to be sent out. Define notifications for all cluster events—Unreachable, Down, Up, and Error.

Although it is impossible to make specific recommendations for every Continentalclusters environment, here are a few general guidelines about notifications.

  1. Specify the cluster event by using the CLUSTER_EVENT keyword followed by the name of the cluster, a slash (“/”) and the name of the status—Unreachable, Down, Up, or Error. Example:

    CLUSTER_EVENT LAcluster/UNREACHABLE

  2. Define a CLUSTER_ALERT at appropriate times following the appearance of the event. Specify the elapsed time and include a NOTIFICATION message that provides useful information about the event. You can create as many alerts as needed, and you can send as many notifications as you wish to different destinations (see the comments in the file excerpt below for a list of destination types). Note that the message text in the notification must be on a separate line in the file.

  3. If the event is for a cluster in an Unreachable condition, define a CLUSTER_ALARM at appropriate times. Specify the elapsed time since the appearance of the event (greater than the time used for the last CLUSTER_ALERT), and include a NOTIFICATION message that indicates what action should be taken. You can create as many alarms as needed, and you can send as many notifications as you wish to different destinations (see the comments in the file excerpt below for a list of destination types).

  4. If you are using a monitor on a cluster containing no recovery packages, define alerts for the monitoring of Up, Down, Unreachable, and Error states on the recovery cluster. It is not necessary to define alarms.

A printout of Section 3 of the Continentalclusters ASCII configuration file follows.

###############################################################################
#### ####
#### Section 3. Monitoring Definitions ####
#### ####
#### This section of the file contains monitoring definitions. ####
#### Well planned monitoring definitions will help in making the ####
#### decision whether or not to issue the cmrecovercl(1m) command. ####
#### Each monitoring definition specifies a cluster event along with ####
#### the messages that should be sent to system administrators ####
#### or other IT staff. All messages are appended to the default log ####
#### /var/adm/cmconcl/eventlog as well as being sent to the ####
#### destination you specify below. ####
#### ####
#### A cluster event takes place when a monitor that is located on ####
#### one cluster detects a significant change in the condition ####
#### of another cluster. The monitored cluster conditions are: ####
#### ####
####     UNREACHABLE - the cluster is unreachable. This will               ####
#### occur when the communication link to the ####
#### cluster has gone down, as in a WAN failure, ####
#### or when all the nodes in the cluster have ####
#### failed. ####
#### ####
#### DOWN - the cluster is down but nodes are responding. ####
#### This will occur when the cluster is halted, ####
#### but some or all of the member nodes are booted ####
#### and communicating with the monitoring cluster. ####
#### ####
#### UP - the cluster is up. ####
#### ####
#### ERROR - there is a mismatch of cluster versions or ####
#### a security error. ####
#### ####
#### A change from one of these conditions to another one is a ####
#### cluster event. You can define alert or alarm states based on the ####
#### length of time since the cluster event was observed. Some events ####
#### are noteworthy at the time they occur, and some are noteworthy ####
#### when they persist over time. Setting the elapsed time to zero ####
#### results in a message being sent as soon as the event takes place. ####
#### Setting the elapsed time to 5 minutes results in a message ####
#### being sent when the condition has persisted for 5 minutes. ####
#### ####
#### An alert is intended as informational only. Alerts may be sent ####
#### for any type of cluster condition. For an alert, a notification ####
#### is sent to a system administrator or other destination. Alerts ####
#### are not intended to indicate the need for recovery. The ####
#### cmrecovercl(1m) command is disabled. ####
#### ####
#### An alarm is an indication that a condition exists that may ####
#### require recovery. For an alarm, a notification is sent, and ####
#### in addition, the cmrecovercl(1m) command is enabled for immediate ####
#### execution, allowing the administrator to carry out cluster ####
#### recovery. An alarm can only be defined for an UNREACHABLE or     ####
####     DOWN condition in the monitored cluster.                          ####
#### ####
#### A notification defines a message that is appended to the ####
#### log file /var/adm/cmconcl/eventlog and sent to other ####
#### specified destinations, including email addresses, SNMP traps, ####
#### the system console, or the syslog file. The message string in ####
####     a notification is entered in double quotes on a separate line;    ####
####     it can be no more than 170 characters long. Enter notifications   ####
####     in one of the following forms:                                    ####
#### ####
####         NOTIFICATION CONSOLE                                          ####
####             <message>                                                 ####
#### Message written to the console. ####
#### ####
####         NOTIFICATION EMAIL   <address>                                ####
####             <message>                                                 ####
#### Message emailed to a fully ####
#### qualified email address. ####
#### ####
#### NOTIFICATION OPC <level>                                    ####
####             <message>                                                 ####
#### The message is sent to   ####
#### OpenView IT/Operations. ####
#### The value of <level> may be 8 (normal), ####
#### 16 (warning), 64 (minor), 128 (major), ####
#### 32 (critical). ####
#### ####
#### NOTIFICATION SNMP <level>                                    ####
####             <message>                                                 ####
#### The message is sent as an SNMP trap.   ####
#### The value of <level> may be 1 (normal), ####
#### 2 (warning), 3 (minor), 4 (major), ####
#### 5 (critical). ####
#### ####
#### NOTIFICATION SYSLOG                                           ####
####             <message>                                                 ####
#### A notice of the event is appended to the ####
#### syslog file. ####
#### ####
#### NOTIFICATION TCP <nodename>:<portnumber>                    ####
####             <message>                                                 ####
#### Message is sent to a TCP port on the ####
#### specified node. ####
#### ####
#### NOTIFICATION TEXTLOG <pathname>                              ####
####             <message>                                                 ####
#### A notice of the event is written to a user- ####
#### specified log file. <pathname> must be a full ####
#### path for the user-specified file. ####
#### ####
#### NOTIFICATION UDP <nodename>:<portnumber>                    ####
####             <message>                                                 ####
#### Message is sent to a UDP port on the ####
#### specified node. ####
#### ####
#### For the cluster event, enter a cluster name followed by ####
#### a slash ("/") and a cluster condition (UP, DOWN, UNREACHABLE,     ####
#### ERROR) that may be detected by a monitor program. ####
#### ####
#### Each cluster event must be paired with a monitoring cluster. ####
#### Include the name of the cluster on which the monitoring ####
#### will take place. Events can be monitored from either the ####
#### primary cluster or the recovery cluster. ####
#### ####
#### Alerts, alarms, and notifications have the following syntax. ####
#### ####
#### CLUSTER_ALERT <min> MINUTES <sec> SECONDS ####
#### Delay before the software issues ####
#### an alert notification about the ####
#### cluster event. ####
#### ####
#### CLUSTER_ALARM <min> MINUTES <sec> SECONDS ####
#### Delay before the software issues ####
#### an alarm notification about the ####
#### cluster event and enables the cmrecovercl(1m) ####
#### command for immediate execution. ####
#### ####
####       NOTIFICATION   <type>                                           ####
####          <message>                                                    ####
#### A string value which is sent from the ####
#### monitoring cluster for a given event ####
#### to a specified destination. The <message>, ####
#### which can be no more than 170 characters, ####
####                         is also appended to the                       ####
####                         /var/adm/cmconcl/eventlog                ####
#### file on the monitoring node in the cluster ####
#### where the event was detected. ####
#### ####
#### Example: ####
#### ####
#### CLUSTER_EVENT                  westcoast/UNREACHABLE              ####
#### MONITORING_CLUSTER eastcoast ####
#### CLUSTER_ALERT 5 MINUTES ####
#### NOTIFICATION EMAIL admin@primary.site ####
#### "westcoast unreachable for 5 min. Call secondary site."      ####
#### NOTIFICATION EMAIL admin@secondary.site ####
#### "Call primary admin. (555) 555-6666." ####
#### ####
#### CLUSTER_ALERT 10 MINUTES ####
#### NOTIFICATION EMAIL admin@primary.site ####
#### "westcoast unreachable for 10 min. Call secondary site."     ####
#### NOTIFICATION EMAIL admin@secondary.site ####
#### "Call primary admin. (555) 555-6666." ####
#### NOTIFICATION CONSOLE ####
#### "Cluster ALERT: westcoast not responding." ####
#### ####
#### CLUSTER_ALARM 15 MINUTES ####
#### NOTIFICATION EMAIL admin@primary.site ####
#### "westcoast unreachable for 15 min. Takeover advised."        ####
#### NOTIFICATION EMAIL admin@secondary.site ####
#### "westcoast still not responding. Use cmrecovercl command." ####
#### NOTIFICATION CONSOLE ####
#### "Cluster ALARM: Issue cmrecovercl command to take over westcoast."  ####
#### ####
#### CLUSTER_EVENT westcoast/UP ####
#### MONITORING_CLUSTER eastcoast ####
#### CLUSTER_ALERT 0 MINUTES ####
#### NOTIFICATION EMAIL admin@secondary.site ####
#### "Cluster westcoast is up." ####
#### ####
#### CLUSTER_EVENT westcoast/DOWN ####
#### MONITORING_CLUSTER eastcoast ####
#### CLUSTER_ALERT 0 MINUTES ####
#### NOTIFICATION EMAIL admin@secondary.site ####
#### "Cluster westcoast is down." ####
#### ####
#### CLUSTER_EVENT westcoast/ERROR ####
#### MONITORING_CLUSTER eastcoast ####
#### CLUSTER_ALERT 0 MINUTES ####
#### NOTIFICATION EMAIL admin@secondary.site ####
#### "Error in monitoring cluster westcoast." ####
#### ####

CLUSTER_EVENT <cluster_name>/UNREACHABLE
MONITORING_CLUSTER
CLUSTER_ALERT
NOTIFICATION
NOTIFICATION
CLUSTER_ALERT
NOTIFICATION
NOTIFICATION
CLUSTER_ALARM
NOTIFICATION
NOTIFICATION

CLUSTER_EVENT <cluster_name>/DOWN
MONITORING_CLUSTER
CLUSTER_ALERT
NOTIFICATION
NOTIFICATION
CLUSTER_ALERT
NOTIFICATION
NOTIFICATION
CLUSTER_ALARM
NOTIFICATION
NOTIFICATION

CLUSTER_EVENT <cluster_name>/UP
MONITORING_CLUSTER
CLUSTER_ALERT
NOTIFICATION

CLUSTER_EVENT <cluster_name>/ERROR
MONITORING_CLUSTER
CLUSTER_ALERT
NOTIFICATION
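
Two of the constraints described in the template comments lend themselves to a simple scripted check: a notification message may be no more than 170 characters, and an alarm may be defined only for an UNREACHABLE or DOWN condition. The following Python sketch illustrates such a check. The function name is hypothetical, and the sketch assumes that each message sits on its own line in double quotes, as the comments describe.

```python
MAX_MESSAGE = 170                            # limit from the template comments
ALARM_CONDITIONS = {"UNREACHABLE", "DOWN"}   # conditions that may carry an alarm

def check_monitoring_definitions(text):
    """Return a list of problems found in Section 3 of the file."""
    problems = []
    condition = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                         # blank line or comment
        fields = line.split()
        if line.startswith('"'):
            # notification message: quoted text on a separate line
            if len(line.strip('"')) > MAX_MESSAGE:
                problems.append(
                    f"line {number}: message exceeds {MAX_MESSAGE} characters")
        elif fields[0] == "CLUSTER_EVENT":
            # the value has the form <cluster_name>/<condition>
            condition = (fields[1].partition("/")[2].upper()
                         if len(fields) > 1 else None)
        elif fields[0] == "CLUSTER_ALARM" and condition not in ALARM_CONDITIONS:
            problems.append(
                f"line {number}: CLUSTER_ALARM not allowed for {condition}")
    return problems
```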

Selecting Notification Intervals

The monitor interval determines the amount of time between distinct attempts by the monitor to obtain the status of a cluster. The intervals associated with notifications need to be chosen to work in combination with the monitor interval to give a realistic picture of cluster events. Some combinations are not useful. For example, notification intervals that are smaller than the monitor interval do not make sense, and should be avoided. In the following example, the cluster event will always result in two alerts followed by an alarm. No change of state could possibly be detected at the one-minute, two-minute and three-minute intervals, because the monitor does not check for changes until the monitor interval (5 minutes) has been reached.

MONITOR_PACKAGE_NAME ccmonpkg
MONITOR_INTERVAL 5 MINUTES
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 1 MINUTE
NOTIFICATION CONSOLE
"1 Minute Alert: LACluster Unreachable"
CLUSTER_ALERT 2 MINUTES
NOTIFICATION CONSOLE
"2 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 3 MINUTES
NOTIFICATION CONSOLE
"ALARM: LACluster Unreachable after 3 Minutes: Recovery Enabled"

The following sequence could provide meaningful notifications, since a change of state is possible between notification intervals:

MONITOR_PACKAGE_NAME ccmonpkg 
MONITOR_INTERVAL 1 MINUTE
...
CLUSTER_EVENT LACluster/UNREACHABLE
CLUSTER_ALERT 3 MINUTES
NOTIFICATION CONSOLE
"3 Minute Alert: LACluster Unreachable"
CLUSTER_ALERT 5 MINUTES
NOTIFICATION CONSOLE
"5 Minute Alert: LACluster Still Unreachable"
CLUSTER_ALARM 10 MINUTES
NOTIFICATION CONSOLE
"ALARM: LACluster Unreachable after 10 Minutes: Recovery Enabled"

A rule of thumb is that the notification intervals should be multiples of the monitor interval.
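This rule of thumb can be checked mechanically before a configuration is applied. The following Python sketch is not part of the Continentalclusters product; the function name and the list-of-minutes input are illustrative assumptions. It flags any notification interval that is shorter than the monitor interval or is not a multiple of it. An interval of 0 is skipped, on the assumption that it means "notify as soon as the event is detected", as in the commented sample entry CLUSTER_ALERT 0 MINUTES.

```python
def check_intervals(monitor_minutes, notification_minutes):
    """Return warnings for notification intervals that do not fit the
    monitor interval. All values are whole minutes."""
    warnings = []
    for interval in notification_minutes:
        if interval == 0:
            # Assumption: 0 means "notify on detection", so it is not flagged.
            continue
        if interval < monitor_minutes:
            warnings.append(
                f"{interval} min is shorter than the "
                f"{monitor_minutes} min monitor interval"
            )
        elif interval % monitor_minutes != 0:
            warnings.append(
                f"{interval} min is not a multiple of the "
                f"{monitor_minutes} min monitor interval"
            )
    return warnings


# First example above: MONITOR_INTERVAL 5 MINUTES, notifications at 1, 2, 3.
print(check_intervals(5, [1, 2, 3]))
# Second example above: MONITOR_INTERVAL 1 MINUTE, notifications at 3, 5, 10.
print(check_intervals(1, [3, 5, 10]))
```

The first call reports three warnings, one for each interval that the monitor cannot observe; the second call returns an empty list.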

Checking and Applying the Continentalclusters Configuration

After editing the configuration file on the primary cluster, halt any monitor packages that are running, then use the following steps to apply the configuration to all nodes in the continental cluster.

  1. Use the following command to verify the content of the file:

    # cmcheckconcl -v -C cmconcl.config

    This command will verify that all parameters are within range, all fields are filled out, and the entries (such as NODE_NAME) are valid.

  2. Use the following command to distribute the Continentalclusters configuration information to all nodes in the continental cluster:

    # cmapplyconcl -v -C cmconcl.config

    Configuration data is copied to all nodes in both clusters. This data includes a set of managed object files that are copied to the /etc/cmconcl/instances directory on every node in both clusters. All nodes must be booted when the command is issued, although the Serviceguard cluster may or may not be running.

  3. Be sure to make a backup copy of the configuration ASCII file and save it on the other cluster after it is applied.

NOTE: If any problems occur during the execution of cmapplyconcl, you can repeat the command as often as necessary. Issuing the command will delete the existing Continentalclusters configuration and apply the new one.
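Steps 1 and 2 can be chained so that the configuration is applied only when the check passes. The following Python sketch is one way to do this and is not an HP-supplied tool. It uses only the command names and options shown above. It assumes that both commands return a nonzero exit status on failure, which should be confirmed against the man pages for your release. The function name and the injectable run parameter are illustrative.

```python
import subprocess

CONFIG_FILE = "cmconcl.config"  # file name used in the steps above


def check_and_apply(config_file=CONFIG_FILE, run=subprocess.run):
    """Run cmcheckconcl, then cmapplyconcl only if the check succeeded.

    Assumption: both commands exit with a nonzero status on failure.
    `run` can be replaced so the logic can be exercised without a cluster.
    """
    check = run(["cmcheckconcl", "-v", "-C", config_file])
    if check.returncode != 0:
        # Do not apply a configuration that failed verification.
        return False
    applied = run(["cmapplyconcl", "-v", "-C", config_file])
    return applied.returncode == 0
```

Calling check_and_apply() as root on the node that holds the configuration file runs both commands in order. The backup copy described in step 3 is still a separate, manual action.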

When configuration is finished, your systems should have sets of files similar to those shown in Figure 4-8 “Continentalclusters Configuration Files”.

Figure 4-8 Continentalclusters Configuration Files

Starting the Continentalclusters Monitor Package

Starting the monitor package enables all Continentalclusters functionality. Before you do this, ensure that the primary packages you wish to protect are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working correctly.

If you are using physical data replication, make sure that it is operational.

On each monitoring cluster use the following command to start the monitor package:

# cmmodpkg -e ccmonpkg

After the monitor package is started, a log file /var/adm/cmconcl/sentryd.log will be created on the node where the package is running to record the Continentalclusters monitoring activities. It is recommended that this log file be archived or cleaned up periodically.

Validating the Configuration

The following table shows the status of Continentalclusters packages when each cluster is running normally and no recovery has taken place.

Table 4-5 Status of Continentalclusters Packages Before Recovery

| Data Replication Method | Primary Cluster: Primary Package | Primary Cluster: Data Sender Package | Primary Cluster: Optional Monitor Package | Recovery Cluster: Recovery Package | Recovery Cluster: Data Receiver Package | Recovery Cluster: Required Monitor Package |
|---|---|---|---|---|---|---|
| Physical - Symmetrix | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Physical - XP Series | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Logical - Oracle Standby Database | Running | Not used | Running (optional) | Halted | Running | Running (required) |

Use the following steps to make sure the components are functioning correctly:

  1. Use the following command to make sure all daemons are running:

    # ps -ef | grep cmcl

    Two important Continentalclusters daemons are cmclsentryd and cmclrmond.

  2. Check the cluster configuration on each cluster using the cmviewcl -v command.

    1. Ensure that each primary package is running correctly.

    2. Ensure that data sender packages (if any are used for logical data replication) are running correctly.

    3. Ensure that data receiver packages (if any are used for logical data replication) are running correctly.

    4. Ensure that the continental cluster monitor package is running correctly on each monitoring cluster.

  3. On all nodes, use the tail -f /var/adm/syslog/syslog.log command to check the end of the SYSLOG file for errors.

  4. On nodes where packages are running, check all package log files for errors, including application packages and the monitor package.

  5. Use the following command to verify the correct operation of the Continentalclusters daemon:

    # /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log

  6. Make sure the Continentalclusters monitor package (default name ccmonpkg) on each cluster fails over properly if a node fails.

  7. Change each cluster’s state to test that the monitor running on the monitoring cluster will detect the change in status and send notification.

  8. View the status of the Continentalclusters primary and recovery clusters, including configured event data:

    # cmviewconcl -v

CAUTION: You should never issue the cmrunpkg command for a recovery package when Continentalclusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great.

Chapters 5 and 6 contain additional suggestions on testing the data replication and package configuration.
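The daemon check in step 1 of the validation procedure can be scripted so that a missing daemon is reported by name. The following Python sketch is illustrative only; the function name is an assumption. It works on the text output of ps -ef and looks for the two daemon names given in step 1. The sample ps output in the example is invented for demonstration and does not come from a real system.

```python
REQUIRED_DAEMONS = ("cmclsentryd", "cmclrmond")  # named in step 1 above


def missing_daemons(ps_output, required=REQUIRED_DAEMONS):
    """Return the required daemon names that do not appear in `ps -ef` output."""
    running = set()
    for line in ps_output.splitlines():
        for token in line.split():
            # Keep only the last path component, so a daemon listed with
            # its full path still matches its bare name.
            running.add(token.rsplit("/", 1)[-1])
    return [name for name in required if name not in running]


# Invented sample output: only one of the two daemons is present.
sample = (
    "root  1234     1  0 10:00:00 ?   0:01 cmclsentryd\n"
    "root  1240  1200  0 10:00:05 ?   0:00 grep cmcl\n"
)
print(missing_daemons(sample))  # ['cmclrmond']
```

On a live node, the input could be collected with subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout. An empty result means both daemons were found.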

Documenting the Recovery Procedure

Once everything is configured and the Continentalclusters monitor is running, you must define your recovery procedure and train the administrators and operators at both sites. The checklist in Figure 4-9 “Recovery Checklist” is an example of how you might document the recovery procedure.

Figure 4-9 Recovery Checklist

Reviewing the Recovery Procedure

Using the checklist described in the previous section, step through the recovery procedure to make sure that all necessary steps are included. If possible, create simulated failures to test the alert and alarm scenarios coded in the Continentalclusters configuration file.

© Hewlett-Packard Development Company, L.P.