If necessary, use the swinstall command to install the Continentalclusters product on all nodes
in both clusters. Then create the Continentalclusters configuration
using the following steps:

1. Prepare the security files.
2. Create the monitor package on each cluster containing a recovery package. Clusters not containing a recovery package may also monitor the other cluster in the recovery pair by creating a monitor package on that cluster.
3. Edit the Continentalclusters configuration file on a node of your choice in any cluster.
4. Check and apply the Continentalclusters configuration.
5. Start each Continentalclusters monitor package on its cluster.
6. Validate the configuration.
7. Document the recovery procedure and distribute the documentation to both sites. Make sure all personnel are familiar with these procedures.
8. Test recovery procedures.

Preparing Security Files

Running a Continentalclusters command requires
root access to cluster information on all the nodes of the participating
Serviceguard clusters in the configuration. Before doing the Continentalclusters
configuration, edit the /etc/cmcluster/cmclnodelist file on each node of all the participating clusters to
include entries that will allow access by all nodes in the Continentalclusters.
Here is a sample entry in the /etc/cmcluster/cmclnodelist file for a continental cluster configured with two, two-node
Serviceguard clusters:

Also, be sure to create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor
packages and Continentalclusters commands to obtain information from
other nodes about the health of each cluster. The file must contain
entries that allow access to all nodes in the continental cluster
by the nodes where monitors and Continentalclusters commands are running.

Define the order of security checking by creating
entries of the following types:
- order deny,allow
  If deny is first, the deny list is checked first to see if the node is there, then the allow list is checked.
- deny from
  This lists all the nodes that are denied access. Permissible entries are:
  - all
    All hosts are denied access.
  - domain
    Hosts whose names match, or end in, this string are denied access, for example, hp.com.
  - hostname
    The named host (for example, kitcat.myco.com) is denied access.
  - IP address
    Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet restriction, is denied.
  - network/netmask
    This pair of addresses allows more precise restriction of hosts (for example, 10.163.121.23/255.255.0.0).
  - network/nnnCIDR
    This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).
- allow from
  This lists all the nodes that are allowed access. Permissible entries are:
  - all
    All hosts are allowed access.
  - domain
    Hosts whose names match, or end in, this string are allowed access, for example, hp.com.
  - hostname
    The named host (for example, kitcat.myco.com) is allowed access.
  - IP address
    Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet inclusion, is allowed.
  - network/netmask
    This pair of addresses allows more precise inclusion of hosts (for example, 10.163.121.23/255.255.0.0).
  - network/nnnCIDR
    This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).

The most typical entry is hostname. The following entries are from a typical /etc/opt/cmom/cmomhosts file:

    order allow,deny
    allow from lanode1.myco.com
    allow from lanode2.myco.com
    allow from nynode1.myco.com
    allow from nynode2.myco.com
    allow from 10.177.242.12

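Because the same file has to exist on every node, it can help to generate it from one list of host names instead of editing it by hand on each node. The following is a minimal sketch under that assumption; make_cmomhosts is a hypothetical helper name, not a Continentalclusters command, and it only reproduces the "order allow,deny" and "allow from" line formats shown in the sample above.

```shell
#!/bin/sh
# Sketch: print cmomhosts entries that allow access from each host
# given as an argument. make_cmomhosts is a hypothetical helper name.
make_cmomhosts() {
    # Same ordering line as the sample file.
    echo "order allow,deny"
    # One "allow from" line per host name or IP address.
    for host in "$@"; do
        echo "allow from $host"
    done
}
```

For example, the output of make_cmomhosts lanode1.myco.com lanode2.myco.com nynode1.myco.com nynode2.myco.com could be redirected to /etc/opt/cmom/cmomhosts and the resulting file copied to the other nodes.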
If the file is installed on all nodes in the continental
cluster, these entries will allow Continentalclusters commands and
monitors running on lanode1, lanode2, nynode1, and nynode2 to obtain
information about the clusters in the configuration.

Network Security Configuration Requirements

In a Continentalclusters configuration, if the
clusters are behind firewalls in their respective sites, you must
set appropriate firewall rules to enable inter-cluster communication.
The monitoring daemon of Continentalclusters communicates with Serviceguard
Cluster Object Manager on remote clusters. You can determine the ports
used by Cluster Object Manager from the hacl-probe entry in the /etc/services file. In the firewall
of each participating cluster, you must set the rules so that TCP
and UDP traffic on the hacl-probe ports is allowed from and to the IP addresses of all nodes in the
Continentalclusters configuration. For more information on firewalls
and ports, see HP Serviceguard A.11.18 Release Notes available at http://www.docs.hp.com -> High Availability.

Creating the Monitor Package

The Continentalclusters monitoring software is
configured as a Serviceguard package so that it remains highly available.
If more than one primary cluster is configured to share the same common
recovery cluster, such as a multiple recovery pair scenario, the monitor package running on the common recovery cluster
performs the following:

- monitors all of the primary clusters
- sends notifications for all of the monitored cluster events

The following steps should be carried out on the
recovery cluster and can be repeated on the primary cluster if you
want the primary cluster to monitor the recovery cluster:

1. On the node where the configuration is located, create a directory for the monitor package.

       # mkdir /etc/cmcluster/ccmonpkg

2. Copy the template files from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory.

       # cp /opt/cmconcl/scripts/ccmonpkg.* /etc/cmcluster/ccmonpkg

   ccmonpkg.config is the ASCII package configuration file template for the Continentalclusters monitoring application.
3. Edit the package configuration file (suggested name of /etc/cmcluster/ccmonpkg/ccmonpkg.config) to match the cluster configuration:
   - Add the names of all nodes in the cluster on which the monitor may run.
   - AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) should be set to YES so that the monitor package will fail over between local nodes. (Note that for all primary and recovery packages, AUTO_RUN is always set to NO.)
   - Continentalclusters provides an optional feature for recovery groups to be in the maintenance mode. To enable this feature, configure the monitor package with a file system on a shared disk. For more information on configuring this maintenance mode feature, see “Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters”.
4. Use the cmcheckconf command to validate the package.

       # cmcheckconf -P ccmonpkg.config

5. Copy the package configuration file ccmonpkg.config and control script ccmonpkg.cntl to the monitor package directory (default name /etc/cmcluster/ccmonpkg) on all the other nodes in the cluster. Make sure the control script is executable.
6. Use the cmapplyconf command to add the package to the Serviceguard configuration.

       # cmapplyconf -P ccmonpkg.config

The following sample package configuration file
(comments have been left out) shows a typical package configuration
for a Continentalclusters monitor package:

    PACKAGE_NAME                ccmonpkg
    PACKAGE_TYPE                FAILOVER
    FAILOVER_POLICY             CONFIGURED_NODE
    FAILBACK_POLICY             MANUAL
    NODE_NAME                   LAnode1
    NODE_NAME                   LAnode2
    AUTO_RUN                    YES
    LOCAL_LAN_FAILOVER_ALLOWED  YES
    NODE_FAIL_FAST_ENABLED      NO
    RUN_SCRIPT                  /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
    RUN_SCRIPT_TIMEOUT          NO_TIMEOUT
    HALT_SCRIPT                 /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
    HALT_SCRIPT_TIMEOUT         NO_TIMEOUT
    SERVICE_NAME                ccmonpkg.srv
    SERVICE_FAIL_FAST_ENABLED   NO
    SERVICE_HALT_TIMEOUT        300

Configuring the Maintenance Mode Feature for Recovery Groups in Continentalclusters

To configure the recovery group maintenance feature,
you need to configure a file system on a shared disk in all the clusters
configured in the Continentalclusters. The shared disk must have a
minimum of 250MB disk space. Specify the file system path using the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters
configuration file. Create this directory and reserve it for Continentalclusters
on all nodes in the Continentalclusters. Configure the monitor package
in the recovery clusters to mount the file system from the shared
disk.

Configuring Shared Disk for the Maintenance Feature

Identify a shared disk connected to all nodes
at the recovery cluster where the monitor package (ccmonpkg) will run. Create a volume group with one volume on the shared
disk and complete the following procedure:

1. Create the physical volume:

       pvcreate -f /dev/rdsk/c0t10d0

2. Create the volume group directory under the device special file namespace:

       mkdir /dev/ccvg

3. Create the group special file using the available major number:

       mknod /dev/ccvg/group c 64 0x060000

4. Create the volume group:

       vgcreate /dev/ccvg /dev/dsk/c0t10d0

5. Activate the volume group:

       vgchange -a y ccvg

6. Create the logical volume:

       lvcreate -L 250M ccvg

Run the following command to create a file system
on the volume:

    mkfs -F vxfs /dev/ccvg/lvol1

Complete the following procedure to export the
volume group configuration and import the volume group on all the
nodes at the recovery cluster:

1. On the node where you created the volume, deactivate the volume group and export the VG configuration in preview mode to a file:

       vgchange -a n ccvg
       vgexport -m /tmp/ccvg.map -p ccvg

2. Copy the file to all the nodes:

       rcp /tmp/ccvg.map node1:/tmp

3. On each node, create the volume group directory and the group special file:

       mkdir /dev/ccvg
       mknod /dev/ccvg/group c 64 0x060000

4. Import the volume group from the map file:

       vgimport -m /tmp/ccvg.map -v

Configuring a Monitor Package for the Maintenance Feature

Configure the Continentalclusters monitor package
using the template scripts available in the /opt/cmconcl/scripts/ directory:

1. Create the /etc/cmcluster/ccmonpkg directory on all nodes in the recovery cluster.
2. On any node in the recovery cluster, copy the package configuration and control file templates from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory:

       cp /opt/cmconcl/scripts/ccmonpkg.* /etc/cmcluster/ccmonpkg

3. In the ccmonpkg.cntl monitor package control file, specify the volume group for the VG parameter in the VOLUME GROUPS section:

       VG[0]="ccvg"

4. In the ccmonpkg.cntl monitor package control file, specify a file system path and the logical volume name under the FILE SYSTEM section. The file system path should be the value configured for the CONTINENTAL_CLUSTER_STATE_DIR parameter in the Continentalclusters configuration file. This path should be created and reserved on all nodes in the Continentalclusters.

       LV[0]=/dev/ccvg/lvol1;
       FS[0]=/opt/cmconcl/statedir;
       FS_MOUNT_OPT[0]="-o rw";
       FS_UMOUNT_OPT[0]="";
       FS_FSCK_OPT[0]="";
       FS_TYPE[0]="vxfs"

5. Distribute the monitor package control file to all nodes in the recovery cluster.
6. Apply the monitor package configuration.

Editing the Continentalclusters Configuration File

First, on one cluster, generate an ASCII configuration
template file using the cmqueryconcl command. The
recommended name and location for this file is /etc/cmcluster/cmconcl.config. (If preferred, choose a different name.) Example:

    # cd /etc/cmcluster
    # cmqueryconcl -C cmconcl.config

This file has three editable sections: cluster information, recovery groups, and monitoring definitions. Customize each section according to your needs. The
following are some guidelines for editing each section.

Editing Section 1: Cluster Information

Enter cluster-level information as follows in
this section of the file:

1. Enter a name for the continental cluster on the line that contains the CONTINENTAL_CLUSTER_NAME keyword. You can choose any name, but it cannot be easily changed after the configuration is applied. To change the name, you must first delete the existing configuration as described in “Renaming a Continental Cluster”. Continentalclusters provides an optional maintenance feature for recovery groups. This feature is enabled by configuring an absolute path to a file system for the CONTINENTAL_CLUSTER_STATE_DIR parameter. If this feature is not required, this parameter can be omitted.
2. Enter the name of the first cluster after the first CLUSTER_NAME keyword, followed by the names of all the nodes within the first cluster. Use a separate NODE_NAME keyword and HP-UX host name for each node.
3. Enter the domain name of the cluster’s nodes following the CLUSTER_DOMAIN keyword.
4. Optionally, enter the name of the monitor package on the first cluster after the MONITOR_PACKAGE_NAME keyword and the interval at which monitoring by this package will take place (minutes and/or seconds) following the MONITOR_INTERVAL keyword. The monitor interval defines how long it can take for Continentalclusters to detect that a cluster is in a certain state. The default interval is 60 seconds, but the optimal setting depends on your system’s performance. Setting this interval too low can result in the monitor falsely reporting an Unreachable or Error state. If this is observed during testing, use a larger value. The suggested name for all Continentalclusters monitors is “ccmonpkg”. Create this package on each cluster containing a recovery package. If you do not want to monitor a cluster that does not contain a recovery package, you must delete or comment out the MONITOR_PACKAGE_NAME line and the MONITOR_INTERVAL line. For mutual recovery, create the monitor package on both the first and second clusters.

   NOTE: Monitoring of a cluster not containing recovery packages is optional. For example, set up monitoring of such a cluster to be able to check the status of the data replication technology being used.
5. Repeat steps 2 through 4 for the other participating cluster or clusters.

NOTE: The monitor package is sensitive to system time
and date. If you change the system time or date either backwards or
forwards on the node where the monitor is running, notifications of
alerts and alarms may be sent at incorrect times.

A printout of Section 1 of the Continentalclusters
ASCII configuration file follows.
################################################################
#### #### CONTINENTAL CLUSTER CONFIGURATION FILE #### ####
#### #### This file contains Continentalclusters #### ####
#### #### configuration data. #### ####
#### #### The file is divided into three sections, #### ####
#### #### as follows: #### ####
#### #### 1. Cluster Information #### ####
#### #### 2. Recovery Groups #### ####
#### #### 3. Events, Alerts, Alarms, and #### ####
#### #### Notifications #### ####
#### #### #### ####
#### #### For complete details about how to set the #### ####
#### #### parameters in this file, consult the #### ####
#### #### cmqueryconcl(1m) manpage or your manual. #### ####
################################################################
#### #### Section 1. Cluster Information #### ####
#### #### This section contains the name of the #### ####
#### #### continental cluster,name of the state #### ####
#### #### directory, followed by the names of member #### ####
#### #### clusters and all their nodes.The #### ####
#### #### continental cluster name can be any string #### ####
#### #### you choose, up to 40 characters in length. #### ####
#### #### The continentalclusters state directory #### ####
#### #### must be string containing the directory #### ####
#### #### location. The state directory must be #### ####
#### #### always an absolute path. The state #### ####
#### #### directory should be created on a shared #### ####
#### #### disk in the recovery cluster. This #### ####
#### #### parameter is optional, if maintenance mode #### ####
#### #### feature recovery groups is not required. #### ####
#### #### This parameter is mandatory, if maintenance #### ####
#### #### mode feature for recovery groups is #### ####
#### #### required. #### ####
#### #### Each member cluster name must be the same #### ####
#### #### as it appears in the MC/ServiceGuard cluster ########
#### #### configuration ASCII file for that cluster. #### ####
#### #### In addition to the cluster name, include a #### ####
#### #### domain name for the nodes in the cluster. #### ####
#### #### Node Names must be the same as those that #### ####
#### #### appear in the cluster configuration ASCII #### ####
#### #### file. A minimum of two member cluster needs #### ####
#### #### to be specified. You may configure one #### ####
#### #### cluster to serve as recovery cluster for #### ####
#### #### one or more other clusters. #### ####
#### #### #### ####
#### #### In the space below, enter the continental #### ####
#### #### cluster name, then enter a cluster name for #### ####
#### #### each member cluster, followed by the names #### ####
#### #### of all the nodes in that cluster.Following #### ####
#### #### the node names, enter the name of a monitor #### ####
#### #### package that will run the continental #### ####
#### #### cluster monitoring software on that cluster.#### ####
#### #### It is strongly recommended that you use the #### ####
#### #### same name for the monitoring package on all #### ####
#### #### clusters; "ccmonpkg" is suggested. #### ####
#### #### Monitoring of the recovery cluster by the #### ####
#### #### primary cluster is optional. If you do not #### ####
#### #### wish to monitor the recovery cluster, you #### ####
#### #### must delete or comment out the #### ####
#### #### MONITOR_PACKAGE_NAME and MONITOR_INTERVAL #### ####
#### #### lines that follow the name of the primary #### ####
#### #### cluster. #### ####
#### #### After the monitor package name, enter a #### ####
#### #### monitor interval,specifying a number of #### ####
#### #### minutes and/or seconds. The default is 60 #### ####
#### #### seconds, the minimum is 30 seconds, and the #### ####
#### #### maximum is 5 minutes. #### ####
#### #### #### ####
#### #### CLUSTER_NAME westcoast #### ####
#### #### CLUSTER_DOMAIN westnet.myco.com #### ####
#### #### NODE_NAME system1 #### ####
#### #### NODE_NAME system2 #### ####
#### #### MONITOR_PACKAGE_NAME ccmonpkg #### ####
#### #### MONITOR_INTERVAL 1 MINUTE 30 SECONDS#### ####
#### #### #### ####
#### #### #### ####
#### #### CLUSTER_NAME eastcoast #### ####
#### #### CLUSTER_DOMAIN eastnet.myco.com #### ####
#### #### NODE_NAME system3 #### ####
#### #### NODE_NAME system4 #### ####
#### #### MONITOR_PACKAGE_NAME ccmonpkg #### ####
#### #### MONITOR_INTERVAL 1 MINUTE 30 SECONDS #### ####
#### #### #### ####
#### #### CONTINENTAL_CLUSTER_NAME ccluster1 #### ####
#### #### CONTINENTAL_CLUSTER_STATE_DIR #### ####
#### #### CLUSTER_NAME #### ####
#### #### CLUSTER_DOMAIN #### ####
#### #### NODE_NAME #### ####
#### #### NODE_NAME #### ####
#### #### MONITOR_PACKAGE_NAME ccmonpkg #### ####
#### #### MONITOR_INTERVAL 60 SECONDS #### ####
#### #### CLUSTER_NAME #### ####
#### #### CLUSTER_DOMAIN #### ####
#### #### NODE_NAME #### ####
#### #### NODE_NAME #### ####
#### #### MONITOR_PACKAGE_NAME ccmonpkg #### ####
#### #### MONITOR_INTERVAL 60 SECONDS #### ####
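
The template comments above state that the monitor interval has a minimum of 30 seconds and a maximum of 5 minutes. The following is a minimal sketch of a pre-check for a MONITOR_INTERVAL value before the configuration is applied. The function names interval_seconds and interval_in_range are hypothetical, and accepting the unit spellings MINUTES and SECOND alongside the MINUTE and SECONDS forms shown in the template is an assumption.

```shell
#!/bin/sh
# Sketch: convert a MONITOR_INTERVAL value such as "1 MINUTE 30 SECONDS"
# to seconds. interval_seconds is a hypothetical helper name.
interval_seconds() {
    total=0
    # Split the value into words: number, unit, number, unit, ...
    set -- $1
    while [ $# -ge 2 ]; do
        # Reject anything that is not a plain decimal number.
        case "$1" in
            ''|*[!0-9]*) return 1 ;;
        esac
        case "$2" in
            MINUTE|MINUTES) total=$((total + $1 * 60)) ;;
            SECOND|SECONDS) total=$((total + $1)) ;;
            *) return 1 ;;
        esac
        shift 2
    done
    # A leftover word means a number without a unit (or the reverse).
    [ $# -eq 0 ] || return 1
    echo "$total"
}

# Succeed only if the interval lies between 30 seconds and 5 minutes,
# the limits given in the template comments.
interval_in_range() {
    secs=$(interval_seconds "$1") || return 1
    [ "$secs" -ge 30 ] && [ "$secs" -le 300 ]
}
```

For example, interval_in_range "1 MINUTE 30 SECONDS" succeeds, while interval_in_range "10 SECONDS" fails because the value is below the stated minimum.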
Editing Section 2: Recovery Groups

In this section of the file, define recovery groups,
which are sets of Serviceguard packages that are ready to recover
applications in case of cluster failure. Create a separate recovery
group for each package that will be started on a cluster when the cmrecovercl(1m) command is issued on that cluster. Examples of recovery groups are shown graphically
in Figure 2-7 and Figure 2-8.

Enter data in Section 2 as follows:

1. Enter a name for the recovery group following the RECOVERY_GROUP_NAME keyword. This can be any name you choose.
2. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

       PRIMARY_PACKAGE LAcluster/custpkg

3. Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data sender package.
4. After the RECOVERY_PACKAGE keyword, enter a recovery package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

       RECOVERY_PACKAGE NYcluster/custpkg_bak

5. Optionally, enter a data receiver package definition consisting of the cluster name, a slash (/), and the data receiver package name after the DATA_RECEIVER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data receiver package.
6. Optionally, enter a rehearsal package definition consisting of the cluster name, a slash (/), and the rehearsal package name after the REHEARSAL_PACKAGE keyword. This is only required for performing a rehearsal operation at the recovery cluster.
7. Repeat these steps for each package that will be recovered. Each package must be configured in a separate recovery group.

A printout of Section 2 of the Continentalclusters
ASCII configuration file follows.
###############################################################
#### #### Section 2. Recovery Groups #### ####
#### #### This section defines recovery groups--sets #### ####
#### #### of ServiceGuard packages that are ready to #### ####
#### #### recover applications in case of cluster #### ####
#### #### failure. Recovery groups allow one cluster #### ####
#### #### in the continental cluster configuration to #### ####
#### #### back up another member cluster's packages. #### ####
#### #### You create a separate recovery group for #### ####
#### #### each ServiceGuard package that will be #### ####
#### #### started on the recovery cluster when the #### ####
#### #### cmrecovercl(1m) command is issued. #### ####
#### #### #### ####
#### #### A recovery group consists of a primary #### ####
#### #### package running on one cluster, a recovery #### ####
#### #### package that is ready to run on a different #### ####
#### #### cluster. In some cases, a data receiver #### ####
#### #### package runs on the same cluster as the #### ####
#### #### recovery package, and in some cases, a data #### ####
#### #### sender package runs on the same cluster #### ####
#### #### as the primary package.For rehearsal #### ####
#### #### operations a rehearsal package forms a part #### ####
#### #### of the recovery group. The rehearsal package #### ####
#### #### is configured always in the recovery cluster.#### ####
#### #### During normal operation, the primary package #### ####
#### #### is running an application program on the #### ####
#### #### primary cluster, and the recovery package, #### ####
#### #### which is configured to run the same #### ####
#### #### application, is idle on the recovery cluster.#### ####
#### #### If the primary package performs disk I/O, #### ####
#### #### the data that is written to disk is #### ####
#### #### replicated and made available for possible #### ####
#### #### use on the recovery cluster. #### ####
#### #### For some data replication techniques, this #### ####
#### #### involves the use of a data receiver package #### ####
#### #### running on the recovery cluster. #### ####
#### #### In the event of a major failure on the #### ####
#### #### primary cluster, the user issues the #### ####
#### #### cmrecovercl(1m) command to halt any data #### ####
#### #### receiver packages and start up all the #### ####
#### #### recovery packages that exist on the #### ####
#### #### recovery cluster. #### ####
#### #### During rehearsal operation, before starting #### ####
#### #### the rehearsal packages,care should be taken #### ####
#### #### that the replication between the primary and #### ####
#### #### the recovery sites is suspended. For some #### ####
#### #### data replication techniques which involve #### ####
#### #### the use of a data receiver package, #### ####
#### #### rehearsal operations must be commenced only #### ####
#### #### after shutting down the data receiver #### ####
#### #### package at the recovery cluster. Rehearsal #### ####
#### #### packages are started using the #### ####
#### #### cmrecovercl -r command. #### ####
#### #### Enter the name of each package recovery #### ####
#### #### group together with the fully qualified #### ####
#### #### names of the primary and recovery packages. #### ####
#### #### If appropriate, enter the fully qualified #### ####
#### #### name of a data receiver package. Note that #### ####
#### #### the data receiver package must be on the #### ####
#### #### same cluster as the recovery package. #### ####
#### #### The primary package name includes the #### ####
#### #### primary cluster name followed by a slash #### ####
#### #### ("/") followed by the package name on the #### ####
#### #### primary cluster. The recovery package name #### ####
#### #### includes the recovery cluster name, followed #### ####
#### #### by a slash ("/")followed by the package name #### ####
#### #### on the recovery cluster. #### ####
#### #### #### ####
#### #### The data receiver package name includes the #### ####
#### #### recovery cluster name, followed by a slash #### ####
#### #### ("/") followed by the name of the data #### ####
#### #### receiver package on the recovery cluster. #### ####
#### #### The rehearsal package name includes the #### ####
#### #### recovery cluster name, followed by a slash #### ####
#### #### ("/"). #### ####
#### #### Up to 29 recovery groups can be entered. #### ####
#### #### #### ####
#### #### Example: #### ####
#### #### RECOVERY_GROUP_NAME nfsgroup #### ####
#### #### PRIMARY_PACKAGE westcoast/nfspkg #### ####
#### #### DATA_SENDER_PACKAGE westcoast/nfssenderpkg #### ####
#### #### RECOVERY_PACKAGE eastcoast/nfsbackuppkg #### ####
#### #### DATA_RECEIVER_PACKAGE eastcoast/nfsreplicapkg#### ####
#### #### REHEARSAL_PACKAGE eastcoast/nfsrehearsalpkg #### ####
#### #### #### ####
#### #### RECOVERY_GROUP_NAME hpgroup #### ####
#### #### PRIMARY_PACKAGE westcoast/hppkg #### ####
#### #### DATA_SENDER_PACKAGE westcoast/hpsenderpkg #### ####
#### #### RECOVERY_PACKAGE eastcoast/hpbackuppkg #### ####
#### #### DATA_RECEIVER_PACKAGE eastcoast/nfsreplicapkg#### ####
#### #### REHEARSAL_PACKAGE eastcoast/hprehearsalpkg #### ####
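
The template comments above place constraints on where each package in a recovery group runs: the recovery package is on a different cluster from the primary package, the data sender is on the primary cluster, and the data receiver and rehearsal packages are on the recovery cluster. The following is a minimal sketch that reads uncommented Section 2 entries on standard input and reports groups that break those rules. check_recovery_groups is a hypothetical helper, not a Continentalclusters command, and it does not replace the product's own configuration check.

```shell
#!/bin/sh
# Sketch: report recovery groups whose package references break the
# placement rules described in the Section 2 template comments.
# check_recovery_groups is a hypothetical helper name.
check_recovery_groups() {
    awk '
    $1 == "RECOVERY_GROUP_NAME" { group = $2; groups[group] = 1; next }
    $1 ~ /^(PRIMARY|DATA_SENDER|RECOVERY|DATA_RECEIVER|REHEARSAL)_PACKAGE$/ {
        # Each package reference must have the form cluster/package.
        if (split($2, part, "/") != 2 || part[1] == "" || part[2] == "") {
            print group ": " $1 " must be cluster/package"
            bad = 1
            next
        }
        # Remember which cluster this package is configured on.
        cluster[group, $1] = part[1]
    }
    END {
        for (g in groups) {
            p = cluster[g, "PRIMARY_PACKAGE"]
            r = cluster[g, "RECOVERY_PACKAGE"]
            if (p == "" || r == "") {
                print g ": needs a primary and a recovery package"; bad = 1
            } else if (p == r) {
                print g ": primary and recovery packages share a cluster"; bad = 1
            }
            s = cluster[g, "DATA_SENDER_PACKAGE"]
            if (s != "" && s != p) {
                print g ": data sender is not on the primary cluster"; bad = 1
            }
            d = cluster[g, "DATA_RECEIVER_PACKAGE"]
            if (d != "" && d != r) {
                print g ": data receiver is not on the recovery cluster"; bad = 1
            }
            h = cluster[g, "REHEARSAL_PACKAGE"]
            if (h != "" && h != r) {
                print g ": rehearsal package is not on the recovery cluster"; bad = 1
            }
        }
        exit (bad ? 1 : 0)
    }'
}
```

A typical use would be check_recovery_groups < /etc/cmcluster/cmconcl.config; commented template lines are ignored because their first field is not a keyword.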
Editing Section 3: Monitoring Definitions

Finally, enter monitoring definitions that define
cluster events and set times at which alert and alarm notifications
are to be sent out. Define notifications for all cluster events: Unreachable,
Down, Up, and Error. Although it is impossible to make specific recommendations
for every Continentalclusters environment, here are a few general
guidelines about notifications.

1. Specify the cluster event by using the CLUSTER_EVENT keyword followed by the name of the cluster, a slash (“/”), and the name of the status: Unreachable, Down, Up, or Error. Example:

       CLUSTER_EVENT LAcluster/UNREACHABLE

2. Define a CLUSTER_ALERT at appropriate times following the appearance of the event. Specify the elapsed time and include a NOTIFICATION message that provides useful information about the event. Create as many alerts as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types). Note that the message text in the notification must be on a separate line in the file.
3. If the event is for a cluster in an Unreachable condition, define a CLUSTER_ALARM at appropriate times. Specify the elapsed time since the appearance of the event (greater than the time used for the last CLUSTER_ALERT), and include a NOTIFICATION message that indicates what action should be taken. Create as many alarms as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types).
4. If using a monitor on a cluster containing no recovery packages, define alerts for the monitoring of Up, Down, Unreachable, and Error states on the recovery cluster. It is not necessary to define alarms.

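
The Section 3 template comments state that an alarm can only be defined for an UNREACHABLE or DOWN condition. The following is a minimal sketch that scans uncommented monitoring definitions on standard input and flags any CLUSTER_ALARM that follows a CLUSTER_EVENT for another condition. check_alarm_events is a hypothetical helper name, and the sketch assumes that each CLUSTER_ALARM line belongs to the most recent CLUSTER_EVENT line above it.

```shell
#!/bin/sh
# Sketch: flag CLUSTER_ALARM entries defined under an event whose
# condition is not UNREACHABLE or DOWN.
# check_alarm_events is a hypothetical helper name.
check_alarm_events() {
    awk '
    $1 == "CLUSTER_EVENT" {
        # The event has the form cluster/CONDITION.
        event = $2
        n = split($2, part, "/")
        state = (n == 2) ? toupper(part[2]) : ""
        next
    }
    $1 == "CLUSTER_ALARM" && state != "UNREACHABLE" && state != "DOWN" {
        print "alarm defined for " event " (only UNREACHABLE or DOWN allowed)"
        bad = 1
    }
    END { exit (bad ? 1 : 0) }'
}
```

The function exits with a non-zero status when it finds a misplaced alarm, so it can be used as a quick check before the configuration is verified with the product's own tools.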
A printout of Section 3 of the Continentalclusters
ASCII configuration file follows.
################################################################
#### #### Section 3. Monitoring Definitions #### ####
#### #### This section of the file contains monitoring #### ####
#### #### definitions. Well planned monitoring #### ####
#### #### definitions will help in making the decision #### ####
#### #### whether or not to issue the cmrecovercl(1m) #### ####
#### #### command. Each monitoring definition specifies#### ####
#### #### a cluster event along with the messages #### ####
#### #### that should be sent to system administrators #### ####
#### #### or other IT staff. #### ####
#### #### All messages are appended to the default log #### ####
#### #### /var/opt/resmon/log/cc/eventlog as well as to#### ####
#### #### the destination you specify below. #### ####
#### #### A cluster event takes place when a monitor #### ####
#### #### that is located on one cluster detects a #### ####
#### #### significant change in the condition of #### ####
#### #### another cluster. The monitored cluster #### ####
#### #### conditions are: #### ####
#### #### UNREACHABLE - the cluster is unreachable. #### ####
#### #### This will occur when the communication link #### ####
#### #### to the cluster has gone down, as in a WAN #### ####
#### #### failure, or when the all nodes in the #### ####
#### #### cluster have failed. #### ####
#### #### DOWN - the cluster is down but nodes are #### ####
#### #### responding. This will occur when the cluster #### ####
#### #### is halted, but some or all of the member #### ####
#### #### nodes are booted and communicating with the #### ####
#### #### monitoring cluster. #### ####
#### #### UP - the cluster is up. #### ####
#### #### ERROR - there is a mismatch of cluster #### ####
#### #### versions or a security error. #### ####
#### #### A change from one of these conditions to #### ####
#### #### another one is a cluster event. You can #### ####
#### #### define alert or alarm states based on the #### ####
#### #### length of time since the cluster event was #### ####
#### #### observed. Some events are noteworthy at the #### ####
#### #### time they occur, and some are noteworthy #### ####
#### #### when they persist over time. Setting the #### ####
#### #### elapsed time to zero results in a message #### ####
#### #### being sent as soon as the event takes place. #### ####
#### #### Setting the elaspsed time to 5 minutes results#### ####
#### #### in a message being sent when the condition #### ####
#### #### has persisted for 5 minutes. #### ####
#### #### An alert is intended as informational only. #### ####
#### #### Alerts may be sent for any type of cluster #### ####
#### #### condition. For an alert, a notification is #### ####
#### #### sent to a system administrator or other #### ####
#### #### destination. Alerts are not intended to #### ####
#### #### indicate the need for recovery. The #### ####
#### #### cmrecovercl(1m) command is disabled. #### ####
#### #### #### ####
#### #### An alarm is an indication that a condition ####
#### #### exists that may require recovery. For an ####
#### #### alarm, a notification is sent, and in ####
#### #### addition, the cmrecovercl(1m) command is ####
#### #### enabled for immediate execution, allowing ####
#### #### the administrator to carry out cluster ####
#### #### recovery. An alarm can only be defined for ####
#### #### an UNREACHABLE or DOWN condition in the ####
#### #### monitored cluster. ####
#### #### A notification defines a message that is ####
#### #### appended to the log file ####
#### #### /var/opt/resmon/log/cc/eventlog and sent ####
#### #### to other specified destinations, including ####
#### #### email addresses, SNMP traps, the system ####
#### #### console, or the syslog file. The message ####
#### #### string in a notification can be no more than ####
#### #### 170 characters. Enter notifications in one of ####
#### #### the following forms: ####
#### #### NOTIFICATION CONSOLE ####
#### #### <message> ####
#### #### Message written to the console. ####
#### #### ####
#### #### NOTIFICATION EMAIL <address> ####
#### #### <message> ####
#### #### Message emailed to a fully qualified email ####
#### #### address. ####
#### #### #####
#### #### NOTIFICATION OPC <level> ####
#### #### <message> ####
#### #### The <message> is sent to OpenView IT/Operations. ####
#### #### The value of <level> may be 8 (normal), ####
#### #### 16 (warning), 64 (minor), 128 (major), 32 ####
#### #### (critical). ####
#### #### NOTIFICATION SNMP <level> ####
#### #### <message> ####
#### #### The <message> is sent as an SNMP trap. ####
#### #### The value of <level> may be 1 (normal), ####
#### #### 2 (warning), 3 (minor), 4 (major), 5 (critical). ####
#### #### NOTIFICATION SYSLOG ####
#### #### <message> ####
#### #### A notice of the event is appended to the syslog ####
#### #### file. ####
#### #### ####
#### #### NOTIFICATION TCP <nodename>:<portnumber> #####
#### #### <message> ####
#### #### Message is sent to a TCP port on the specified ####
#### #### node. ####
#### #### ####
#### #### NOTIFICATION TEXTLOG <pathname> ####
#### #### <message> ####
#### #### A notice of the event is written to a ####
#### #### user-specified log file. <pathname> must be the ####
#### #### full path of that file, and the file must be ####
#### #### under the /var/opt/resmon/log directory. ####
#### #### NOTIFICATION UDP <nodename>:<portnumber> ####
#### #### <message> ####
#### #### Message is sent to a UDP port on the specified ####
#### #### node. ####
#### #### For the cluster event, enter a cluster name ####
#### #### followed by a slash ("/") and a cluster condition ####
#### #### (UP, DOWN, UNREACHABLE, ERROR) that may be detected ####
#### #### by a monitor program. ####
#### #### #####
#### #### Each cluster event must be paired with a ####
#### #### monitoring cluster. Include the name of the ####
#### #### cluster on which the monitoring will take place. ####
#### #### Events can be monitored from either the primary #####
#### #### cluster or the recovery cluster. ####
#### #### ####
#### #### Alerts, alarms, and notifications have the ####
#### #### following syntax. ####
#### #### ####
#### #### CLUSTER_ALERT <min> MINUTES <sec> SECONDS ####
#### #### Delay before the software issues an alert ####
#### #### notification about the cluster event. ####
#### #### ####
#### #### CLUSTER_ALARM <min> MINUTES <sec> SECONDS ####
#### #### Delay before the software issues an alarm ####
#### #### notification about the cluster event and ####
#### #### enables the cmrecovercl(1m) command for ####
#### #### immediate execution. ####
#### #### NOTIFICATION <type> ####
#### #### <message> ####
#### #### A string value which is sent from the monitoring ####
#### #### cluster for a given event to a specified ####
#### #### destination. The <message>, which can be no more ####
#### #### than 170 characters, is also appended to the ####
#### #### /var/opt/resmon/log/cc/eventlog file on the ####
#### #### monitoring node in the cluster where the event ####
#### #### was detected. ####
#### #### ####
#### #### ####
#### #### Example: ####
#### #### ####
#### #### CLUSTER_EVENT westcoast/UNREACHABLE ####
#### #### MONITORING_CLUSTER eastcoast ####
#### #### CLUSTER_ALERT 5 MINUTES ####
#### #### NOTIFICATION EMAIL admin@primary.site ####
#### #### "westcoast status unknown for 5 min. Call ####
#### #### secondary site." ####
#### #### NOTIFICATION EMAIL admin@secondary.site ####
#### #### "Call primary admin. (555) 555-6666." ####
#### #### ####
#### #### CLUSTER_ALERT 10 MINUTES ####
#### #### NOTIFICATION EMAIL admin@primary.site ####
#### #### "westcoast status unknown for 10 min. Call ####
#### #### secondary site." ####
#### #### NOTIFICATION EMAIL admin@secondary.site ####
#### #### "Call primary admin. (555) 555-6666." ####
#### #### NOTIFICATION CONSOLE ####
#### #### "Cluster ALERT: westcoast not responding." ####
#### #### ####
#### #### CLUSTER_ALARM 15 MINUTES ####
#### #### NOTIFICATION EMAIL admin@primary.site ####
#### #### "westcoast status unknown for 15 min. Takeover ####
#### #### advised." ####
#### #### NOTIFICATION EMAIL admin@secondary.site ####
#### #### "westcoast still not responding. Use ####
#### #### cmrecovercl command." ####
#### #### NOTIFICATION CONSOLE ####
#### #### "Cluster ALARM: Issue cmrecovercl command to take ####
#### #### over westcoast." ####
#### #### ####
#### #### CLUSTER_EVENT westcoast/UP ####
#### #### MONITORING_CLUSTER eastcoast ####
#### #### CLUSTER_ALERT 0 MINUTES ####
#### #### NOTIFICATION EMAIL admin@secondary.site ####
#### #### "Cluster westcoast is up." ####
#### #### ####
#### #### CLUSTER_EVENT westcoast/DOWN ####
#### #### MONITORING_CLUSTER eastcoast ####
#### #### CLUSTER_ALERT 0 MINUTES ####
#### #### NOTIFICATION EMAIL admin@secondary.site ####
#### #### "Cluster westcoast is down." ####
#### #### ####
#### #### CLUSTER_EVENT westcoast/ERROR ####
#### #### MONITORING_CLUSTER eastcoast ####
#### #### CLUSTER_ALERT 0 MINUTES ####
#### #### NOTIFICATION EMAIL admin@secondary.site ####
#### #### "Error in monitoring cluster westcoast." ####
#### #### ####
#### #### CLUSTER_EVENT <cluster_name>/UNREACHABLE ####
#### #### MONITORING_CLUSTER CLUSTER_ALERT ####
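The timing and validity rules described in the template comments can be modeled in a few lines. The following is a minimal sketch, not the monitor's actual implementation: the `Threshold` class and the function names are hypothetical, and only the rules themselves (the four conditions, the 170-character message limit, and alarms being restricted to UNREACHABLE or DOWN) come from the comments above.

```python
from dataclasses import dataclass
from typing import List

# Conditions and limits taken from the configuration file comments.
CONDITIONS = {"UP", "DOWN", "UNREACHABLE", "ERROR"}
ALARM_CONDITIONS = {"UNREACHABLE", "DOWN"}
MAX_MESSAGE_LEN = 170


@dataclass
class Threshold:
    """One CLUSTER_ALERT or CLUSTER_ALARM entry (hypothetical model)."""
    kind: str            # "ALERT" or "ALARM"
    minutes: int = 0
    seconds: int = 0
    message: str = ""

    @property
    def delay(self) -> int:
        """Seconds the condition must persist before notification."""
        return self.minutes * 60 + self.seconds


def check_threshold(condition: str, t: Threshold) -> List[str]:
    """Return the rule violations for one entry (empty list if none)."""
    problems = []
    if condition not in CONDITIONS:
        problems.append("unknown cluster condition")
    if t.kind == "ALARM" and condition not in ALARM_CONDITIONS:
        problems.append("alarm is allowed only for UNREACHABLE or DOWN")
    if len(t.message) > MAX_MESSAGE_LEN:
        problems.append("message is longer than 170 characters")
    return problems


def due(thresholds: List[Threshold], elapsed_seconds: int) -> List[Threshold]:
    """Entries whose delay has been reached after the event has persisted."""
    return [t for t in thresholds if t.delay <= elapsed_seconds]


def recovery_enabled(thresholds: List[Threshold], elapsed_seconds: int) -> bool:
    """True once any alarm is due, which is when cmrecovercl becomes usable."""
    return any(t.kind == "ALARM" for t in due(thresholds, elapsed_seconds))
```

Applied to the westcoast/UNREACHABLE example (alerts at 5 and 10 minutes, alarm at 15), `due` returns both alerts after 600 seconds, while `recovery_enabled` stays false until 900 seconds have elapsed.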
The TEXTLOG notification file must be placed under the /var/opt/resmon/log directory. If any other location is specified, the cmapplyconcl and cmcheckconcl commands report the following error:
    The target after textlog “ ” is not valid.
    Please specify a file under /var/opt/resmon/log directory
If you upgraded Continentalclusters but are still using the old configuration file, the textlog location is still specified as /var/adm/cmconcl. As a result, the following error message appears:
    The file path “s” specified for textlog is invalid.
    The destination file must be under /var/opt/resmon/log directory. Please change the path and restart the ccmon package.
IMPORTANT: For TEXTLOG notification, the destination log file must be in the /var/opt/resmon/log directory. If the destination file is not in this directory, Continentalclusters will not work properly.
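Before running cmcheckconcl, a TEXTLOG destination can be pre-checked with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and treating subdirectories as acceptable and rejecting ".." components are assumptions, since the text above only says the file must be under /var/opt/resmon/log.

```python
import posixpath

# Required parent directory for TEXTLOG files, as stated in the text.
TEXTLOG_ROOT = "/var/opt/resmon/log"


def textlog_path_ok(pathname: str) -> bool:
    """Return True if pathname is a full path to a file below TEXTLOG_ROOT."""
    if not pathname.startswith("/"):
        return False  # a full (absolute) path is required
    # Collapse "." and ".." so that a path cannot escape the directory.
    normalized = posixpath.normpath(pathname)
    return normalized.startswith(TEXTLOG_ROOT + "/")
```

A path left over from an older configuration, such as one under /var/adm/cmconcl, fails this check, which matches the upgrade case described above.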
Checking and Applying the Continentalclusters Configuration
After editing the configuration file on any of the participating clusters in the continental cluster, halt any monitor packages that are running, then use the following steps to apply the configuration to all nodes in the continental cluster.
1. Verify the content of the file.
   # cmcheckconcl -v -C cmconcl.config
   This command verifies that all parameters are within range, all fields are filled out, and the entries (such as NODE_NAME) are valid.
2. Distribute the Continentalclusters configuration information to all nodes in the continental cluster.
   # cmapplyconcl -v -C cmconcl.config
   Configuration data is copied to all nodes in all the participating clusters. This data includes a set of managed object files that are copied to the /etc/cmconcl/instances directory on every node in all clusters. Be sure to make a backup copy of the configuration ASCII file and save it on the other cluster after it is applied.
NOTE: If any problems occur during the execution of cmapplyconcl, repeat the command as often as necessary. Issuing the command deletes the existing Continentalclusters configuration and applies the new one.
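The check-then-apply order can be enforced with a small wrapper. The sketch below makes two assumptions that are not stated in the text: that both commands return exit status 0 on success, and that the wrapper runs on a node where they are in the PATH. The function name and the injectable `run` parameter are hypothetical; the commands and their -v -C options are the ones shown in the steps above. The sketch does not halt the monitor packages, which still has to be done first.

```python
import subprocess
from typing import Callable, List


def check_and_apply(config_file: str,
                    run: Callable[[List[str]], int] = subprocess.call) -> bool:
    """Run cmcheckconcl, then cmapplyconcl only if the check succeeded.

    Returns True when both commands report exit status 0 (assumed to
    mean success).
    """
    if run(["cmcheckconcl", "-v", "-C", config_file]) != 0:
        return False  # do not apply a configuration that failed the check
    return run(["cmapplyconcl", "-v", "-C", config_file]) == 0
```

Passing a stub for `run` makes it possible to exercise the ordering without a cluster; in normal use the default `subprocess.call` executes the real commands.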
When configuration is finished, your systems should have sets of files similar to those shown in Figure 2-9.
Starting the Continentalclusters Monitor Package
Starting the monitor package enables all Continentalclusters monitoring functionality. Before doing this, ensure that the primary packages selected to be protected are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working properly. If using physical data replication, make sure that it is operational.
On each monitoring cluster, start the monitor package.
    # cmmodpkg -e ccmonpkg
After the monitor package is started, a log file, /var/adm/cmconcl/sentryd.log, is created on the node where the package is running to record the Continentalclusters monitoring activities. It is recommended that this log file be archived or cleaned up periodically.
Validating the Configuration
The following table shows the status of Continentalclusters packages in a recovery pair when each cluster is running normally and no recovery has taken place.
Table 2-6 Status of Continentalclusters Packages Before Recovery
| Data Replication Method | Primary Cluster: Primary Package | Primary Cluster: Data Sender Package | Primary Cluster: Optional Monitor Package | Recovery Cluster: Recovery Package | Recovery Cluster: Data Receiver Package | Recovery Cluster: Required Monitor Package |
|---|---|---|---|---|---|---|
| Physical - Symmetrix | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Physical - XP Series | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Physical - EVA Series | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Logical - Oracle Standby Database | Running | Not used | Running (optional) | Halted | Running | Running (required) |
Use the following steps to ensure the components are functioning correctly:
1. Make sure all daemons are running.
   # ps -ef | grep cmcl
   Two important Continentalclusters daemons are cmclsentryd and cmclrmond.
2. Check the cluster configuration on each cluster using the cmviewcl -v command.
   - Ensure that each primary package is running correctly.
   - Ensure that the data sender packages (if any are used for logical data replication) are running correctly.
   - Ensure that the data receiver packages (if any are used for logical data replication) are running correctly.
   - Ensure that the continental cluster monitor package is running correctly on each monitoring cluster.
3. On all nodes, use the tail -f /var/adm/syslog/syslog.log command to check the end of the SYSLOG file for errors.
4. On nodes where packages are running, check all package log files for errors, including application packages and the monitor package.
5. Use the following command to verify the correct operation of the Continentalclusters daemon:
   # /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log
6. Make sure the Continentalclusters monitor package (default name ccmonpkg) on each cluster fails over properly if a node fails.
7. Change each cluster's state to test that the monitor running on the monitoring cluster detects the change in status and sends notification.
8. View the status of the Continentalclusters primary and recovery clusters, including configured event data.
   # cmviewconcl -v
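The daemon check in the first validation step can be automated by scanning captured ps -ef output for the two daemon names. The following is a rough sketch: the function name is hypothetical, and it matches on the base name of any whitespace-separated token because the install path of the daemons and the exact column layout of ps are not given here.

```python
from typing import Iterable, List

# Daemon names given in the validation steps.
REQUIRED_DAEMONS = ("cmclsentryd", "cmclrmond")


def missing_daemons(ps_output: str,
                    required: Iterable[str] = REQUIRED_DAEMONS) -> List[str]:
    """Return the required daemon names that do not appear in ps output."""
    seen = set()
    for line in ps_output.splitlines():
        for token in line.split():
            # Strip any leading directory so a full path still matches.
            seen.add(token.rsplit("/", 1)[-1])
    return [name for name in required if name not in seen]
```

An empty result means both daemons were found. A line produced by grep itself does not cause a false match when the pattern is only cmcl, because the token cmcl differs from both daemon names.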
CAUTION: Never issue the cmrunpkg command for a recovery package when Continentalclusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great.
Chapters 3, 4, and 5 contain additional suggestions on testing the data replication and package configuration.
Documenting the Recovery Procedure
Once everything is configured and the Continentalclusters monitor is running, define your recovery procedure and train the administrators and operators at both sites. The checklist in Figure 2-10 is an example of how to document the recovery procedure.
Reviewing the Recovery Procedure
Using the checklist described in the previous section, step through the recovery procedure to make sure that all necessary steps are included. If possible, create simulated failures to test the alert and alarm scenarios coded in the Continentalclusters configuration file.