If necessary, use the swinstall command to install the ContinentalClusters product on all nodes in both clusters. Then create the Continentalclusters configuration using the following steps:

1. Prepare the security files.
2. Create the monitor package on each cluster containing a recovery package. Clusters not containing a recovery package may also monitor the other cluster in the recovery pair by creating a monitor package on that cluster.
3. Edit the ContinentalClusters configuration file on a node of your choice in any cluster.
4. Check and apply the Continentalclusters configuration.
5. Start each ContinentalClusters monitor package on its cluster.
6. Validate the configuration.
7. Document the recovery procedure and distribute the documentation to both sites. Make sure all personnel are familiar with these procedures.
8. Test recovery procedures.

Preparing Security Files

Running a Continentalclusters command requires root access to cluster information on all the nodes of the participating Serviceguard clusters in the configuration. Before configuring Continentalclusters, edit the /etc/cmcluster/cmclnodelist file on each node of all the participating clusters to include entries that allow access by all nodes in the continental cluster. For a continental cluster configured with two two-node Serviceguard clusters, the file on each node needs entries covering all four nodes.

Also, be sure to create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor packages and Continentalclusters commands to obtain information from other nodes about the health of each cluster. The file must contain entries that allow access to all nodes in the continental cluster by the nodes where monitors and Continentalclusters commands
are running.

Define the order of security checking by creating entries of the following types:

- order deny,allow
  If deny is first, the deny list is checked first to see if the node is there, then the allow list is checked.
- deny from
  Lists all the nodes that are denied access. Permissible entries are:
  - all
    All hosts are denied access.
  - domain
    Hosts whose names match, or end in, this string are denied access, for example, hp.com.
  - hostname
    The named host (for example, kitcat.myco.com) is denied access.
  - IP address
    Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet restriction, is denied.
  - network/netmask
    This pair of addresses allows more precise restriction of hosts (for example, 10.163.121.23/255.255.0.0).
  - network/nnn (CIDR)
    This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).
- allow from
  Lists all the nodes that are allowed access. Permissible entries are:
  - all
    All hosts are allowed access.
  - domain
    Hosts whose names match, or end in, this string are allowed access, for example, hp.com.
  - hostname
    The named host (for example, kitcat.myco.com) is allowed access.
  - IP address
    Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet inclusion, is allowed.
  - network/netmask
    This pair of addresses allows more precise inclusion of hosts (for example, 10.163.121.23/255.255.0.0).
  - network/nnn (CIDR)
    This specification is like the network/netmask specification, except the netmask consists of nnn high-order 1 bits. “CIDR” stands for Classless Interdomain Routing, a type of routing supported by the Border Gateway Protocol (BGP).
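
The relationship between the network/netmask and network/nnn forms can be illustrated with a short Python sketch. It uses the standard ipaddress module and is only an illustration of the address arithmetic; it does not reproduce how the product itself evaluates the file, and the function name entry_matches is hypothetical.

```python
import ipaddress

def entry_matches(host_ip: str, entry: str) -> bool:
    """Return True if host_ip falls inside a network/netmask or network/nnn entry."""
    # strict=False lets the network part carry host bits, as in 10.163.121.23/16.
    network = ipaddress.ip_network(entry, strict=False)
    return ipaddress.ip_address(host_ip) in network

# A netmask of 255.255.0.0 and a prefix of 16 high-order 1 bits describe the
# same network, so both spellings match the same hosts.
print(entry_matches("10.163.5.9", "10.163.121.23/255.255.0.0"))
print(entry_matches("10.163.5.9", "10.163.121.23/16"))
print(entry_matches("10.164.0.1", "10.163.121.23/16"))
```

The first two calls accept the host and the third rejects it, because only the first two bytes of the address are compared under a 16-bit mask.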
The most typical entry is hostname. The following entries are from a typical /etc/opt/cmom/cmomhosts file:

    order allow,deny
    allow from lanode1.myco.com
    allow from lanode2.myco.com
    allow from nynode1.myco.com
    allow from nynode2.myco.com
    allow from 10.177.242.12

If the file is installed on all nodes in the continental cluster, these entries will allow Continentalclusters commands and monitors running on lanode1, lanode2, nynode1, and nynode2 to obtain information about the clusters in the configuration.

Creating the Monitor Package

The ContinentalClusters monitoring software is configured as a Serviceguard package so that it remains highly available. If more than one primary cluster is configured to share the same common recovery cluster, such as in a multiple recovery pair scenario, the monitor package running on the common recovery cluster does the following:

- monitors all of the primary clusters
- sends notifications for all of the monitored cluster events

The following steps should be carried out on the recovery cluster and can be repeated on the primary cluster if you want the primary cluster to monitor the recovery cluster:

1. On the node where the configuration is located, create a directory for the monitor package.

    # mkdir /etc/cmcluster/ccmonpkg

2. Copy the template files from the /opt/cmconcl/scripts directory to the /etc/cmcluster/ccmonpkg directory.

    # cp /opt/cmconcl/scripts/ccmonpkg.* \
    /etc/cmcluster/ccmonpkg

   ccmonpkg.config is the ASCII package configuration file template for the ContinentalClusters monitoring application.
3. Edit the package configuration file (suggested name /etc/cmcluster/ccmonpkg/ccmonpkg.config) to match the cluster configuration:
   - Add the names of all nodes in the cluster on which the monitor may run.
   - Set AUTO_RUN (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) to YES so that the monitor package will fail over between local nodes. (Note that for all primary and recovery packages, AUTO_RUN is always set to NO.)
4. Use the cmcheckconf command to validate the package.

    # cmcheckconf -P ccmonpkg.config

5. Copy the package configuration file ccmonpkg.config and the control script ccmonpkg.cntl to the monitor package directory (default name /etc/cmcluster/ccmonpkg) on all the other nodes in the cluster. Make sure the control script is executable.
6. Use the cmapplyconf command to add the package to the Serviceguard configuration.

    # cmapplyconf -P ccmonpkg.config

The following sample package configuration file (comments have been left out) shows a typical package configuration for a ContinentalClusters monitor package:

    PACKAGE_NAME                 ccmonpkg
    FAILOVER_POLICY              CONFIGURED_NODE
    FAILBACK_POLICY              MANUAL
    NODE_NAME                    LAnode1
    NODE_NAME                    LAnode2
    RUN_SCRIPT                   /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
    RUN_SCRIPT_TIMEOUT           NO_TIMEOUT
    HALT_SCRIPT                  /etc/cmcluster/ccmonpkg/ccmonpkg.cntl
    HALT_SCRIPT_TIMEOUT          NO_TIMEOUT
    SERVICE_NAME                 ccmonpkg.srv
    SERVICE_FAIL_FAST_ENABLED    NO
    SERVICE_HALT_TIMEOUT         300
    PKG_SWITCHING_ENABLED        YES
    NET_SWITCHING_ENABLED        YES
    NODE_FAIL_FAST_ENABLED       NO

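
Before running cmcheckconf, it can be useful to sanity-check the two edits described above: the node list and the switching flag. The following Python sketch is one possible way to do that. It assumes only the keyword-then-value layout shown in the sample; the function names are hypothetical, and it is not a substitute for cmcheckconf.

```python
def parse_package_config(text):
    """Collect KEYWORD value pairs; repeated keywords (NODE_NAME) keep every value."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        params.setdefault(parts[0], []).append(value)
    return params

def check_monitor_package(params):
    """Return a list of problems found in a monitor package configuration."""
    problems = []
    if not params.get("NODE_NAME"):
        problems.append("no NODE_NAME entries")
    # AUTO_RUN is the newer keyword; PKG_SWITCHING_ENABLED is its older name.
    switching = params.get("AUTO_RUN") or params.get("PKG_SWITCHING_ENABLED") or []
    if switching != ["YES"]:
        problems.append("AUTO_RUN (or PKG_SWITCHING_ENABLED) is not YES")
    return problems

sample = """\
PACKAGE_NAME ccmonpkg
NODE_NAME LAnode1
NODE_NAME LAnode2
PKG_SWITCHING_ENABLED YES
"""
print(check_monitor_package(parse_package_config(sample)))
```

An empty list means neither problem was found. Primary and recovery packages would need the opposite check, since their AUTO_RUN value is always NO.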

Editing the ContinentalClusters Configuration File

First, on one cluster, generate an ASCII configuration template file using the cmqueryconcl command. The recommended name and location for this file is /etc/cmcluster/cmconcl.config. (If preferred, choose a different name.) Example:

    # cd /etc/cmcluster
    # cmqueryconcl -C cmconcl.config

This file has three editable sections: Cluster Information, Recovery Groups, and Monitoring Definitions. Customize each section according to your needs. The following are some guidelines for editing each section.

Editing Section 1: Cluster Information

Enter cluster-level information as follows in this section of the file:

1. Enter a name for the continental cluster on the line that contains the CONTINENTAL_CLUSTER_NAME keyword. Choose any name, but note that it cannot be easily changed after the configuration is applied. To change the name, you must first delete the existing configuration as described in “Renaming a Continental Cluster”.
2. Enter the name of the first cluster after the first CLUSTER_NAME keyword, followed by the names of all the nodes within the first cluster. Use a separate NODE_NAME keyword and HP-UX host name for each node.
3. Enter the domain name of the cluster’s nodes following the CLUSTER_DOMAIN keyword.
4. Optionally, enter the name of the monitor package on the first cluster after the MONITOR_PACKAGE_NAME keyword and the interval at which monitoring by this package will take place (minutes and/or seconds) following the MONITOR_INTERVAL keyword. The monitor interval defines how long it can take for Continentalclusters to detect that a cluster is in a certain state. The default interval is 60 seconds, but the optimal setting depends on your system’s performance. Setting this interval too low can result in the monitor falsely reporting an Unreachable or Error state. If this is observed during testing, use a larger value.

   The suggested name for all Continentalclusters monitors is “ccmonpkg”. Create this package on each cluster containing a recovery package. If you do not want to monitor a cluster that does not contain a recovery package, delete or comment out the MONITOR_PACKAGE_NAME line and the MONITOR_INTERVAL line. For mutual recovery, create the monitor package on both the first and second clusters.

   NOTE: Monitoring of a cluster not containing recovery packages is optional. For example, set up monitoring of such a cluster to be able to check the status of the data replication technology being used.
5. Repeat steps 2 through 4 for the other participating cluster or clusters.

NOTE: The monitor package is sensitive to system time and date. If you change the system time or date either backwards or forwards on the node where the monitor is running, notifications of alerts and alarms may be sent at incorrect times.

A printout of Section 1 of the Continentalclusters ASCII configuration file follows.

    ###############################################################################
    #### ####
    #### CONTINENTAL CLUSTER CONFIGURATION FILE ####
    #### ####
    #### ####
    #### This file contains ContinentalClusters configuration data. ####
    #### The file is divided into three sections, as follows: ####
    #### ####
    #### 1. Cluster Information ####
    #### 2. Recovery Groups ####
    #### 3. Events, Alerts, Alarms, and Notifications ####
    #### ####
    #### For complete details about how to set the parameters in ####
    #### this file, consult the cmqueryconcl(1m) manpage or your manual. ####
    #### ####
    ###############################################################################
    #### ####
    #### Section 1. Cluster Information ####
    #### ####
    #### This section contains the name of the continental cluster ####
    #### followed by the names of member clusters and all their nodes. ####
    #### The continental cluster name can be any string you choose, up ####
    #### to 40 characters in length. Each member cluster name must be ####
    #### the same as it appears in the ServiceGuard cluster configuration ####
    #### ASCII file for that cluster. In addition to the cluster ####
    #### name, include a domain name for the nodes in the cluster. ####
    #### Node names must be the same as those that appear in the cluster ####
    #### configuration ASCII file. A minimum of two member cluster ####
    #### needs to be specified. You may configure one cluster to serve ####
    #### as recovery cluster for one or more other clusters. ####
    #### ####
    #### In the space below, enter the continental cluster name, ####
    #### then enter a cluster name for each member cluster, followed ####
    #### by the names of all the nodes in that cluster. Following ####
    #### the node names, enter the name of a monitor package ####
    #### that will run the continental cluster monitoring software ####
    #### on that cluster. It is strongly recommended that you use the ####
    #### same name for the monitoring package on all clusters; ####
    #### "ccmonpkg" is suggested. Monitoring of the recovery cluster ####
    #### by the primary cluster is optional. If you do not wish to ####
    #### monitor the recovery cluster, you must delete or comment out the ####
    #### MONITOR_PACKAGE_NAME and MONITOR_INTERVAL lines that follow the ####
    #### name of the primary cluster. ####
    #### ####
    #### After the monitor package name, enter a monitor interval, ####
    #### specifying a number of minutes and/or seconds. The default is 60 ####
    #### seconds, the minimum is 30 seconds, and the maximum is 5 minutes. ####
    #### ####
    #### Example: ####
    #### ####
    #### CONTINENTAL_CLUSTER_NAME ccluster1 ####
    #### ####
    #### CLUSTER_NAME westcoast ####
    #### CLUSTER_DOMAIN westnet.myco.com ####
    #### NODE_NAME system1 ####
    #### NODE_NAME system2 ####
    #### MONITOR_PACKAGE_NAME ccmonpkg ####
    #### MONITOR_INTERVAL 1 MINUTE 30 SECONDS ####
    #### ####
    #### CLUSTER_NAME eastcoast ####
    #### CLUSTER_DOMAIN eastnet.myco.com ####
    #### NODE_NAME system3 ####
    #### NODE_NAME system4 ####
    #### MONITOR_PACKAGE_NAME ccmonpkg ####
    #### MONITOR_INTERVAL 1 MINUTE 30 SECONDS ####
    #### ####

    CONTINENTAL_CLUSTER_NAME ccluster1

    CLUSTER_NAME
    CLUSTER_DOMAIN
    NODE_NAME
    NODE_NAME
    MONITOR_PACKAGE_NAME ccmonpkg
    MONITOR_INTERVAL 60 SECONDS

    CLUSTER_NAME
    CLUSTER_DOMAIN
    NODE_NAME
    NODE_NAME
    MONITOR_PACKAGE_NAME ccmonpkg
    MONITOR_INTERVAL 60 SECONDS

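
The template comments give a default monitor interval of 60 seconds, a minimum of 30 seconds, and a maximum of 5 minutes. As a rough illustration, the Python sketch below converts an interval written in the template's "n MINUTE(S) n SECONDS" style into seconds and checks it against those limits. The parsing rules are an assumption based on the examples shown, and the function names are hypothetical; cmcheckconcl remains the authoritative check.

```python
import re

MIN_SECONDS, MAX_SECONDS = 30, 5 * 60  # limits quoted in the template comments

def interval_seconds(value):
    """Convert text such as '1 MINUTE 30 SECONDS' to a number of seconds."""
    total = 0
    matched = False
    for number, unit in re.findall(r"(\d+)\s+(MINUTES?|SECONDS?)", value.upper()):
        total += int(number) * (60 if unit.startswith("MINUTE") else 1)
        matched = True
    if not matched:
        raise ValueError(f"unrecognized interval: {value!r}")
    return total

def interval_in_range(value):
    """True if the interval lies between the documented minimum and maximum."""
    return MIN_SECONDS <= interval_seconds(value) <= MAX_SECONDS

print(interval_seconds("1 MINUTE 30 SECONDS"))  # 90
print(interval_in_range("60 SECONDS"))          # True
print(interval_in_range("10 SECONDS"))          # False
```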

Editing Section 2: Recovery Groups

In this section of the file, define recovery groups, which are sets of Serviceguard packages that are ready to recover applications in case of cluster failure. Create a separate recovery group for each package that will be started on a cluster when the cmrecovercl(1m) command is issued on that cluster. Examples of recovery groups are shown graphically in Figure 2-7 “Sample ContinentalClusters Recovery Groups” and Figure 2-8 “Sample Bi-directional Recovery Groups”.

Enter data in Section 2 as follows:

1. Enter a name for the recovery group following the RECOVERY_GROUP_NAME keyword. This can be any name you choose.
2. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

    PRIMARY_PACKAGE LAcluster/custpkg

3. Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data sender package.
4. After the RECOVERY_PACKAGE keyword, enter a recovery package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:

    RECOVERY_PACKAGE NYcluster/custpkg_bak

5. Optionally, enter a data receiver package definition consisting of the cluster name, a slash (/), and the data receiver package name after the DATA_RECEIVER_PACKAGE keyword. This is only necessary if you are using a logical data replication method that requires a data receiver package.
6. Repeat these steps for each package that will be recovered. Each package must be configured in a separate recovery group.

A printout of Section 2 of the Continentalclusters ASCII configuration file follows.

    ###############################################################################
    #### ####
    #### Section 2. Recovery Groups ####
    #### ####
    #### This section defines recovery groups--sets of Serviceguard ####
    #### packages that are ready to recover applications in case of ####
    #### cluster failure. Recovery groups allow one cluster in the ####
    #### continental cluster configuration to back up another member ####
    #### cluster's packages. You create a separate recovery group ####
    #### for each Serviceguard package that will be started on the ####
    #### recovery cluster when the cmrecovercl(1m) command is issued. ####
    #### ####
    #### A recovery group consists of a primary package running on ####
    #### one cluster, a recovery package that is ready to run on a ####
    #### different cluster. In some cases, a data receiver package runs ####
    #### on the same cluster as the recovery package, and in some cases, ####
    #### a data sender package runs on the same cluster as the primary ####
    #### package. ####
    #### ####
    #### During normal operation, the primary package is running an ####
    #### application program on the primary cluster, and the recovery ####
    #### package, which is configured to run the same application, is ####
    #### idle on the recovery cluster. If the primary package performs ####
    #### disk I/O, the data that is written to disk is replicated ####
    #### and made available for possible use on the recovery cluster. ####
    #### For some data replication techniques, this involves the use of ####
    #### a data receiver package running on the recovery cluster. ####
    #### In the event of a major failure on the primary cluster, the ####
    #### user issues the cmrecovercl(1m) command to halt any data ####
    #### receiver packages and start up all the recovery packages ####
    #### that exist on the recovery cluster. ####
    #### ####
    #### Enter the name of each package recovery group together with ####
    #### the fully qualified names of the primary and recovery ####
    #### packages. If appropriate, enter the fully qualified name ####
    #### of a data receiver package. Note that the data receiver ####
    #### package must be on the same cluster as the recovery package. ####
    #### ####
    #### The primary package name includes the primary cluster name ####
    #### followed by a slash ("/") followed by the package name on ####
    #### the primary cluster. The recovery package name includes ####
    #### the recovery cluster name, followed by a slash ("/") ####
    #### followed by the package name on the recovery cluster. ####
    #### The data receiver package name includes the recovery cluster ####
    #### name, followed by a slash ("/") followed by the name of ####
    #### the data receiver package on the recovery cluster. ####
    #### ####
    #### Up to 29 recovery groups can be entered. ####
    #### ####
    #### Example: ####
    #### ####
    #### RECOVERY_GROUP_NAME nfsgroup ####
    #### PRIMARY_PACKAGE westcoast/nfspkg ####
    #### DATA_SENDER_PACKAGE westcoast/nfssenderpkg ####
    #### RECOVERY_PACKAGE eastcoast/nfsbackuppkg ####
    #### DATA_RECEIVER_PACKAGE eastcoast/nfsreplicapkg ####
    #### ####
    #### RECOVERY_GROUP_NAME hpgroup ####
    #### PRIMARY_PACKAGE westcoast/hppkg ####
    #### DATA_SENDER_PACKAGE westcoast/hpsenderpkg ####
    #### RECOVERY_PACKAGE eastcoast/hpbackuppkg ####
    #### DATA_RECEIVER_PACKAGE eastcoast/hpreplicapkg ####
    #### ####

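
The rules stated in the Section 2 comments (cluster/package naming, a recovery package on a different cluster from its primary package, a data receiver on the same cluster as the recovery package, and at most 29 groups) can be expressed as a short check. The Python sketch below is an illustration only. It assumes each recovery group has already been read into a dictionary keyed by the Section 2 keywords, and the function names are hypothetical.

```python
MAX_RECOVERY_GROUPS = 29  # limit quoted in the template comments

def split_package(ref):
    """Split 'cluster/package' into its two parts, rejecting anything else."""
    cluster, sep, package = ref.partition("/")
    if not sep or not cluster or not package or "/" in package:
        raise ValueError(f"expected cluster/package, got {ref!r}")
    return cluster, package

def check_recovery_groups(groups):
    """groups: list of dicts keyed by the Section 2 keywords. Returns problems."""
    problems = []
    if len(groups) > MAX_RECOVERY_GROUPS:
        problems.append(f"more than {MAX_RECOVERY_GROUPS} recovery groups")
    for group in groups:
        name = group["RECOVERY_GROUP_NAME"]
        primary_cluster, _ = split_package(group["PRIMARY_PACKAGE"])
        recovery_cluster, _ = split_package(group["RECOVERY_PACKAGE"])
        if primary_cluster == recovery_cluster:
            problems.append(f"{name}: primary and recovery packages share a cluster")
        receiver = group.get("DATA_RECEIVER_PACKAGE")
        if receiver and split_package(receiver)[0] != recovery_cluster:
            problems.append(f"{name}: data receiver is not on the recovery cluster")
    return problems

nfsgroup = {
    "RECOVERY_GROUP_NAME": "nfsgroup",
    "PRIMARY_PACKAGE": "westcoast/nfspkg",
    "RECOVERY_PACKAGE": "eastcoast/nfsbackuppkg",
    "DATA_RECEIVER_PACKAGE": "eastcoast/nfsreplicapkg",
}
print(check_recovery_groups([nfsgroup]))  # []
```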

Editing Section 3: Monitoring Definitions

Finally, enter monitoring definitions that define cluster events and set times at which alert and alarm notifications are to be sent out. Define notifications for all cluster events: Unreachable, Down, Up, and Error. Although it is impossible to make specific recommendations for every Continentalclusters environment, here are a few general guidelines about notifications.

1. Specify the cluster event by using the CLUSTER_EVENT keyword followed by the name of the cluster, a slash (“/”), and the name of the status: Unreachable, Down, Up, or Error. Example:

    CLUSTER_EVENT LAcluster/UNREACHABLE

2. Define a CLUSTER_ALERT at appropriate times following the appearance of the event. Specify the elapsed time and include a NOTIFICATION message that provides useful information about the event. Create as many alerts as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types). Note that the message text in the notification must be on a separate line in the file.
3. If the event is for a cluster in an Unreachable condition, define a CLUSTER_ALARM at appropriate times. Specify the elapsed time since the appearance of the event (greater than the time used for the last CLUSTER_ALERT), and include a NOTIFICATION message that indicates what action should be taken. Create as many alarms as needed, and send as many notifications as needed to different destinations (see the comments in the file excerpt below for a list of destination types).
4. If using a monitor on a cluster containing no recovery packages, define alerts for the monitoring of Up, Down, Unreachable, and Error states on the recovery cluster. It is not necessary to define alarms.

A printout of Section 3 of the Continentalclusters ASCII configuration file follows.

    ###############################################################################
    #### ####
    #### Section 3. Monitoring Definitions ####
    #### ####
    #### This section of the file contains monitoring definitions. ####
    #### Well planned monitoring definitions will help in making the ####
    #### decision whether or not to issue the cmrecovercl(1m) command. ####
    #### Each monitoring definition specifies a cluster event along with ####
    #### the messages that should be sent to system administrators ####
    #### or other IT staff. All messages are appended to the default log ####
    #### /var/opt/resmon/log/cc/eventlog as well as to the destination you ####
    #### specify below. ####
    #### ####
    #### A cluster event takes place when a monitor that is located on ####
    #### one cluster detects a significant change in the condition ####
    #### of another cluster. The monitored cluster conditions are: ####
    #### ####
    #### UNREACHABLE - the cluster is unreachable. This will ####
    ####     occur when the communication link to the ####
    ####     cluster has gone down, as in a WAN failure, ####
    ####     or when all the nodes in the cluster have ####
    ####     failed. ####
    #### ####
    #### DOWN - the cluster is down but nodes are responding. ####
    ####     This will occur when the cluster is halted, ####
    ####     but some or all of the member nodes are booted ####
    ####     and communicating with the monitoring cluster. ####
    #### ####
    #### UP - the cluster is up. ####
    #### ####
    #### ERROR - there is a mismatch of cluster versions or ####
    ####     a security error. ####
    #### ####
    #### A change from one of these conditions to another one is a ####
    #### cluster event. You can define alert or alarm states based on the ####
    #### length of time since the cluster event was observed. Some events ####
    #### are noteworthy at the time they occur, and some are noteworthy ####
    #### when they persist over time. Setting the elapsed time to zero ####
    #### results in a message being sent as soon as the event takes place. ####
    #### Setting the elapsed time to 5 minutes results in a message ####
    #### being sent when the condition has persisted for 5 minutes. ####
    #### ####
    #### An alert is intended as informational only. Alerts may be sent ####
    #### for any type of cluster condition. For an alert, a notification ####
    #### is sent to a system administrator or other destination. Alerts ####
    #### are not intended to indicate the need for recovery. The ####
    #### cmrecovercl(1m) command is disabled. ####
    #### ####
    #### An alarm is an indication that a condition exists that may ####
    #### require recovery. For an alarm, a notification is sent, and ####
    #### in addition, the cmrecovercl(1m) command is enabled for immediate ####
    #### execution, allowing the administrator to carry out cluster ####
    #### recovery. An alarm can only be defined for an UNREACHABLE or ####
    #### DOWN condition in the monitored cluster. ####
    #### ####
    #### A notification defines a message that is appended to the ####
    #### log file /var/opt/resmon/log/cc/eventlog and sent to other ####
    #### specified destinations, including email addresses, SNMP traps, ####
    #### the system console, or the syslog file. The message string in ####
    #### a notification can be no more than 170 characters. Enter ####
    #### notifications in one of the following forms: ####
    #### ####
    #### NOTIFICATION CONSOLE ####
    ####   <message> ####
    ####     Message written to the console. ####
    #### ####
    #### NOTIFICATION EMAIL <address> ####
    ####   <message> ####
    ####     Message emailed to a fully ####
    ####     qualified email address. ####
    #### ####
    #### NOTIFICATION OPC <level> ####
    ####   <message> ####
    ####     The <message> is sent to ####
    ####     OpenView IT/Operations. ####
    ####     The value of <level> may be 8 (normal), ####
    ####     16 (warning), 64 (minor), 128 (major), ####
    ####     32 (critical). ####
    #### ####
    #### NOTIFICATION SNMP <level> ####
    ####   <message> ####
    ####     The <message> is sent as an SNMP trap. ####
    ####     The value of <level> may be 1 (normal), ####
    ####     2 (warning), 3 (minor), 4 (major), ####
    ####     5 (critical). ####
    #### ####
    #### NOTIFICATION SYSLOG ####
    ####   <message> ####
    ####     A notice of the event is appended to the ####
    ####     syslog file. ####
    #### ####
    #### NOTIFICATION TCP <nodename>:<portnumber> ####
    ####   <message> ####
    ####     Message is sent to a TCP port on the ####
    ####     specified node. ####
    #### ####
    #### NOTIFICATION TEXTLOG <pathname> ####
    ####   <message> ####
    ####     A notice of the event is written to a user- ####
    ####     specified log file. <pathname> must be a full ####
    ####     path for the user-specified file. The user ####
    ####     specified file must be under /var/opt/resmon/log directory. ####
    #### ####
    #### NOTIFICATION UDP <nodename>:<portnumber> ####
    ####   <message> ####
    ####     Message is sent to a UDP port on the ####
    ####     specified node. ####
    #### ####
    #### For the cluster event, enter a cluster name followed by ####
    #### a slash ("/") and a cluster condition (UP, DOWN, UNREACHABLE, ####
    #### ERROR) that may be detected by a monitor program. ####
    #### ####
    #### Each cluster event must be paired with a monitoring cluster. ####
    #### Include the name of the cluster on which the monitoring ####
    #### will take place. Events can be monitored from either the ####
    #### primary cluster or the recovery cluster. ####
    #### ####
    #### Alerts, alarms, and notifications have the following syntax. ####
    #### ####
    #### CLUSTER_ALERT <min> MINUTES <sec> SECONDS ####
    ####     Delay before the software issues ####
    ####     an alert notification about the ####
    ####     cluster event. ####
    #### ####
    #### CLUSTER_ALARM <min> MINUTES <sec> SECONDS ####
    ####     Delay before the software issues ####
    ####     an alarm notification about the ####
    ####     cluster event and enables the cmrecovercl(1m) ####
    ####     command for immediate execution. ####
    #### ####
    #### NOTIFICATION <type> ####
    ####   <message> ####
    ####     A string value which is sent from the ####
    ####     monitoring cluster for a given event ####
    ####     to a specified destination. The <message>, ####
    ####     which can be no more than 170 characters, ####
    ####     is also appended to the ####
    ####     /var/opt/resmon/log/cc/eventlog ####
    ####     file on the monitoring node in the cluster ####
    ####     where the event was detected. ####
    #### ####
    #### Example: ####
    #### ####
    #### CLUSTER_EVENT westcoast/UNREACHABLE ####
    #### MONITORING_CLUSTER eastcoast ####
    #### CLUSTER_ALERT 5 MINUTES ####
    #### NOTIFICATION EMAIL admin@primary.site ####
    ####   "westcoast status unknown for 5 min. Call secondary site." ####
    #### NOTIFICATION EMAIL admin@secondary.site ####
    ####   "Call primary admin. (555) 555-6666." ####
    #### ####
    #### CLUSTER_ALERT 10 MINUTES ####
    #### NOTIFICATION EMAIL admin@primary.site ####
    ####   "westcoast status unknown for 10 min. Call secondary site." ####
    #### NOTIFICATION EMAIL admin@secondary.site ####
    ####   "Call primary admin. (555) 555-6666." ####
    #### NOTIFICATION CONSOLE ####
    ####   "Cluster ALERT: westcoast not responding." ####
    #### ####
    #### CLUSTER_ALARM 15 MINUTES ####
    #### NOTIFICATION EMAIL admin@primary.site ####
    ####   "westcoast status unknown for 15 min. Takeover advised." ####
    #### NOTIFICATION EMAIL admin@secondary.site ####
    ####   "westcoast still not responding. Use cmrecovercl command." ####
    #### NOTIFICATION CONSOLE ####
    ####   "Cluster ALARM: Issue cmrecovercl command to take over westcoast." ####
    #### ####
    #### CLUSTER_EVENT westcoast/UP ####
    #### MONITORING_CLUSTER eastcoast ####
    #### CLUSTER_ALERT 0 MINUTES ####
    #### NOTIFICATION EMAIL admin@secondary.site ####
    ####   "Cluster westcoast is up." ####
    #### ####
    #### CLUSTER_EVENT westcoast/DOWN ####
    #### MONITORING_CLUSTER eastcoast ####
    #### CLUSTER_ALERT 0 MINUTES ####
    #### NOTIFICATION EMAIL admin@secondary.site ####
    ####   "Cluster westcoast is down." ####
    #### ####
    #### CLUSTER_EVENT westcoast/ERROR ####
    #### MONITORING_CLUSTER eastcoast ####
    #### CLUSTER_ALERT 0 MINUTES ####
    #### NOTIFICATION EMAIL admin@secondary.site ####
    ####   "Error in monitoring cluster westcoast." ####
    #### ####

    CLUSTER_EVENT <cluster_name>/UNREACHABLE
    MONITORING_CLUSTER
    CLUSTER_ALERT
    NOTIFICATION
    NOTIFICATION
    CLUSTER_ALERT
    NOTIFICATION
    NOTIFICATION
    CLUSTER_ALARM
    NOTIFICATION
    NOTIFICATION

    CLUSTER_EVENT <cluster_name>/DOWN
    MONITORING_CLUSTER
    CLUSTER_ALERT
    NOTIFICATION
    NOTIFICATION
    CLUSTER_ALERT
    NOTIFICATION
    NOTIFICATION
    CLUSTER_ALARM
    NOTIFICATION
    NOTIFICATION

    CLUSTER_EVENT <cluster_name>/UP
    MONITORING_CLUSTER
    CLUSTER_ALERT
    NOTIFICATION

    CLUSTER_EVENT <cluster_name>/ERROR
    MONITORING_CLUSTER
    CLUSTER_ALERT
    NOTIFICATION

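
Several constraints on a monitoring definition are spread across the guidelines and the file comments: an alarm time should be later than the last alert, an alarm can only be defined for an UNREACHABLE or DOWN condition, and a message can be no more than 170 characters. The Python sketch below gathers them into one check. It assumes the alerts and alarms for one cluster event have already been reduced to (elapsed seconds, message) pairs, and the function name is hypothetical.

```python
MAX_MESSAGE_LENGTH = 170                   # limit quoted in the template comments
ALARM_CONDITIONS = {"UNREACHABLE", "DOWN"}  # conditions that may carry an alarm

def check_event(condition, alerts, alarms):
    """alerts and alarms are lists of (elapsed_seconds, message) tuples."""
    problems = []
    if alarms and condition not in ALARM_CONDITIONS:
        problems.append(f"alarms cannot be defined for {condition}")
    last_alert = max((seconds for seconds, _ in alerts), default=None)
    for seconds, _ in alarms:
        if last_alert is not None and seconds <= last_alert:
            problems.append(f"alarm at {seconds}s is not later than the last alert")
    for _, message in alerts + alarms:
        if len(message) > MAX_MESSAGE_LENGTH:
            problems.append(f"message longer than {MAX_MESSAGE_LENGTH} characters")
    return problems

# The westcoast/UNREACHABLE example: alerts at 5 and 10 minutes, alarm at 15.
alerts = [(300, "westcoast status unknown for 5 min."),
          (600, "westcoast status unknown for 10 min.")]
alarms = [(900, "westcoast status unknown for 15 min. Takeover advised.")]
print(check_event("UNREACHABLE", alerts, alarms))  # []
```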
The TEXTLOG notification file must be placed under the /var/opt/resmon/log directory. If any other location is specified for logging, the cmapplyconcl and cmcheckconcl commands report the following error:

    The target after textlog “ ” is not valid.
    Please specify a file under /var/opt/resmon/log directory

If you upgraded Continentalclusters but are still using the old configuration file, the textlog location is still specified as /var/adm/cmconcl. As a result, the following error message appears:

    The file path “s” specified for textlog is invalid.
    The destination file must be under /var/opt/resmon/log directory. Please change the path and restart the ccmon package.

IMPORTANT: For TEXTLOG notification, the destination log file must be in the /var/opt/resmon/log directory. If the destination file is not available in this directory, Continentalclusters will not work properly.
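
The TEXTLOG location rule can be checked ahead of time with a few lines of Python. This is a minimal sketch of the rule as stated above, namely a full path that stays under /var/opt/resmon/log. The function name is hypothetical, and the sketch does not reproduce the checks that cmcheckconcl itself performs.

```python
import posixpath

TEXTLOG_ROOT = "/var/opt/resmon/log"

def textlog_path_ok(pathname):
    """True if pathname is a full path that stays under TEXTLOG_ROOT."""
    if not pathname.startswith("/"):
        return False
    # normpath collapses '..' components, so a path cannot climb back out.
    normalized = posixpath.normpath(pathname)
    return normalized.startswith(TEXTLOG_ROOT + "/")

print(textlog_path_ok("/var/opt/resmon/log/cc/site.log"))  # True
print(textlog_path_ok("/var/adm/cmconcl/site.log"))        # False
```

The second call shows the upgrade case described above, where an old configuration file still points at /var/adm/cmconcl.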

Checking and Applying the Continentalclusters Configuration

After editing the configuration file on any of the participating clusters in the continental cluster, halt any monitor packages that are running, then use the following steps to apply the configuration to all nodes in the continental cluster.

1. Verify the content of the file.

    # cmcheckconcl -v -C cmconcl.config

   This command verifies that all parameters are within range, all fields are filled out, and the entries (such as NODE_NAME) are valid.
2. Distribute the Continentalclusters configuration information to all nodes in the continental cluster.

    # cmapplyconcl -v -C cmconcl.config

   Configuration data is copied to all nodes in all the participating clusters. This data includes a set of managed object files that are copied to the /etc/cmconcl/instances directory on every node in all clusters. Be sure to make a backup copy of the ASCII configuration file and save it on the other cluster after it is applied.

NOTE: If any problems occur during the execution of cmapplyconcl, repeat the command as often as necessary. Issuing the command will delete the existing Continentalclusters configuration and apply the new one.

When configuration is finished, your systems should have sets of files similar to those shown in Figure 2-9 “ContinentalClusters Configuration Files”.

Starting the ContinentalClusters Monitor Package

Starting the monitoring package enables all ContinentalClusters monitoring functionality. Before doing this, ensure that the primary packages selected to be protected are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working properly. If using physical data replication, make sure that it is operational.

On each monitoring cluster, start the monitor package.

    # cmmodpkg -e ccmonpkg

After the monitor package is started, a log file /var/adm/cmconcl/sentryd.log will be created on the node where the package is running to record the Continentalclusters monitoring activities. It is recommended that this log file be archived or cleaned up periodically.

Validating the Configuration

The following table shows the status of Continentalclusters packages in a recovery pair when each cluster is running normally and no recovery has taken place.

Table 2-5 Status of Continentalclusters Packages Before Recovery

| Data Replication Method | Primary Cluster: Primary Package | Primary Cluster: Data Sender Package | Primary Cluster: Optional Monitor Package | Recovery Cluster: Recovery Package | Recovery Cluster: Data Receiver Package | Recovery Cluster: Required Monitor Package |
|---|---|---|---|---|---|---|
| Physical: Symmetrix | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Physical: XP Series | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Physical: EVA Series | Running | Not used | Running (optional) | Halted | Not used | Running (required) |
| Logical: Oracle Standby Database | Running | Not used | Running (optional) | Halted | Running | Running (required) |

Use the following steps to ensure the components are functioning correctly:

1. Make sure all daemons are running.

    # ps -ef | grep cmcl

   Two important Continentalclusters daemons are cmclsentryd and cmclrmond.
2. Check the cluster configuration on each cluster using the cmviewcl -v command.
   - Ensure that each primary package is running correctly.
   - Ensure that the data sender packages (if any are used for logical data replication) are running correctly.
   - Ensure that the data receiver packages (if any are used for logical data replication) are running correctly.
   - Ensure that the continental cluster monitor package is running correctly on each monitoring cluster.
3. On all nodes, use the tail -f /var/adm/syslog/syslog.log command to check the end of the SYSLOG file for errors.
4. On nodes where packages are running, check all package log files for errors, including application packages and the monitor package.
5. Use the following command to verify the correct operation of the Continentalclusters daemon:

    # /opt/cmom/tools/bin/cmreadlog -f \
    /var/adm/cmconcl/sentryd.log

6. Make sure the ContinentalClusters monitor package (default name ccmonpkg) on each cluster fails over properly if a node fails.
7. Change each cluster’s state to test that the monitor running on the monitoring cluster will detect the change in status and send notification.
8. View the status of the Continentalclusters primary and recovery clusters, including configured event data.

    # cmviewconcl -v
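
The daemon check in the first validation step can be scripted when many nodes need to be verified. The Python sketch below scans captured ps -ef output for the two daemons named above. The sample process listing, including the paths and PIDs in it, is made up for illustration, and the function name is hypothetical.

```python
EXPECTED_DAEMONS = ("cmclsentryd", "cmclrmond")

def missing_daemons(ps_output):
    """Return the expected daemons that do not appear in `ps -ef` output."""
    running = set()
    for line in ps_output.splitlines():
        for daemon in EXPECTED_DAEMONS:
            # Skip the grep process itself, whose command line also contains "cmcl".
            if daemon in line and "grep" not in line:
                running.add(daemon)
    return [d for d in EXPECTED_DAEMONS if d not in running]

# Hypothetical ps -ef output; only the daemon names come from the text above.
sample = """\
    root  2211     1  0 09:14:02 ?      0:03 /hypothetical/path/cmclsentryd
    root  2290  2188  0 09:20:41 pts/0  0:00 grep cmcl
"""
print(missing_daemons(sample))  # ['cmclrmond']
```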

CAUTION: Never issue the cmrunpkg command for a recovery package when ContinentalClusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great.

Chapters 3, 4, and 5 contain additional suggestions on testing the data replication and package configuration.

Documenting the Recovery Procedure

Once everything is configured and the ContinentalClusters monitor is running, define your recovery procedure and train the administrators and operators at both sites. The checklist in Figure 2-10 “Recovery Checklist” is an example of how to document the recovery procedure.

Reviewing the Recovery Procedure

Using the checklist described in the previous section, step through the recovery procedure to make sure that all necessary steps are included. If possible, create simulated failures to test the alert and alarm scenarios coded in the Continentalclusters configuration file.