Designing Disaster Tolerant High Availability Clusters: > Chapter 3 Building a Metropolitan Cluster Using MetroCluster/CA

Maintaining a Cluster that uses MetroCluster/CA


While the cluster is running, manually changing the state of devices on the XP Series disk array can cause the package to halt because of unexpected conditions, or can prevent the package from starting after a failover. In general, do not make manual state changes while the package and the cluster are running.

NOTE: Manual changes can be made when they are required to bring the device group into a "protected" state. For example, if a package starts up with data replication suspended, you can run the pairresync command to re-establish data replication while the package is still running.

Viewing the Progress of Copy Operations

While a copy is in progress between XP systems (that is, the volumes are in a COPY state), you can see the progress of the copy by viewing the % column in the output of the pairdisplay command:

# pairdisplay -g pkgB -fc -CLI

Group PairVol    L/R Port# TID LU Seq#  LDEV# P/S   Status Fence %  P-LDEV# M
pkgB  pkgD-disk0 L   CL1-C 0   3  35422 463   P-VOL COPY   NEVER 79 460     -
pkgB  pkgD-disk0 R   CL1-F 0   3  35663 3     S-VOL COPY   NEVER -  0       -

This display shows that 79% of a current copy operation has completed. Synchronous fence levels (NEVER and DATA) show 100% in this column when the volumes are in a PAIR state.
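The following is a minimal sketch of how the % column might be extracted in a script. The function name copy_progress is hypothetical, and the field positions assume the whitespace-separated column order shown in the output above.

```shell
# Print "<PairVol> <percent>" for each local (L) volume in COPY state.
# Reads `pairdisplay -g <group> -fc -CLI` output on standard input.
# Assumed field positions, taken from the header shown above:
#   2 = PairVol, 3 = L/R, 10 = Status, 12 = %
copy_progress() {
    # The header line is skipped automatically: its third field is "L/R".
    awk '$3 == "L" && $10 == "COPY" { print $2, $12 }'
}
```

For example, `pairdisplay -g pkgB -fc -CLI | copy_progress` would print one line per local volume that is still copying.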

Viewing Side File Size

If you are using asynchronous data replication, you can see the current size of the side file when the volumes are in a PAIR state by using the pairdisplay command. The following output, obtained during normal cluster operation, shows the percentage of the side file that is full:

# pairdisplay -g pkgB -fc -CLI

Group PairVol    L/R Port# TID LU Seq#  LDEV# P/S   Status Fence %  P-LDEV# M
pkgB  pkgD-disk0 L   CL1-C 0   3  35422 463   P-VOL PAIR   ASYNC 35 3       -
pkgB  pkgD-disk0 R   CL1-F 0   3  35663 3     S-VOL PAIR   ASYNC 0  463     -

This output shows that 35% of the side file is full.

When volumes are in a COPY state, the % column shows the progress of the copying between the XP frames, until it reaches 100%, at which point the display reverts to showing the side file usage in the PAIR state.
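Because the meaning of the % column depends on the volume state, a script that reads it needs to check the Status and Fence columns first. The sketch below illustrates one way to do that. The function name describe_percent is hypothetical, and the field positions assume the column order shown in the sample output.

```shell
# Label the % column of each local (L) volume according to its state.
# Reads `pairdisplay -g <group> -fc -CLI` output on standard input.
# Assumed field positions: 2 = PairVol, 3 = L/R, 10 = Status,
# 11 = Fence, 12 = %
describe_percent() {
    awk '$3 == "L" {
        if ($10 == "COPY")
            print $2 ": copy progress " $12 "%"
        else if ($10 == "PAIR" && $11 == "ASYNC")
            print $2 ": side file usage " $12 "%"
        else
            print $2 ": status " $10
    }'
}
```

Volumes in any other state are reported by status only, since this section does not describe what the % column means for them.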

Normal Maintenance

Sometimes a package has to be taken down for maintenance without moving to another node. The following procedure is recommended for normal maintenance of a MetroCluster/CA configuration:

  1. Stop the package with the appropriate MC/ServiceGuard command:

    # cmhaltpkg pkgname

  2. Distribute the MetroCluster with Continuous Access XP configuration changes:

    # cmapplyconf -P pkgname.config

  3. Start the package with the appropriate MC/ServiceGuard command:

    # cmmodpkg -e pkgname
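The three steps above can be sketched as a small wrapper that stops at the first failing command. The function name maintain_pkg and the RUN variable are hypothetical. By default the sketch only prints the commands, so that the sequence can be reviewed before anything is executed.

```shell
# Run the maintenance steps in order, stopping at the first failure.
# RUN is a hypothetical dry-run hook: when unset it defaults to `echo`,
# so the commands are printed rather than executed. Set RUN to an empty
# string (RUN= maintain_pkg pkgname) to execute them for real.
maintain_pkg() {
    pkg=$1
    ${RUN-echo} cmhaltpkg "$pkg" &&
    ${RUN-echo} cmapplyconf -P "$pkg.config" &&
    ${RUN-echo} cmmodpkg -e "$pkg"
}
```

The sketch assumes the package configuration file is named pkgname.config and is in the current directory, as in step 2.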

Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain online to form a quorum if a failure occurs. See “Example Failover Scenarios with Two Arbitrators”.
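As a rough planning aid, the quorum arithmetic can be checked before taking nodes down. The sketch below assumes the usual rule that a strict majority (more than half) of the previously running nodes must survive a failure. It does not model the case of exactly half the nodes plus a tie-breaker. The function name has_majority is hypothetical.

```shell
# Succeed when `remaining` nodes form a strict majority of `running` nodes.
# Both counts should include arbitrator nodes.
# Assumption: quorum requires more than half of the previously running nodes.
has_majority() {
    remaining=$1
    running=$2
    [ $((remaining * 2)) -gt "$running" ]
}
```

For example, `has_majority 3 5` succeeds, while `has_majority 2 4` fails because two of four nodes is exactly half rather than a majority.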

Resynchronizing

After certain failures, data is no longer remotely protected. To restore disaster-tolerant data protection after repairing or recovering from the failure, you must manually run the pairresync command. Protection is restored only when this command completes successfully.

Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection:

  • Failure of all CA links without restart of the application

  • Failure of all CA links with Fence Level "DATA" with restart of the application on a primary host

  • Failure of the entire secondary Data Center for a given application package

  • Failure of the secondary XP Series disk array for a given application package while the application is running on a primary host

Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection. Full resynchronization is automatically initiated for these failures by moving the application package back to its primary host after repairing the failure:

  • Failure of the entire primary data center for a given application package

  • Failure of all of the primary hosts for a given application package

  • Failure of the primary XP Series disk array for a given application package

  • Failure of all CA links with restart of the application on a secondary host

Pairs must be manually recreated if both the primary and secondary XP Series disk arrays are in SMPL (simplex) state. Make sure you periodically review the files syslog.log and /etc/cmcluster/pkgname/pkgname.log for messages, warnings and recommended actions. Review these files in particular after system, data center or application failures.

Full resynchronization must be manually initiated after repairing the following failures:

  • Failure of the secondary XP Series disk array for a given application package followed by application startup on a primary host

  • Failure of all CA links with Fence Level NEVER or ASYNC with restart of the application on a primary host

Using the pairresync Command

The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact. After the CA link is fixed, you use the pairresync command in one of the following two ways depending on which site you are on:

  • pairresync -swapp (run from the primary site)

  • pairresync -swaps (run from the failover site)

These options take advantage of the fact that the recovery site maintains a bitmap of the data sectors modified on the recovery array. Either version of the command swaps the personalities of the volumes: the PVOL becomes the SVOL and the SVOL becomes the PVOL. With the personalities swapped, any data written to the volume on the failover site (now the PVOL) is copied back to the SVOL, which is now on the primary site. During this time the package continues running on the failover site. After resynchronization is complete, you can halt the package on the failover site and restart it on the primary site. MetroCluster then swaps the personalities of the PVOL and the SVOL again, returning PVOL status to the primary site.
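The choice between the two options depends only on where the command is run. The sketch below prints the matching command line instead of executing it, so it can be reviewed first. The function name swap_resync_cmd and the site labels "primary" and "failover" are hypothetical. The -g option selects the device group, as in the pairdisplay examples earlier in this section.

```shell
# Print the pairresync command for a device group, given the site the
# command will be run from: "primary" or "failover".
swap_resync_cmd() {
    case $1 in
        primary)  echo "pairresync -g $2 -swapp" ;;
        failover) echo "pairresync -g $2 -swaps" ;;
        *)        echo "usage: swap_resync_cmd primary|failover group" >&2
                  return 1 ;;
    esac
}
```

For example, `swap_resync_cmd failover pkgB` prints `pairresync -g pkgB -swaps`.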

NOTE: The preceding steps are automated, provided the auto variable AUTO_PSUEPSUS has its default value of 1. Once the CA link failure has been fixed, you only need to halt the package on the recovery site and restart it on the primary site. However, to reduce the amount of application downtime, manually run pairresync before failback.

Failback

After resynchronization is complete, you can halt the package on the failover site, and restart it on the primary site. MetroCluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site.
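Before halting the package for failback, it can help to confirm that resynchronization has actually finished. The sketch below treats resynchronization as complete when every volume line in the pairdisplay output reports PAIR. The function name all_paired is hypothetical, and the field positions assume the column order shown in the pairdisplay examples earlier in this section.

```shell
# Succeed only if the input contains at least one volume line and every
# volume line (local and remote) reports PAIR in the Status column.
# Reads `pairdisplay -g <group> -fc -CLI` output on standard input.
# Assumed field positions: 3 = L/R, 10 = Status
all_paired() {
    awk '$3 == "L" || $3 == "R" { n++; if ($10 != "PAIR") bad = 1 }
         END { exit !(n > 0 && !bad) }'
}
```

A possible use on the failover site is `pairdisplay -g pkgB -fc -CLI | all_paired && cmhaltpkg pkgB`, which halts the package only when all volumes are paired.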

© Hewlett-Packard Development Company, L.P.