Designing Disaster Tolerant High Availability Clusters: > Chapter 4 Building a Metropolitan Cluster Using MetroCluster/SRDF

Designing a Disaster Tolerant Architecture for use with MetroCluster/SRDF


MetroCluster with Symmetrix SRDF is designed for use in a metropolitan cluster environment within the 100 km loop limit of the FDDI network.

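All nodes must be members of a single MC/ServiceGuard cluster. Two configurations are supported: a single data center, and two data centers with a third location for arbitrators. Both are described later in this section.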

Following are the disaster tolerant architecture requirements:

  • In the disaster tolerant cluster architecture, it is expected that each data center is self-contained such that the loss of one data center does not cause the entire cluster to fail. It is important that all Single Points of Failure (SPOF) be eliminated so that surviving systems continue to run in the event that one or more systems fail.

  • It is also expected that the networks between the data centers are redundant and routed in such a way that the loss of any one data center does not cause the network between surviving data centers to fail.

  • Exclusive Volume Group activation must be used for all VGs associated with packages that use the Symmetrix. The design of the MetroCluster with Symmetrix SRDF script assumes that only one system in the cluster will have a VG activated at any time.

Single Data Center

A single data center architecture is supported, but it is not a true disaster tolerant architecture. If the entire data center fails, there will be no automated failover. This architecture is only valid for protecting data through data replication, and for protecting against multiple node failures.

Two Data Centers and Third Location with Arbitrator(s)

This is the recommended and supported disaster tolerant architecture for use with MetroCluster with Symmetrix SRDF. This architecture consists of two main data centers with an equal number of nodes and a third location with one or more arbitrator nodes; see Figure 4-1 “Two Data Centers and Third Location with Arbitrators”.

Figure 4-1 Two Data Centers and Third Location with Arbitrators


The local EMC Symmetrix disk array is called the R1 disk for all nodes and packages in a given data center. The remote EMC Symmetrix disk array, where the data is replicated, is called the R2 disk. In Figure 4-1 “Two Data Centers and Third Location with Arbitrators”, the EMC Symmetrix disk array in data center A is the R1 disk for packages A and B, and the R2 disk for packages C and D in data center B. Likewise, the EMC Symmetrix disk array in data center B is the R1 disk for packages C and D, and the R2 disk for packages A and B.

Arbitrators provide functionality similar to that of the cluster lock disk, and act as tie-breakers for a cluster quorum in case all of the nodes in one data center go down at the same time. Cluster lock devices are not supported in a MetroCluster configuration because cluster locks cannot be maintained across the SRDF link.

Arbitrators are fully-functioning systems that are members of the cluster and are not usually physically connected to the Symmetrix units. Table 4-2 “Possible Number of Nodes in a Three Data Center Configuration” lists the allowable number of nodes at each main data center and the third location, assuming a 16-node maximum cluster size. Figure 4-1 “Two Data Centers and Third Location with Arbitrators” shows a two data center and third location configuration with two nodes at each site and two arbitrator nodes.

Table 4-2 Possible Number of Nodes in a Three Data Center Configuration

Primary Data Center A   Primary Data Center B   Arbitrator Third Location
(with Symmetrix)        (with Symmetrix)        (No Symmetrix)
1                       1                       1
2                       2                       1
2                       2                       2*
3                       3                       1
2                       1                       2
3                       3                       2*
4                       4                       1
4                       4                       2*
5                       5                       1
5                       5                       2*
6                       6                       1
6                       6                       2*
7                       7                       1
7                       7                       2*

* Configurations with two arbitrators are preferred because they provide a greater degree of availability, especially in cases when a node is down due to a failure or planned maintenance.

NOTE: In the campus or metropolitan environment, the same number of systems must be present in each of the two data centers (Data Center A and Data Center B) whose systems are connected to the Symmetrix units. There must be either one or two arbitrators in the third location.
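
The node-count rules in the note above can be written down as a small check. The following Python sketch is illustrative only: the function name `is_valid_layout` is hypothetical, and the default 16-node limit is taken from the maximum cluster size mentioned earlier in this section. It is not part of any MC/ServiceGuard tool.

```python
def is_valid_layout(nodes_a, nodes_b, arbitrators, max_cluster_size=16):
    """Check a node layout against the rules stated in the note.

    nodes_a, nodes_b: systems in Data Center A and Data Center B
                      (the systems connected to the Symmetrix units).
    arbitrators:      systems at the third location (no Symmetrix).
    max_cluster_size: assumed cluster-wide node limit.
    """
    if nodes_a < 1 or nodes_b < 1:
        return False
    if nodes_a != nodes_b:           # both data centers need the same count
        return False
    if arbitrators not in (1, 2):    # one or two arbitrators only
        return False
    return nodes_a + nodes_b + arbitrators <= max_cluster_size


print(is_valid_layout(2, 2, 1))   # equal sites, one arbitrator
print(is_valid_layout(7, 7, 2))   # 16 nodes in total
print(is_valid_layout(8, 8, 1))   # 17 nodes, over the assumed limit
```

A check like this could be run against a planned layout before filling in the configuration worksheets.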

Arbitrator Node Configuration Rules

Although you can use one arbitrator, having two arbitrators provides greater flexibility in taking systems down for planned outages as well as providing better protection against multiple points of failure:

  • Provides local failover capability to applications running on the arbitrator.

  • Protects against more MPOF (Multiple Points of Failure).

  • Provides for planned downtime of a single system anywhere in the cluster.

If you use a single arbitrator system, special procedures must be followed during planned downtime to remain protected. Systems must be taken down in pairs, one from each of the data centers, so that the MC/ServiceGuard quorum is maintained after a node failure. If the arbitrator itself must be taken down, disaster recovery capability is at risk if one of the other systems fails.
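
The effect of planned downtime on a single-arbitrator cluster can be illustrated with a short Python sketch. It assumes the more-than-50% quorum rule described under Calculating a Cluster Quorum and a cluster with two nodes per data center; `survives_site_loss` is a hypothetical helper written for this illustration, not an MC/ServiceGuard command.

```python
def survives_site_loss(nodes_a, nodes_b, arbitrators, lost_site):
    """Return True if quorum is kept when a whole data center fails.

    The counts are the systems currently running, that is, after any
    systems have already been taken down for planned maintenance.
    """
    running = nodes_a + nodes_b + arbitrators
    lost = nodes_a if lost_site == "A" else nodes_b
    survivors = running - lost
    return 2 * survivors > running   # strictly more than 50% must remain


# Full cluster, 2 + 2 nodes and one arbitrator: protected.
print(survives_site_loss(2, 2, 1, "A"))
# One node in data center B down for maintenance: loss of A leaves 2 of 4.
print(survives_site_loss(2, 1, 1, "A"))
# Nodes taken down as a pair, one from each data center: protected again.
print(survives_site_loss(1, 1, 1, "A"))
# Arbitrator down for maintenance: loss of either site leaves 2 of 4.
print(survives_site_loss(2, 2, 0, "A"))
```

The second and fourth cases show why systems are taken down in pairs and why arbitrator maintenance leaves the cluster exposed.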

Arbitrator systems can be used to perform important and useful work such as:

  • Running mission-critical applications not protected by disaster recovery

  • IT/Operations or NetworkNodeManager

  • Backup

  • Application servers

Calculating a Cluster Quorum

When a cluster initially forms, all systems must be available to form the cluster (100% Quorum requirement).

A quorum is dynamic and is recomputed after each system failure. For instance, if you start out with an 8-node cluster and two systems fail, that leaves 6 out of 8 surviving nodes, or a 75% quorum. The cluster size is reset to 6 nodes. If two more nodes fail, leaving 4 out of 6, quorum is 67%.

Each time a cluster forms, there must be more than 50% quorum to reform the cluster. Cluster lock disks are normally used as the tie-breaker when quorum is exactly 50%. However, cluster lock disks are not supported with MetroCluster with Symmetrix SRDF. Therefore, a quorum of 50% or less will cause the remaining nodes to halt.
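
The quorum arithmetic described above can be sketched in a few lines of Python. This only reproduces the percentages given in the text; `quorum_after_failure` is a hypothetical helper, not an MC/ServiceGuard interface.

```python
def quorum_after_failure(cluster_size, failed):
    """Return (survivors, fraction, reforms) for one failure event.

    cluster_size: nodes in the cluster before the failure.
    failed:       nodes lost at the same time.
    reforms is True only when strictly more than 50% survive, because
    no cluster lock disk is available to break a 50% tie.
    """
    survivors = cluster_size - failed
    fraction = survivors / cluster_size
    reforms = 2 * survivors > cluster_size
    return survivors, fraction, reforms


size = 8
for lost in (2, 2):
    survivors, fraction, reforms = quorum_after_failure(size, lost)
    print(f"{survivors} of {size} ({fraction:.0%}) reforms={reforms}")
    size = survivors   # the cluster size is reset to the surviving count
# prints:
# 6 of 8 (75%) reforms=True
# 4 of 6 (67%) reforms=True
```

The comparison uses integers (`2 * survivors > cluster_size`) rather than the rounded percentage, so an exact 50% split is never mistaken for a majority.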

Example Failover Scenarios with One Arbitrator

Taking a node off-line for planned maintenance is treated the same as a node failure in these scenarios. Study these scenarios to make sure you do not put your cluster at risk during planned maintenance.

Figure 4-2 Failover Scenario with a Single Arbitrator


The scenarios in Table 4-3 “Node Failure Scenarios with One Arbitrator” are based on Figure 4-2 “Failover Scenario with a Single Arbitrator” and illustrate possible results if one or more nodes fail in a configuration with a single arbitrator.

Table 4-3 Node Failure Scenarios with One Arbitrator

Node Failure                                     Quorum          Result
arbitrator 1                                     4 of 5 (80%)    nothing
node 1                                           4 of 5 (80%)    pkg A switches
node 1, then node 2                              3 of 4 (75%)    pkg A and B switch
node 1, 2, then arbitrator 1                     2 of 3 (67%)    nothing
nodes 1, 2, arbitrator 1, then node 3            1 of 2 (50%)    cluster halts*
arbitrator 1, then node 1                        3 of 4 (75%)    pkg A switches

* Cluster can be manually started with the remaining node.

Table 4-4 “Data Center Failure Scenarios with One Arbitrator” illustrates possible results if a data center fails in a configuration with a single arbitrator.

Table 4-4 Data Center Failure Scenarios with One Arbitrator

Node Failure                                     Quorum          Result
data center A (nodes 1 and 2)                    3 of 5 (60%)    pkg A and B switch to data center B
data center A, then arbitrator 1                 2 of 3 (67%)    pkg A and B switch, then nothing
data center A and arbitrator 1                   2 of 5 (40%)    cluster halts*
data center A, then arbitrator 1, then node 3    1 of 2 (50%)    cluster halts*
arbitrator 1, then data center A                 2 of 4 (50%)    cluster halts*
node 3, then data center A                       2 of 4 (50%)    cluster halts*
data center B                                    3 of 5 (60%)    pkg C and D switch to data center A
third location                                   4 of 5 (80%)    nothing

* Cluster can be manually started with the remaining node.

With a single arbitrator node, the cluster is at risk each time a node fails or you take one node down for planned maintenance.

Example Failover Scenarios with Two Arbitrators

Having two arbitrator nodes adds extra protection during node failures and allows you to do planned maintenance on arbitrator nodes without losing the cluster should a disaster occur.

Figure 4-3 Failover Scenario with Two Arbitrators


The scenarios in Table 4-5 “Data Center Failure Scenarios with Two Arbitrators” illustrate possible results if a data center or one or more nodes fail in a configuration with two arbitrators. Note that three of the four scenarios that caused a cluster halt with a single arbitrator do not cause a cluster halt with two arbitrators.

Table 4-5 Data Center Failure Scenarios with Two Arbitrators

Node Failure                                     Quorum          Result
data center A (nodes 1 and 2)                    4 of 6 (67%)    pkg A and B switch to data center B
data center A, then arbitrator 1                 3 of 4 (75%)    pkg A and B switch, then nothing
data center A and arbitrator 1                   3 of 6 (50%)    cluster halts*
data center A, then arbitrator 1, then node 3    2 of 3 (67%)    pkg A, B, and C switch
arbitrator 1, then data center A                 3 of 5 (60%)    pkg A and B switch to data center B
node 3, then data center A                       3 of 5 (60%)    pkg A and B switch to data center B
data center B                                    4 of 6 (67%)    pkg C and D switch to data center A
third location                                   4 of 6 (67%)    nothing

* Cluster can be manually started with the remaining nodes.
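
The difference between one and two arbitrators can be checked by replaying the failure sequences from the tables. The Python sketch below is an illustration under stated assumptions: two nodes in each data center, each failure step given as the number of nodes lost together, and the more-than-50% quorum rule from Calculating a Cluster Quorum. `run_scenario` is a hypothetical helper, not part of MetroCluster or MC/ServiceGuard.

```python
def run_scenario(total_nodes, failure_steps):
    """Apply failure steps in order and record the quorum after each one.

    Each step is the number of nodes lost at the same time. Returns a
    list of (survivors, previous_size, status) tuples and stops at the
    first step that leaves 50% or less of the previous membership.
    """
    history = []
    size = total_nodes
    for lost in failure_steps:
        survivors = size - lost
        if 2 * survivors > size:
            history.append((survivors, size, "reforms"))
            size = survivors          # cluster size is reset
        else:
            history.append((survivors, size, "halts"))
            break
    return history


# The four sequences that halt a single-arbitrator cluster.
scenarios = {
    "data center A and arbitrator 1": [3],
    "data center A, then arbitrator 1, then node 3": [2, 1, 1],
    "arbitrator 1, then data center A": [1, 2],
    "node 3, then data center A": [1, 2],
}

for name, steps in scenarios.items():
    one = run_scenario(5, steps)[-1]   # 2 + 2 nodes, one arbitrator
    two = run_scenario(6, steps)[-1]   # 2 + 2 nodes, two arbitrators
    print(f"{name}: one arbitrator {one}, two arbitrators {two}")
```

With six nodes, only the simultaneous loss of data center A and an arbitrator still ends in a halt (3 of 6), which matches the quorum column of Table 4-5.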

Disaster Tolerant Checklist

Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for a two main data center and third location configuration.

Figure 4-4  Disaster Tolerant Checklist


Cluster Configuration Worksheet

Use this cluster configuration worksheet either in place of, or in addition to the worksheet provided in the Managing MC/ServiceGuard manual. If you have already completed an MC/ServiceGuard cluster configuration worksheet, you only need to complete the first part of this worksheet.

Figure 4-5  Cluster Configuration Worksheet


Package Configuration Worksheet

Use this package configuration worksheet either in place of, or in addition to the worksheet provided in the Managing MC/ServiceGuard manual. If you have already completed an MC/ServiceGuard cluster configuration worksheet, you only need to complete the first part of this worksheet.

Figure 4-6  Package Configuration Worksheet

NOTE: It is recommended that you use the same Symmetrix device group name on all nodes.

Figure 4-7  Package Control Script Worksheet

© Hewlett-Packard Development Company, L.P.