Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters > Chapter 1 Designing a Metropolitan Cluster

Designing a Disaster Tolerant Architecture for use with Metrocluster Products
HP Metrocluster is a disaster tolerance solution for HP 9000 and HP Integrity systems running HP-UX 11i operating systems. It protects application data and ensures service availability in the event of a disaster or a failure that affects a whole site or data center. HP Metrocluster uses Serviceguard clustering technology to form a single cluster of systems located at different data centers separated by metropolitan distances. Metrocluster integrates with HP StorageWorks Continuous Access or EMC Symmetrix Remote Data Facility (SRDF) to continuously replicate application data across data centers. If a disaster or failure causes an application outage, Metrocluster prepares the replicated application data and automatically starts the application on the remote systems.

HP Metrocluster supports the following configurations:
Specifically for disaster tolerance, Serviceguard clusters or data centers can also be configured on different subnets. Such configurations improve scalability because operators can configure a larger number of nodes with more IP addresses. Follow these guidelines to configure a Serviceguard cluster across network subnets:
For more information on configuring cross-subnet clusters, see the Managing Serviceguard manual available at http://www.docs.hp.com.

Following are the disaster tolerant architecture requirements:
Metrocluster also defines a Site Aware Disaster Tolerant Architecture (SADTA) for complex workloads such as SAP and Oracle RAC databases, including Oracle Database 10gR2 RAC and Oracle Database 11gR1 RAC. This solution uses an additional software feature called the Site Controller Package to provide disaster tolerance for workload databases. For more information on SADTA, see “Overview of Site Aware Disaster Tolerant Architecture”.

A single data center architecture is supported, but it is not a true disaster tolerant architecture. If the entire data center fails, there is no automated failover. This architecture is valid only for protecting data through data replication and for protecting against multiple node failures.

The recommended and supported disaster tolerant architecture for use with a metropolitan cluster consists of two main data centers with an equal number of nodes and a third location with one or more arbitrator nodes or a Quorum Server node.

Figure 1-1.

A disk array can be the main disk array for one set of packages and the remote disk array for another. In Figure 1-1, the XP disk array in Data Center A is the main or primary disk array for packages A and B, and the remote or secondary disk array for packages C and D in Data Center B. For packages A and B, data is written to PVOLs on the array in Data Center A and replicated to SVOLs on the array in Data Center B. Likewise, the XP disk array in Data Center B is the primary or main disk array for packages C and D, and the secondary or remote disk array for packages A and B. For packages C and D, data is written to PVOLs on the disk array in Data Center B and replicated to SVOLs in Data Center A.

Arbitrators provide functionality like that of the cluster lock disk, and act as tie-breakers for a cluster quorum in case all of the nodes in one data center go down at the same time.
Cluster lock devices are not supported because cluster locks cannot be maintained across a replication link such as Continuous Access or SRDF. Arbitrators are fully functioning systems that are members of the cluster, and are not usually physically connected to the disk arrays.

A Quorum Server is an alternative form of cluster arbitration that uses a server program to determine cluster membership rather than a cluster lock disk or a Serviceguard Arbitration Node.

Table 1-1 lists the allowable number of nodes at each main data center and the third location, up to a 16-node maximum cluster size.

Table 1-1 Supported System and Data Center Combinations
* Configurations with two arbitrators are preferred because they provide a greater degree of availability, especially in cases when a node is down due to a failure or planned maintenance.

It is highly recommended that two arbitrators be configured in Data Center C to allow for planned downtime in Data Centers A and B.

The following is a list of recommended arbitration methods for Metrocluster solutions in order of preference:
For more information on Quorum Server, refer to the Serviceguard Quorum Server Release Notes for HP-UX.

Although you can use one arbitrator, having two arbitrators provides greater flexibility in taking systems down for planned outages as well as better protection against multiple points of failure. Using two arbitrators:
If you use a single arbitrator system, special procedures must be followed during planned downtime to remain protected. Systems must be taken down in pairs, one from each of the data centers, so that the Serviceguard quorum is maintained after a node failure. If the arbitrator itself must be taken down, disaster recovery capability is at risk if one of the other systems fails.

Arbitrator systems can be used to perform important and useful work such as:

Each disk array must be configured with redundant links for data replication. To prevent a single point of failure (SPOF), there must be at least two physical boards in each disk array for the data replication links. Each board usually has multiple ports, but the redundant data replication link must be connected to a port on a different physical board from the one that carries the primary data replication link. For Continuous Access XP, when using bi-directional configurations, where Data Center A backs up Data Center B and Data Center B backs up Data Center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations to allow failback.

When a cluster initially forms, all systems must be available to form the cluster (100% quorum requirement). Quorum is dynamic and is recomputed after each system failure. For instance, if you start with an 8-node cluster and two systems fail, 6 out of 8 nodes survive, which is a 75% quorum, and the cluster size is reset to 6 nodes. If two more nodes fail, leaving 4 out of 6, the quorum is 67%. Each time a cluster re-forms, there must be more than 50% quorum. With Serviceguard, a cluster lock disk or Quorum Server is used as the tie-breaker when quorum is exactly 50%. However, in a Metrocluster configuration, a Quorum Server is supported and a cluster lock disk is not.
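The quorum arithmetic described here can be checked with a short Python sketch. It is an illustration only, not Serviceguard code: the function name `reform` and the `quorum_server_reachable` flag are hypothetical, and the rule it encodes is the one stated in the text (more than 50% of the previous membership must survive, and an exact 50% split needs a Quorum Server as tie-breaker).

```python
from fractions import Fraction


def reform(previous_size, survivors, quorum_server_reachable=False):
    """Return (outcome, quorum) for one cluster re-formation attempt.

    Hypothetical helper: quorum is the fraction of the previous
    cluster membership that is still running.
    """
    quorum = Fraction(survivors, previous_size)
    if quorum > Fraction(1, 2):
        return "reforms", quorum
    # Exactly 50%: only a tie-breaker can decide. The text names the
    # Quorum Server as the supported tie-breaker for Metrocluster.
    if quorum == Fraction(1, 2) and quorum_server_reachable:
        return "reforms via Quorum Server", quorum
    return "halts", quorum


# Replay the example from the text: 8 nodes, then two failures twice.
size = 8
for failures in (2, 2):
    survivors = size - failures
    outcome, quorum = reform(size, survivors)
    print(f"{survivors} of {size} nodes: {float(quorum):.0%} quorum, cluster {outcome}")
    size = survivors  # the cluster size is reset after each re-formation
```

Calling `reform(4, 2)` models the exact 50% case: it returns `"halts"` unless `quorum_server_reachable=True` is passed.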
In a Metrocluster configuration, a quorum of exactly 50% therefore requires access to a Quorum Server; otherwise, all nodes halt.

Taking a node off-line for planned maintenance is treated the same as a node failure in these scenarios. Study these scenarios to make sure you do not put your cluster at risk during planned maintenance. The scenarios in Table 1-2, based on Figure 1-2, illustrate possible results if one or more nodes fail in a configuration with a single arbitrator.

Table 1-2 Node Failure Scenarios with One Arbitrator
* Cluster can be manually started with the remaining node.

With a single arbitrator node, the cluster is at risk each time a node fails or is taken down for planned maintenance. Having two arbitrator nodes adds extra protection during node failures and allows you to do planned maintenance on arbitrator nodes without losing the cluster should a disaster occur. The scenarios in Table 1-3 illustrate possible results if a data center or one or more nodes fail in a configuration with two arbitrators. Note that 3 of the 4 scenarios that caused a cluster halt with a single arbitrator do not cause a cluster halt with two arbitrators.

Table 1-3 Node Failure Scenarios with Two Arbitrators
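To make the benefit of a second arbitrator concrete, the following Python sketch works through one scenario: an arbitrator is down for planned maintenance when an entire main data center fails. It does not reproduce the rows of Table 1-2 or Table 1-3. It assumes a hypothetical layout with two nodes in each main data center, and arbitrator nodes rather than a Quorum Server, so an exact 50% split has no tie-breaker and the cluster halts. The function name `survives_site_loss` is invented for illustration.

```python
def survives_site_loss(nodes_per_site, arbitrators, arbitrators_in_maintenance=0):
    """Return True if the cluster keeps quorum when one main data center fails.

    Hypothetical model: planned maintenance is treated like a node
    failure, so the cluster has already re-formed without the
    arbitrators that were taken down before the disaster.
    """
    running_arbitrators = arbitrators - arbitrators_in_maintenance
    size_before_disaster = 2 * nodes_per_site + running_arbitrators
    survivors = nodes_per_site + running_arbitrators
    # Strictly more than 50% of the previous membership must survive;
    # no tie-breaker is assumed for an exact 50% split.
    return 2 * survivors > size_before_disaster


for arbitrators in (1, 2):
    ok = survives_site_loss(nodes_per_site=2, arbitrators=arbitrators,
                            arbitrators_in_maintenance=1)
    print(f"{arbitrators} arbitrator(s), one in maintenance: "
          f"cluster {'survives' if ok else 'halts'} after a data center failure")
```

With one arbitrator, the surviving data center holds 2 of 4 nodes (exactly 50%) and the cluster halts. With two arbitrators, 3 of 5 nodes survive (60%) and the cluster re-forms.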