When the following procedures are completed, an adoptive node
will be able to access the data belonging to a package after it
fails over.

Setting up the Hardware

1. Ensure that the XP Series disk arrays are correctly cabled using PV links
   to each node in the cluster that will run packages accessing data
   on the array.

2. Configure the XP disk array for synchronous or asynchronous operation.
   If you are using a fence level of ASYNC, you must configure a side file
   using the Service Processor (SVP) attached to the XP system. Synchronous
   operation does not require side file configuration on the SVP.

3. Use the ioscan command to determine which devices on the XP disk array
   have been configured as command devices. The device-specific information
   in the rightmost column of the ioscan output will have the suffix -CM
   for these devices, for example, OPEN-3-CM.

   If there are no configured command devices on the disk array, you must
   create two before proceeding. Each command device must have alternate
   links (PV links). The first command device is the primary command device.
   The second command device is a redundant command device and is used only
   upon failure of the primary command device. The command devices must be
   mapped to the various host interfaces by using the SVP (disk array
   console) or a remote console.

4. Primary (PVOL) and secondary (SVOL) volumes must be correctly defined
   and assigned to the appropriate nodes in the XP hardware configuration.

5. Primary devices (PVOLs) must be locally protected (RAID 1 or RAID 5).

6. Secondary devices (SVOLs) must be locally protected (RAID 1 or RAID 5).

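
The command-device check in the ioscan step can be sketched as a small filter. This is a minimal sketch, not part of the product: the function name list_command_devices is hypothetical, and the `-fnC disk` options in the usage comment are an assumption about how you would invoke ioscan on your system. The filter relies only on the fact stated above, that the rightmost column ends in -CM for command devices.

```shell
# list_command_devices (hypothetical helper): read ioscan-style output on
# stdin and print only the lines whose last column ends in "-CM".
list_command_devices() {
    awk '$NF ~ /-CM$/'
}

# Assumed usage on a cluster node (not run here):
#   ioscan -fnC disk | list_command_devices
```

If the filter prints fewer than two devices, the text above implies that command devices still need to be created on the array.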
Setting Fence Levels

All devices defined in a given device group must be configured with the
same fence level. A fence level of DATA or NEVER results in synchronous
data replication; a fence level of ASYNC enables asynchronous data
replication.

Fence level = DATA is recommended to ensure a consistent copy of the data
on all sides. If Fence level = DATA is not enabled, the data may be
inconsistent in the case of a rolling disaster, that is, additional
failures taking place before the system has completely recovered from a
previous failure.

Fence level = DATA is recommended to ensure that there is no possibility of
inconsistent data at the SVOL side in case of CA (ESCON) link failure.
Since only dedicated ESCON links are supported, the probability of
intermittent link failure is extremely low. Therefore, the probability of
inconsistent data at the remote (SVOL) side is extremely low. However,
inconsistent and therefore unusable data will result from the following
sequence of circumstances:

1. Fence level = DATA is not enabled.
2. The application continues to modify data.
3. Resynchronization from PVOL to SVOL starts, but does not finish.

Although the risk of this sequence of events taking place is extremely low,
if your business cannot afford even this small risk, then you must enable
Fence level = DATA to ensure that the data at the SVOL side are always
consistent.

The disadvantage of enabling Fence level = DATA is that when the CA link
fails, or if the entire remote (SVOL) data center fails, all I/Os to those
devices will be refused until the CA link is restored, or until manual
intervention is undertaken to split the PVOL from the SVOL. Applications
may fail or may continuously retry the I/Os (depending on the application)
if Fence level = DATA is enabled and the CA link fails.

Fence level = ASYNC is recommended to improve performance in data
replication between the primary and the remote site. The XP disk array
supports asynchronous mode with guaranteed ordering. When the host does a
write I/O to the XP disk array, the array sends a reply to the host as soon
as the data is written to cache. A copy of the data with a sequence number
is saved in an internal buffer, known as the side file, for later
transmission to the remote XP disk array.

When synchronous replication is used, the primary system cannot complete a
transaction until a message is received acknowledging that data has been
written to the remote site. With asynchronous replication, the transaction
is completed once the data is written to the side file on the primary
system, which allows I/O activity to continue even if the ESCON link is
temporarily unavailable.

The side file occupies 30% to 70% of cache (default 50%), as assigned
through the XP system's Service Processor (SVP). The high water mark (HWM)
is 30% of the cache; if the quantity of data in the side file exceeds this
value, write I/O to the side file is delayed, starting at 0.5 seconds and
increasing in 500 ms increments with every 5% increase over the HWM, to a
maximum of 4 seconds.

If the quantity of data in the side file continues to grow past the HWM, it
will eventually hit the side file threshold (30% to 70% of cache). When
this limit has been reached, the XP on the primary site cannot write to the
XP on the secondary site until there is enough room in the side file.
Before continuing to write, the primary XP will wait until there is enough
room in the side file, and will keep trying until it reaches its side file
timeout value, which is configured through the SVP. If the timeout is
reached, the primary XP disk array begins tracking data on its bitmap,
which will be copied over to the secondary volume during resync. The side
file operation is shown in Figure 3-8 “XP Series Disk Array Side File”.

NOTE: The side file must be configured using the XP Service Processor
(SVP). Refer to the XP Series documentation for details.

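
The delay rule for the side file can be made concrete with a little arithmetic. The sketch below encodes one reading of that rule, and the reading itself is an assumption: no delay at or below the 30% high water mark, 500 ms as soon as usage exceeds it, a further 500 ms for each additional full 5% of cache, and a cap of 4000 ms. The function name sidefile_write_delay_ms is hypothetical, and the array firmware may round differently.

```shell
# sidefile_write_delay_ms PCT (hypothetical helper)
# PCT is the side file usage as a whole-number percentage of cache.
# Prints the assumed write delay in milliseconds.
sidefile_write_delay_ms() {
    pct=$1
    hwm=30                      # high water mark from the text above
    if [ "$pct" -le "$hwm" ]; then
        echo 0                  # at or below HWM: no delay
        return 0
    fi
    # 500 ms once over HWM, plus 500 ms per further full 5% (assumed reading)
    delay=$(( 500 + ( (pct - hwm) / 5 ) * 500 ))
    if [ "$delay" -gt 4000 ]; then
        delay=4000              # maximum delay of 4 seconds
    fi
    echo "$delay"
}
```

Under this reading the 4-second ceiling is reached at 65% of cache, which sits just below the largest configurable side file size of 70%.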
In asynchronous mode, when there is an ESCON link failure, both the PVOL
and SVOL sides change to a PSUE state. When the SVOL side detects missing
data blocks from the PVOL side, it will wait for those data blocks until it
has reached the configured ESCON link timeout value (set in the SVP). Once
this timeout value has been reached, the SVOL side will change to a PSUE
state. The default ESCON link timeout value is 5 minutes (300 seconds).

An important property of asynchronous mode volumes is the consistency group
(CT group). A CT group is a grouping of LUNs that need to be treated the
same from the perspective of data consistency (I/O ordering). A CT group
can contain one or more device groups in the Raid Manager configuration
file. A consistency group ID (CTGID), 0 to 15 for the XP256 or 0 to 63 for
the XP512, is assigned automatically during pair creation.

Limitations of Asynchronous Mode

The following are restrictions for an asynchronous CT group in a Raid
Manager configuration file:

- Asynchronous device groups cannot be defined to extend across multiple
  XP Series disk arrays.

- If two or more device groups are included within a CT group in the
  configuration file, then a pair operation occurs only at the granularity
  of the entire CT group.

- When making paired volumes, the Raid Manager registers a CTGID to the XP
  Series disk array automatically at paircreate time, and the device group
  in the configuration file is mapped to a CTGID. The maximum number of
  consistency groups per XP256 is 16 (0 to 15), and per XP512 is 64
  (0 to 63). Efforts to create a CTGID with a higher number will be
  terminated with a return value of EX_ENOCTG.

- MetroCluster/CA will support only one package device group per
  consistency group. This means that in one metropolitan cluster, there can
  be only 16 packages that can be configured to use consistency groups on
  the XP256, or 32 packages that can be configured to use consistency
  groups on the XP512. MC/ServiceGuard supports only a maximum of 30
  packages per cluster.

Other Considerations on Asynchronous Mode

The following are some additional considerations when using asynchronous
mode:

- When adding a new volume to an existing device group, the new volume
  state is SMPL. The XP disk array controller (DKC) can perform the
  paircreate on the new volume only. If the device group has mixed volume
  states such as PAIR and SMPL, pairvolchk returns EX_ENQVOL, and
  horctakeover will fail.

- If you change the LDEV number associated with a given target/LUN, you
  must restart all the Raid Manager instances even though the Raid Manager
  configuration file is not modified.

- Any firmware update, cache expansion, or board change requires a restart
  of all Raid Manager instances.

- pairsplit for asynchronous mode may take a long time, depending on how
  long the synchronization takes. There is a potential for the link to fail
  while pairsplit is in progress. If this happens, pairsplit will fail with
  a return code of EX_EWSUSE.

- In most cases, MetroCluster/CA in asynchronous mode will behave the same
  as when the fence level is set to NEVER in synchronous mode.

Installing the Necessary Software

Before any configuration can begin, you need to perform the following
installation tasks on all nodes:

1. Install Raid Manager XP, which allows you to manage the XP Series disk
   arrays from the node. Refer to the installation instructions in the
   Raid Manager XP User's Guide.

2. Edit the /etc/services file, adding an entry for the Raid Manager XP
   instance to be used with MetroCluster/CA in the format
   horcm<instance-number> <port-number>/udp. For example:

       horcm0 11000/udp #Raid Manager instance 0

   See the file /opt/cmcluster/toolkit/SGCA/Samples/services.example.

3. Install MetroCluster with Continuous Access XP on all nodes according to
   the instructions in the MetroCluster with Continuous Access XP Release
   Notes.

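
After editing /etc/services, it can be useful to confirm that the entry is actually present on each node. The following is a minimal sketch under stated assumptions: the function name horcm_service_port is hypothetical, and it only understands the `horcm<instance-number> <port-number>/udp` layout described above.

```shell
# horcm_service_port INSTANCE [FILE] (hypothetical helper)
# Print the UDP port registered for Raid Manager instance INSTANCE in a
# services file (default /etc/services). Returns non-zero if no entry exists.
horcm_service_port() {
    inst=$1
    file=${2:-/etc/services}
    awk -v name="horcm$inst" '
        # Match "horcm<N> <port>/udp"; comment lines never match field 1
        $1 == name && $2 ~ /\/udp$/ {
            sub(/\/udp$/, "", $2)
            print $2
            found = 1
            exit
        }
        END { exit !found }
    ' "$file"
}
```

For example, `horcm_service_port 0` would print 11000 on a node configured as in the example entry, and a non-zero exit status signals that the entry is still missing.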
Creating the Raid Manager Configuration

The Raid Manager configuration file must be edited and customized on each
node that is attached to one of the XP Series disk arrays. The file is
named using the following convention:

    horcm<instance number>.conf

All MetroCluster packages must use the same Raid Manager instance, and must
be configured in the same configuration file. In the examples in this
chapter, instance zero is assumed, which is configured in file horcm0.conf.

Here are the steps to follow for creating the configuration:

1. Copy the default Raid Manager configuration file to an instance-specific
   name:

       # cp /etc/horcm.conf /etc/horcm0.conf

2. Create a minimum Raid Manager configuration file by editing the
   following sections of the file created in the previous step:

   - HORCM_MON
     Enter the host name of the system on which you are editing and the
     TCP/IP port number specified for this Raid Manager instance in the
     /etc/services file.

   - HORCM_CMD
     Enter the primary and alternate link device file names for both
     primary and redundant command devices (for a total of four raw device
     file names).

   WARNING! Make sure that the redundant command device is not on the same
   physical device as the primary command device. Also, make sure that the
   two command devices are on different buses inside the XP Series disk
   array.

3. Start the Raid Manager instance by using the command
   horcmstart.sh <instance-#> as in the following example:

       # horcmstart.sh 0

4. Export the environment variable that specifies the Raid Manager instance
   to be used by the Raid Manager commands. For example, with the POSIX
   shell, type:

       # export HORCMINST=0

   Now you can use Raid Manager commands to get further information from
   the disk arrays.

5. To verify the software revision of the Raid Manager and the firmware
   revision of the XP Series disk array, use the command raidqry -l. The
   Raid Manager software must be at least revision 01.02.03 and the
   firmware must be at least revision C.

6. Obtain a list of the available devices on the disk arrays using the
   raidscan command. This command must be invoked separately for each host
   interface connection to the disk array. For example, if there are two
   Fibre Channel host adapters, you might use the following commands:

       # raidscan -p CL1-A
       # raidscan -p CL1-B

   NOTE: There must also be alternate links for each device, and these
   alternate links must be on different buses inside the XP Series disk
   array. These alternate links, for example, may be CL2-E and CL2-F.

   Unless the devices have been previously paired either on this or another
   host, the devices will show up as SMPL (simplex). Paired devices will
   show up as PVOL (primary volume) or SVOL (secondary volume).

7. Determine which devices will be used by the application package. Define
   a device group that contains all of these devices. The device group name
   (dev_group) is user-defined and must be the same on each host in the
   MetroCluster that accesses the XP Series disk array. It is recommended
   that you use a name that is easily associated with the package. For
   example, a device group name of "db_payroll" is easily associated with
   the database for the payroll application. A device group name of
   "group1" would be harder to relate to an application. The device group
   name MUST be unique within the cluster.

   The device name (dev_name) is also user-defined and must be the same on
   each host in the MetroCluster that accesses the XP Series disk array.
   The device name (dev_name) must be unique among all devices in the
   cluster. However, the TargetID and LU# fields for each device name may
   be different on different hosts in the cluster, to allow for different
   hardware I/O paths on different hosts.

8. Edit the following sections of the Raid Manager configuration file that
   was created in a previous step:

   - HORCM_DEV
     Include the devices and device group used by the application package.
     Only one device group may be specified for all of the devices that
     belong to a single application package.

   - HORCM_INST
     Supply the names of only those hosts that are attached to the XP
     Series disk array that is remote from the disk array directly attached
     to this host. For example, with a MetroCluster of 6 nodes, 2 of which
     are Arbitrators, you would specify only hosts 3 and 4 in the
     HORCM_INST section. Host 1 would have previously been specified in the
     HORCM_MON section.

   See the file horcm0.conf.<sys-name> in /opt/cmcluster/toolkit/SGCA/Samples/
   for an example.

9. Restart the Raid Manager instance so that the new information in the
   configuration file is read. Use the following commands:

       # horcmshutdown.sh <instance-#>
       # horcmstart.sh <instance-#>

10. Repeat these steps on each host that will run this particular
    application package. If a host may run more than one application
    package, you must incorporate device group and host information for
    each of these packages. Note that the Raid Manager configuration file
    must be different for each host, especially for the HORCM_MON and
    HORCM_INST fields.

    The HORCM_MON section of the file is unique for each node in all
    clusters that are attached to an XP Series disk array. Enter the host
    name or IP address followed by the name of the Raid Manager instance
    that is monitoring the MetroCluster packages on that node (horcm0 in
    the current example).

11. If you have not already done so, use the paircreate command to create
    the device groups that are listed in the Raid Manager configuration
    files. See the Raid Manager User's Guide or view the man page for
    paircreate for more information. Example:

        # paircreate -g db_payroll -f data -vl -c15

    WARNING! Paired devices must be of compatible sizes and types.

Sample Raid Manager Configuration File

The following is an example of a Raid Manager configuration file for one
node (ftsys1).

#
# horcm0.conf.ftsys1
# - This is an example Raid Manager configuration file for node ftsys1.
# Note that this configuration file is for Raid Manager instance 0,
# which can be determined by the "0" in the filename "horcm0.conf".
#
# Whenever this configuration file is changed, you must stop and restart the
# instance of Raid Manager before the changes will be recognized. This can
# be done using the following commands:
#
#     horcmshutdown.sh <instance>
#     horcmstart.sh <instance>
#
# After restarting the Raid Manager instance, you should confirm that there
# are no configuration errors reported by running the pairdisplay command
# with the "-c" option.
#
# NOTE: The Raid Manager command device (HORCM_CMD) cannot be used for
# data storage (it is reserved for private Raid Manager usage).

#/************************ HORCM_MON *************************************/
#
# The HORCM_MON parameter is used for monitoring and control of device groups
# by the Raid Manager.
# It is used to define the IP address, port number, and paired volume error
# monitoring interval for the local host.
# <ip_address>
#     Defines a network address used by the local host. This can be a host name
#     or an IP address.
# <service>
#     Specifies the port name assigned to the Raid Manager communication path,
#     which must also be defined in /etc/services. If a port number, rather
#     than a port name is specified, the port number will be used.
# <poll_interval>
#     Specifies the interval used for monitoring the paired volumes. By
#     increasing this interval, the Raid Manager daemon load is reduced.
#     If this interval is set to -1, the paired volumes are not monitored.
# <timeout>
#     Specifies the time-out period for communication with the Raid Manager
#     server.

HORCM_MON
#ip_address    service    poll_interval(10ms)    timeout(10ms)
ftsys1         horcm0     1000                   3000

#/************************* HORCM_CMD *************************************/
#
# The HORCM_CMD parameter is used to define the special files (raw device
# file names) of the Raid Manager command devices used for the monitoring
# and control of Raid Manager device groups.
# Define the special device files corresponding to two or more command devices
# in order to use the Raid Manager alternate command device feature. An
# alternate command device must be configured, otherwise a failure of a
# single command device could prevent access to the device group.
# Each command device must have alternate links (PVLinks). The first command
# device is the primary command device. The second command device is a
# redundant command device and is used only upon failure of the primary
# command device. The command devices must be mapped to the various host
# interfaces by using the SVP (disk array console) or a remote console.

HORCM_CMD
#Primary           Primary Alt-Link   Secondary          Secondary Alt-link
#dev_name          dev_name           dev_name           dev_name
/dev/rdsk/c4t1d0   /dev/rdsk/c5t1d0   /dev/rdsk/c4t0d1   /dev/rdsk/c5t0d1

#/************************* HORCM_DEV *************************************/
#
# The HORCM_DEV parameter is used to define the addresses of the physical
# volumes corresponding to the paired logical volume names. Each group
# name is a unique name used by the hosts which will access the volumes.
#
# The group and paired logical volume names defined here must be the same for
# all other (remote) hosts that will access this device group.
# The hardware SCSI bus, SCSI-ID, and LUNs for the device groups do not need
# to be the same on remote hosts.
#
# <dev_group>
#     This parameter is used to define the device group name for paired logical
#     volumes. The device group name is used by all Raid Manager commands for
#     accessing these paired logical volumes.
# <dev_name>
#     This parameter is used to define the names of the paired logical volumes
#     in the device group.
# <port#>
#     This parameter is used to define the XP256 port number used to access the
#     physical volumes in the XP256 connected to the "dev_name". Consult your
#     XP256 for valid Port numbers to specify here.
# <TargetID>
#     This parameter is used to define the SCSI target ID of the physical
#     volume on the port specified in "port#".
# <LUN#>
#     This parameter is used to define the SCSI logical unit number (LUN) of
#     the physical volume specified in "targetID".

HORCM_DEV
#dev_group   dev_name      port#   TargetID   LUN#
pkgA         pkgA_index    CL1-E   0          1
pkgA         pkgA_tables   CL1-E   0          2
pkgA         pkgA_logs     CL1-E   0          3
pkgB         pkgB_d1       CL1-E   0          4
pkgC         pkgC_d1       CL1-E   0          5
pkgD         pkgD_d1       CL1-E   0          2

#/************************* HORCM_INST ************************************/
#
# This parameter is used to define the network address (IP address or host
# name) of the remote hosts which can provide the remote Raid Manager access
# for each of the device group secondary volumes.
# The remote Raid Manager instances are required to get status or provide
# control of the remote devices in the device group. All remote hosts
# must be defined here, so that the failure of one remote host will not
# prevent obtaining status.
#
# <dev_group>
#     This is the same device group name as defined in dev_group of HORCM_DEV.
# <ip_address>
#     This parameter is used to define the network address of the remote hosts
#     with Raid Manager access to the device group. This can be either an
#     IP address or a host name.
# <service>
#     This parameter is used to specify the port name assigned to the Raid
#     Manager instance, which must be registered in /etc/services. If this is
#     a port number rather than a port name, then the port number will be used.

HORCM_INST
#dev_group   ip_address   service
pkgA         ftsys1a      horcm0
pkgA         ftsys2a      horcm0
pkgB         ftsys1a      horcm0
pkgB         ftsys2a      horcm0
pkgC         ftsys1a      horcm0
pkgC         ftsys2a      horcm0
pkgD         ftsys1a      horcm0
pkgD         ftsys2a      horcm0

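
Because every device group in HORCM_DEV needs at least one remote host in HORCM_INST, a quick cross-check of the two sections can catch an omission before the instance is restarted. The sketch below is an illustration only: the function name horcm_groups_without_inst is hypothetical, and it assumes the layout of the sample file above, with each section keyword alone on a line and comments starting with #.

```shell
# horcm_groups_without_inst FILE (hypothetical helper)
# Print each device group listed under HORCM_DEV that has no entry under
# HORCM_INST. Prints nothing when the two sections agree.
horcm_groups_without_inst() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }     # skip comments and blank lines
        $1 ~ /^HORCM_(MON|CMD|DEV|INST)$/ { section = $1; next }
        section == "HORCM_DEV" && !($1 in dev) {
            dev[$1] = 1                    # remember each group once,
            order[++n] = $1                # in file order
        }
        section == "HORCM_INST" { inst[$1] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in inst)) print order[i]
        }
    ' "$1"
}
```

Run against the sample file above, for example as `horcm_groups_without_inst /etc/horcm0.conf`, it should print nothing, since pkgA through pkgD all appear in both sections.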
Configuring Automatic Raid Manager Startup

After editing the Raid Manager configuration files and installing them on
the nodes that are attached to the XP Series disk arrays, you should
configure automatic Raid Manager startup on the same nodes. You do this by
editing the rc script /etc/rc.config.d/raidmgr. Set the START_RAIDMGR
parameter to 1, and define RAIDMGR_INSTANCE as the number of the Raid
Manager instance you are using with MetroCluster. By default, this is
zero (0). An example of the edited startup file is shown below:

#*************************** RAIDMANAGER *************************
# MetroCluster with Continuous Access Toolkit script for configuring the
# startup parameters for a HP SureStore E Disk Array XP256 Raid Manager
# instance. The Raid Manager instance must be running before any
# MetroCluster package can start up successfully.
#
# @(#) $Revision: 1.8 $
#
# START_RAIDMGR:    If set to 1, this host will attempt to start up
#                   an instance of the Disk Array XP256 Raid Manager,
#                   which must be running before a MetroCluster package
#                   can be successfully started. If set to 0, this host
#                   will not attempt to start the Raid Manager.
#
# RAIDMGR_INSTANCE  This is the instance number of the Raid Manager
#                   instance to be started by this script. The instance
#                   number specified here must be the same as the
#                   instance number specified in the MetroCluster
#                   package control script.
#                   Consult your Raid Manager documentation for more
#                   information on Raid Manager instances.
#
# See the MetroCluster and Raid Manager documentation for more information
# on configuring this script.
#
START_RAIDMGR=1
RAIDMGR_INSTANCE=0

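
A node where START_RAIDMGR was left at 0 will not start Raid Manager at boot, so a one-line check per node may be worthwhile. This is a minimal sketch: the function name raidmgr_autostart_enabled is hypothetical, and it assumes the startup file consists of plain shell variable assignments, as in the example above.

```shell
# raidmgr_autostart_enabled [FILE] (hypothetical helper)
# Succeeds only if FILE (default /etc/rc.config.d/raidmgr) sets
# START_RAIDMGR to 1.
raidmgr_autostart_enabled() {
    file=${1:-/etc/rc.config.d/raidmgr}
    # Source the file in a subshell so its variables do not leak into
    # the calling shell; clear any inherited value first.
    (
        unset START_RAIDMGR
        . "$file" && [ "${START_RAIDMGR:-0}" = "1" ]
    )
}
```

A possible use is `raidmgr_autostart_enabled || echo "Raid Manager autostart is disabled"`, run on each node attached to the disk arrays.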
Verifying the XP Series Disk Array Configuration

Use the following checklist to verify the configuration.

Creating and Exporting Volume Groups

Use the following procedure to create volume groups and export them for
access by other nodes. The sample script mk1VGs in the Samples directory
can be modified to automate these steps.

1. Define the appropriate volume groups on each node that might run the
   application package. Use the commands:

       # mkdir /dev/vgxx
       # mknod /dev/vgxx/group c 64 0xnn0000

   where the name /dev/vgxx and the number nn are unique within the
   cluster.

2. Create volume groups only on the primary system. Use the vgcreate and
   vgextend commands, specifying the appropriate HP-UX device file names.

3. Use the vgexport command with the -p option to export the VGs on the
   primary system without removing the HP-UX device files:

       # vgchange -a n vgname
       # vgexport -v -s -p -m mapfilename vgname

   Make sure that you copy the map files to all of the nodes. The sample
   script Samples/ftpit shows a semi-automated way (using ftp) to copy the
   files. You need only enter the password interactively.

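
Since the mkdir and mknod commands must be repeated on every node with the same name and minor number, it can help to generate them rather than type them. The sketch below only prints the commands and never runs them. The function name vg_group_commands and the volume group name in the usage note are hypothetical.

```shell
# vg_group_commands VGNAME NN (hypothetical helper)
# Print, without running, the mkdir and mknod commands for volume group
# VGNAME, where NN is the two-hex-digit value substituted into 0xnn0000.
vg_group_commands() {
    vg=$1
    nn=$2
    case $nn in
        [0-9a-fA-F][0-9a-fA-F]) ;;         # exactly two hex digits
        *) echo "NN must be two hex digits: $nn" >&2; return 1 ;;
    esac
    echo "mkdir /dev/$vg"
    echo "mknod /dev/$vg/group c 64 0x${nn}0000"
}
```

For instance, `vg_group_commands vgpkgA 01` prints the two commands for /dev/vgpkgA with minor number 0x010000, which can be reviewed and then run on each node.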
Importing Volume Groups on Other Nodes

Use the following procedure to import volume groups. The sample script
mk2imports can be modified to automate these steps.

1. Import the VGs on all of the other systems that will run the
   MC/ServiceGuard package. Use the following command:

       # vgimport -v -s -m mapfilename vgname

2. Back up the configuration. Use the following commands:

       # vgchange -a y vgname
       # vgcfgbackup vgname
       # vgchange -a n vgname

   See the sample script Samples/mk2imports.

NOTE: Exclusive activation must be used for all volume groups associated
with packages that use the XP Series disk array. The design of
MetroCluster/CA assumes that only one system in the cluster will have a VG
activated at a time.

Configuring PV Links

The examples in the previous sections show the use of the vgimport and
vgexport commands with the -s option. The mk1VGs script also uses -s in the
vgexport command, and the mk2imports script uses -s in the vgimport
command. You may wish to remove this option from both commands if you are
using PV links.

The -s option to the vgexport command saves the volume group ID (VGID) in
the map file, but it does not preserve the order of PV links. To specify
the exact order of PV links, do not use the -s option with vgexport, and in
the vgimport command, enter the individual links in the desired order, as
in the following example:

    # vgimport -v -m mapfilename vgname linkname1 linkname2