HP Global Workload Manager Version 4.1 User's Guide > Chapter 5 Additional Configuration and Administration Tasks
Automatic Restart of gWLM’s Managed Nodes in SRDs (High Availability)
Whenever a managed node boots, the node’s gWLM agent attempts to automatically rejoin the node in its SRD, providing high availability. The only configuration steps you need to perform for this behavior to happen are:
This feature works best when one managed node is lost at a time or all managed nodes are lost.
When a managed node boots, the gWLM agent (gwlmagent) starts automatically if GWLM_AGENT_START is set to 1 in the file /etc/rc.config.d/gwlmCtl. The agent then checks the file /etc/opt/gwlm/deployed.config to determine its CMS. Next, it attempts to contact the CMS to have the CMS re-deploy its view of the SRD. If the CMS cannot be contacted, the SRD in the deployed.config file is deployed as long as all nodes agree.
In general, when an SRD is disrupted by a node’s going down, by a CMS's going down, or by network communications issues, gWLM attempts to reform the SRD. gWLM maintains the concept of a cluster for the nodes in an SRD. In a cluster, one node is a master and the other nodes are nonmasters. If the master node loses contact with the rest of the SRD, the rest of the SRD can continue without it, as a partial cluster, by unanimously agreeing on a new master. If a nonmaster loses communication with the rest of the SRD, the resulting partial cluster continues operation without the lost node. The master simply omits the missing node until it becomes available again.
You can use the gwlmstatus command to monitor availability. It can tell you whether any hosts are unable to rejoin a node's SRD as well as whether hosts in the SRD are nonresponsive. For more information, see gwlmstatus(1M).
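The boot-time precondition described above (GWLM_AGENT_START set to 1 in /etc/rc.config.d/gwlmCtl) can be checked with a small script. The following is a minimal sketch, not an HP-supplied tool: the function name agent_autostart_enabled is hypothetical, and the sketch assumes the file uses plain NAME=value assignments, as rc.config.d files usually do.

```shell
# Hypothetical helper (not part of gWLM): succeeds only if the last
# uncommented GWLM_AGENT_START assignment in the given file is 1.
# Assumes plain NAME=value lines, optionally with the value in double quotes.
agent_autostart_enabled() {
    config_file=${1:-/etc/rc.config.d/gwlmCtl}

    # An unreadable or missing file counts as "not enabled".
    [ -r "$config_file" ] || return 2

    # Extract the numeric value of every assignment; keep the last one,
    # since a later assignment would override an earlier one.
    value=$(sed -n 's/^[[:space:]]*GWLM_AGENT_START="\{0,1\}\([0-9][0-9]*\).*/\1/p' \
        "$config_file" | tail -n 1)

    [ "$value" = "1" ]
}
```

Called with no argument, the function inspects /etc/rc.config.d/gwlmCtl. A nonzero return suggests that the agent will not start automatically at boot, so the node would not attempt to rejoin its SRD on its own.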
You can configure the following HP SIM events regarding this automatic restart feature:
For information on enabling and viewing these events, refer to Optimize
You can then view these events using the Event Lists item in the left pane of HP SIM.
The following sections explain how to handle some of the events.
If you see the event “Node Failed to Rejoin SRD on Start-up”:
If you have an SRD containing n nodes and you get n - 1 of the “SRD Communication Issue” events but no “SRD Reformed with Partial Set of Nodes” events within 5 minutes (assuming an allocation interval of 15 seconds) of the first “SRD Communication Issue” event, you may need to restart the gwlmagent on each managed node in the affected SRD:
# /opt/gwlm/bin/gwlmagent --restart
If gWLM is unable to reform an SRD, you can manually clear the SRD, as described in the following section.
The following command is an advanced command for clearing an SRD. The recommended way to remove a host from management is typically the gwlm undeploy command.
Starting with A.02.50.00.04 agents, you can manually clear an SRD with the following command:
# gwlm reset --host=host
where host specifies the host with the SRD to be cleared.
If this command does not work, use the procedure given in the following section.
The procedure in this section clears an SRD regardless of the version of the agents in the SRD.
The gwlm command is added to the path during installation. On HP-UX systems, the command is in /opt/gwlm/bin/. On Microsoft Windows systems, the command is in C:\Program Files\HP\Virtual Server Environment\bin\gwlm\ by default. However, a different path may have been selected at installation.
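The rule of thumb given earlier for deciding when to restart gwlmagent (n - 1 “SRD Communication Issue” events and no “SRD Reformed with Partial Set of Nodes” event within 5 minutes) can be written as a simple check. This is only an illustrative sketch: the function should_restart_agents and its arguments are hypothetical, the counts are assumed to be read manually from the HP SIM Event Lists, and the 300-second default reflects the 5-minute figure stated for a 15-second allocation interval. The guide does not say how the window scales for other intervals, so the window is left as an optional argument.

```shell
# Hypothetical helper (not part of gWLM).
# Usage: should_restart_agents NODES COMM_ISSUE_EVENTS REFORMED_EVENTS \
#                              ELAPSED_SECONDS [WINDOW_SECONDS]
# Succeeds when the guide's condition for restarting gwlmagent is met.
should_restart_agents() {
    nodes=$1              # n: number of nodes in the SRD
    comm_events=$2        # "SRD Communication Issue" events seen
    reformed_events=$3    # "SRD Reformed with Partial Set of Nodes" events seen
    elapsed=$4            # seconds since the first communication-issue event
    window=${5:-300}      # 5 minutes, stated for a 15-second interval

    # Treating n - 1 as a minimum (rather than an exact count) is an assumption.
    [ "$comm_events" -ge $((nodes - 1)) ] &&
        [ "$reformed_events" -eq 0 ] &&
        [ "$elapsed" -ge "$window" ]
}
```

For example, should_restart_agents 4 3 0 300 succeeds for a four-node SRD with three communication-issue events and no reform event after five minutes. In that case, the guide's suggestion is to run /opt/gwlm/bin/gwlmagent --restart on each managed node in the affected SRD.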