Follow this procedure to configure your upgraded system and propagate the new golden image to all client nodes:

Run the upgradesys utility to back up the existing database and migrate existing data to the new release format. Command output looks similar to the following:

The upgradesys utility performs all the necessary steps
to upgrade your cluster. This script should be run immediately
after you have upgraded the head node with the latest XC software
and any third party vendor rpms.
Do you wish to continue? [y/n] y
Backing up database to
/opt/hptc/etc/sysconfig/upgrade/upgradesys.dbbackup-20060103145027.sql ...
Executing C02database gupdate
Starting MySQL: [ OK ]
Executing C20server_type gupdate
Executing C30device_names gupdate
Executing C33etc_hosts gupdate
Executing C35region gupdate
Executing C40role_migration gupdate
Executing C90systemimager gupdate
Removing XC MLIB RPMs
upgradesys output logged to /var/log/upgradesys/upgradesys.log
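
Before continuing, it can help to note where the database backup was written. The following is a minimal sketch, not part of the XC software: `extract_backup_path` is a hypothetical helper, and it assumes the backup file name always has the form `upgradesys.dbbackup-<14 digits>.sql` shown in the sample output.

```shell
#!/bin/sh
# Extract the database backup path from captured upgradesys output.
# Assumption: the path ends in upgradesys.dbbackup-<14 digits>.sql, as in the
# sample output. extract_backup_path is a hypothetical helper name.
extract_backup_path() {
    grep -o '/[^ ]*upgradesys\.dbbackup-[0-9]\{14\}\.sql' | head -n 1
}

# Example input copied from the sample output in this procedure.
sample_output='Backing up database to
/opt/hptc/etc/sysconfig/upgrade/upgradesys.dbbackup-20060103145027.sql ...
Executing C02database gupdate'

printf '%s\n' "$sample_output" | extract_backup_path
```

On a real system you could feed the function your saved terminal output, or the /var/log/upgradesys/upgradesys.log file if it records the same line (an assumption; the source does not say what the log contains).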
Run the cluster_config utility to configure your system. Table 7-7 describes two options to the cluster_config utility that you can use to reconfigure a system after a software upgrade. Decide which option you want depending upon how you want the upgrade to proceed.

Table 7-7 Upgrade Options for the cluster_config Utility

| --migrate Option | --init Option |
|---|---|
| Performs a series of known migration steps to bring existing, recognized roles in the database into alignment with the new roles introduced in this release. Using this option does not guarantee the correct migration steps for unrecognized (user-created) roles and services in the database. Before you decide to use this option, view the /opt/hptc/etc/sysconfig/upgrade/role_migration.ini file to see how your previous role assignments compare to the roles provided in XC Version 3.0. | Initializes (resets) your existing node role assignments and configures your system with the default node role assignments introduced in XC Version 3.0. The default roles and assignments have been optimized for performance in XC Version 3.0, and you may decide that this configuration is better suited for your environment. See Chapter 8 for a description of roles and the services provided by them, as well as the default node role assignments. |
Enter one of the following cluster_config options:

To migrate your existing system configuration:

# /opt/hptc/config/sbin/cluster_config --migrate

Proceed to step 3.

To apply new default role assignments to your existing system configuration:

# /opt/hptc/config/sbin/cluster_config --init

If you specify the --init option, in the next step you must remember to reassign any role assignments you previously customized. For example, if your system configuration had login roles on one or more nodes, you must assign a login role on any node on which you want users to be able to log in. In the default configuration, a login role is not assigned to any node.
Proceed to step 3.

The cluster_config utility displays the following menu. Enter the letter p to proceed with the system configuration process; refer to Appendix F for information about using this menu.

[L]ist Nodes, [M]odify Nodes, [A]nalyze, [H]elp, [P]roceed, [Q]uit: p
Do you want to apply your changes to the cluster configuration? [y/n] y
[S]ervices Config, [P]roceed, [Q]uit: p
Do you want to apply your changes to the service configuration? [y/n] y
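
If you record the upgrade in a runbook or script, the choice between the --migrate and --init options described earlier can be made explicit. The following is a minimal sketch only: `cluster_config_command` is a hypothetical helper that prints the command line for review and does not run cluster_config. It accepts only the two options documented in Table 7-7.

```shell
#!/bin/sh
# Print (do not run) the cluster_config command for a chosen upgrade mode.
# cluster_config_command is a hypothetical helper; only the two options
# documented in Table 7-7 are accepted.
cluster_config_command() {
    case "$1" in
        migrate|init)
            printf '%s\n' "/opt/hptc/config/sbin/cluster_config --$1"
            ;;
        *)
            printf 'usage: cluster_config_command migrate|init\n' >&2
            return 1
            ;;
    esac
}

# Show the command for the migration path.
cluster_config_command migrate
```

Printing instead of executing keeps the interactive menu and prompts under your control; the utility still has to be run by hand at the head node.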
Follow along on your system while the cluster_config utility is configuring your system and, when prompted, provide the answers listed in Table 7-8.

Table 7-8 Answers to cluster_config Prompts

| Prompt | Answer |
|---|---|
| Regenerate ssh keys? | yes |
| Recreate the qsnet database? (For systems using a QSnetII interconnect.) | yes |
| Reconfigure SLURM? | yes |
| Create new slurm.conf file? | yes |
| Install LSF? | yes |
| Upgrade to new version of LSF? | u (upgrade) |
| All other prompts | Accept the default response |

Output from the cluster_config command looks similar to the following:
Configuring system wide functions / policies / behaviors
Executing C02ssh_config sconfigure
Root ssh keys for the cluster already exist
(Warning: you will not be able to ssh/pdsh to other nodes until
you reimage them)
Would you like to regenerate them? ([n]/y) y
Executing C10cluster_fstab sconfigure
Executing C20sysparams sconfigure
NFS daemon tuning:
Given that there are 6 nodes in this cluster, enter the number of
NFS daemons that shall be configured to support them [8] : Enter
Executing C75mpiic sconfigure
Configuring service specific functions
Executing C05pdsh gconfigure
Executing C08ntp gconfigure
Configuring the following nodes as ntp servers for the cluster:
n16
You must now specify the clock source for the server nodes.
If the nodes have external connections, you may specify up
to 4 external NTP servers. Otherwise, you must use the node's
system clock.
Enter the IP address or host name of the first external NTP server
or leave blank to use the system clock on the NTP server node:
Renaming previous /etc/ntp.conf to /etc/ntp.conf.bak
Executing C10hptc_cluster_fs gconfigure
Executing C20gmmon gconfigure
Executing C30swmlogger gconfigure
Executing C30syslogng_forward gconfigure
Executing C35dhcp gconfigure
Executing C50cmf gconfigure
Executing C50nagios gconfigure
Would you like to enable web based monitoring? ([y]/n) y
Enter the password for the 'nagiosadmin' web user:
New password:
Re-type new password:
Adding password for user nagiosadmin
Executing C50nat gconfigure
Executing C50supermond gconfigure
Executing C51nagios_monitor gconfigure
Executing C60nis gconfigure
Network Information Service (NIS) Configuration
This step sets up one or more NIS servers within the XC system
that are "slaves" to an external NIS "master". The master NIS
server provides the slaves with copies of its NIS maps.
In order to successfully complete this configuration step, the NIS
master must have been previously set to allow slaves to communicate
with it. On Linux systems, this is typically accomplished by adding
the NIS slave hostname(s) to the /var/yp/ypservers file on the NIS
master, and then running 'make'.
In addition, to complete this configuration, you will need to provide
1) the name or IP address of the NIS master, and
2) the NIS domain name hosted by the NIS master
Enter the name or IP address of the external NIS master: [] NIS_IP_address
Enter the NIS domain hosted by the NIS master: [] your_NIS_domain
Executing C90munge gconfigure
Executing C90slurm gconfigure
Do you want to configure SLURM now? (y/n) [y]:y
An existing SLURM configuration file has been detected.
Do you want to delete this file and generate a new one?
Answering 'no' means to edit the existing file. (y/n) [n]: y
This SLURM configuration needs a special SLURM user. The SLURM
controller daemons will be run by this user, and certain SLURM
runtime files will be owned by this user.
Enter the SLURM username [slurm]: Enter
n16 is the only node with the Resource Management
role. Therefore the SLURM Master Controller daemon will be set up
on this node, and there will be no SLURM Backup Controller.
The current Compute Node configuration is:
NodeName=n[11-16] Procs=2
NOTE: The only Partition created by default is the lsf
partition. If you want additional partitions, configure
them manually in the /hptc_cluster/slurm/etc/slurm.conf file.
The current Node Partition configuration is:
PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[11-16]
Do you want to enable SLURM-controlled user-access to the
compute nodes? (y/n) [n]: n
SLURM configuration complete. Press 'Enter' to continue: Enter
Executing C95lsf gconfigure
Do you want to install LSF locally now? (y|n) [y]: y
LSF appears to be already installed. Do you want to upgrade this
installation, or delete it and perform a clean install?
([u]pgrade or [d]elete) [u]: u
Pre-installation check report saved as text file:
/opt/hptc/lsf/files/lsfhpc/install-20051216023643/hpc6.1_hpcinstall/ \
prechk.rpt.
... Done LSF pre-installation check.
... Done installing hpc binary files "linux2.6-glibc2.3-ia32e-slurm".
... LSF configuration is done.
hpcinstall is done.
To complete your hpc installation and get your
cluster "hptclsf" up and running, follow the steps in
"/opt/hptc/lsf/files/lsfhpc/install-20051216023643/hpc6.1_hpcinstall/ \
hpc_getting_started.html".
After setting up your LSF server hosts and verifying
your cluster "hptclsf" is running correctly,
see "/opt/hptc/lsf/top/6.1/hpc_quick_admin.html"
to learn more about your new LSF cluster.
***Begin LSF-HPC Post-Processing***
Created '/hptc_cluster/lsf/tmp'...
Editing /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf...
Moving /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf
to /opt/hptc/lsf/top/conf/lsf.cluster.hptclsf.old.6490...
Editing /opt/hptc/lsf/top/conf/lsf.conf...
Moving /opt/hptc/lsf/top/conf/lsf.conf
to /opt/hptc/lsf/top/conf/lsf.conf.old.6490...
Editing /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params...
Moving /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params
to /opt/hptc/lsf/top/conf/lsbatch/hptclsf/configdir/lsb.params.old.6490...
Replaced default lsb.queues with a preconfigured lsb.queues.
C95lsf finished
Configuring the image replication environment
Initializing 172.20.0.16 as golden client
Creating the golden image (takes approximately 10 minutes)
**Do not interrupt this process or else the golden image will be incomplete**
Setting up the bootserver
Linking client nodes to their autoinstall script
Initializing service persistence
Sanitizing services in the golden image
Creating golden image 'tar' file (takes approximately 10-15 minutes)
Verifying integrity of golden image 'tar' file
Image replication environment configuration complete.
info: nconfig started
info: Executing on head node
info: Executing C02network nconfigure
info: Executing C04iptables nconfigure
info: Executing C06nfs_server nconfigure
info: Executing C08ntp nconfigure
info: Executing C10hptc_cluster_fs nconfigure
info: Executing C10hptc_cluster_fs_client nconfigure
info: Executing C20gmmon nconfigure
info: Executing C30swmlogger nconfigure
info: Executing C30syslogng_forward nconfigure
info: Executing C40hpasm nconfigure
info: Executing C50cmf nconfigure
info: Executing C50collectl nconfigure
info: Executing C50gather_data nconfigure
info: Executing C50hptc-lm nconfigure
info: Executing C50nagios nconfigure
info: Executing C50nat nconfigure
info: Executing C50supermond nconfigure
info: Executing C51nagios_monitor nconfigure
info: Executing C51nrpe nconfigure
info: Executing C90munge nconfigure
info: Executing C90slurm nconfigure
info: Executing C95lsf nconfigure
info: Executing C30syslogng_forward cconfigure
info: Executing C35dhcp cconfigure
info: Executing C50supermond cconfigure
info: Executing C90munge cconfigure
info: Executing C90slurm cconfigure
info: Executing C95lsf cconfigure
info: nconfig shut down
info: nconfig started
info: Executing on head node
info: Executing C02network nrestart
info: Executing C04iptables nrestart
info: Executing C06nfs_server nrestart
info: Executing C08ntp nrestart
info: Executing C10hptc_cluster_fs nrestart
info: Executing C10hptc_cluster_fs_client nrestart
info: Executing C20gmmon nrestart
info: Executing C30swmlogger nrestart
info: Executing C30syslogng_forward nrestart
info: Executing C40hpasm nrestart
info: Executing C50cmf nrestart
info: Executing C50collectl nrestart
info: Executing C50gather_data nrestart
info: Executing C50hptc-lm nrestart
info: Executing C50nagios nrestart
info: Executing C50nat nrestart
info: Executing C50supermond nrestart
info: Executing C51nagios_monitor nrestart
info: Executing C51nrpe nrestart
info: Executing C90munge nrestart
info: Executing C90slurm nrestart
info: Executing C95lsf nrestart
info: Executing C30syslogng_forward crestart
info: Executing C35dhcp crestart
info: Executing C50supermond crestart
info: Executing C90munge crestart
info: Executing C90slurm crestart
info: Executing C95lsf crestart
info: nconfig shut down
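
If you saved the cluster_config session to a file (for example with script(1)), you can check it for the milestones that appear in the sample output. The following is a minimal sketch under that assumption: `check_cluster_config_log` is a hypothetical helper, and the two strings it looks for are taken from the sample output above, so they may differ on other releases.

```shell
#!/bin/sh
# Check a captured cluster_config transcript for two milestones from the
# sample output: the golden image completion message and a final
# "nconfig shut down" line. check_cluster_config_log is a hypothetical helper.
check_cluster_config_log() {
    log_file=$1
    grep -q 'Image replication environment configuration complete' "$log_file" || {
        echo "golden image step did not report completion" >&2
        return 1
    }
    # In the sample output the last non-empty line is "info: nconfig shut down".
    last_line=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 1)
    case "$last_line" in
        *'nconfig shut down'*) return 0 ;;
        *) echo "transcript does not end with nconfig shut down" >&2
           return 1 ;;
    esac
}

# Demonstration on a small stand-in transcript.
demo_log=$(mktemp)
printf '%s\n' \
    'Image replication environment configuration complete.' \
    'info: nconfig started' \
    'info: nconfig shut down' > "$demo_log"
check_cluster_config_log "$demo_log" && echo "transcript looks complete"
rm -f "$demo_log"
```

A passing check only means the transcript reached the end; it does not replace reading the output for warnings.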

Look at the backup copy of the slurm.conf file, which is located in the /hptc_cluster/slurm/etc/slurm.conf.bak file. If you had previously customized this file, you must merge those customizations into the new version of the /hptc_cluster/slurm/etc/slurm.conf file. Otherwise, skip this step.

Re-enter the monitoring line card entries in the /etc/dhcpd.conf file if your system is using a QSnetII or Myrinet interconnect. See Appendix D for more information about adding these entries to the file. Skip this step if your system is using an InfiniBand or Gigabit Ethernet interconnect.

Enter one of the following commands depending upon the size of your system:

On systems with fewer than 300 nodes, enter this command to image and boot all client nodes:

# startsys --image_and_boot

On systems with more than 300 nodes, enter this command to image the client nodes. Then, proceed to step 8 to boot the nodes after they are imaged.

On systems with more than 300 nodes, enter the following command to boot the client nodes, because the nodes were not booted during their imaging operation:

# startsys --boot_group_delay=240

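For the slurm.conf merge step earlier in this procedure, a unified diff of the backup against the regenerated file shows which customizations still need to be carried over. The following is a minimal sketch: `show_slurm_conf_changes` is a hypothetical helper, its default paths are the two files named in this procedure, and the demonstration uses temporary stand-in files so it can be tried away from the cluster.

```shell
#!/bin/sh
# Show what differs between the backed-up slurm.conf and the regenerated one.
# show_slurm_conf_changes is a hypothetical helper; the default paths are the
# ones named in this procedure.
show_slurm_conf_changes() {
    old=${1:-/hptc_cluster/slurm/etc/slurm.conf.bak}
    new=${2:-/hptc_cluster/slurm/etc/slurm.conf}
    status=0
    diff -u "$old" "$new" || status=$?
    case $status in
        0) echo "No differences: nothing to merge." ;;
        1) echo "Lines marked - exist only in $old; re-apply any you still need." ;;
        *) echo "diff could not compare the files" >&2
           return 2 ;;
    esac
}

# Demonstration with stand-in files (Procs=4 plays the role of a customization).
old_conf=$(mktemp)
new_conf=$(mktemp)
printf 'NodeName=n[11-16] Procs=4\n' > "$old_conf"
printf 'NodeName=n[11-16] Procs=2\n' > "$new_conf"
show_slurm_conf_changes "$old_conf" "$new_conf"
rm -f "$old_conf" "$new_conf"
```

The helper only reports differences. Deciding which old lines are genuine customizations, and editing the new file, remains a manual step.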
Make sure all nodes are up:

If your system is configured with LSF-HPC with SLURM, run the SLURM postconfiguration utility to update the slurm.conf file with compute node names and attributes:

Set up the LSF environment by sourcing the LSF profile file:

# . /opt/hptc/lsf/top/conf/profile.lsf