Managing HP-UX Software With SD-UX: HP 9000 Computers > Appendix B Troubleshooting SD Problems
This section presents a selection of problems you might encounter and how to resolve them. If you see the following error message:
it means that the hostname you specified could not be found in the hosts database. Make sure you have typed the hostname correctly (you can use the nslookup(1) command to verify hostnames). If the target hostname is not in the hosts database, but you know its network address, you can use it (in standard "dot" notation) in place of the hostname. If you see this error message:
it means SD could not contact the daemon program on a specific target system. Note that this may occur even if you haven't specified any targets, for example, if the daemon on your local host is not running and the select_local option is set to "true." If the SD daemon/agent is not installed on a given target system, you must install it before you can begin managing the system with SD. If you've verified that the daemon/agent component has been installed on a target system and you still have trouble contacting it, check to see that the daemon is running:
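One way to make this check is to look for the daemon's process name, swagentd, in the process table. The sketch below assumes a ps that accepts -e (list every process); the helper name lists_swagentd is hypothetical and is introduced only for illustration, it is not part of SD.

```shell
# Read a process listing on standard input and succeed (exit status 0)
# only if it contains a swagentd entry. Lines belonging to a grep
# process are ignored so the search cannot match itself.
lists_swagentd() {
    grep swagentd | grep -v grep >/dev/null
}

# Typical use on the target system:
#   ps -e | lists_swagentd && echo "swagentd is running"
```

Because the helper only reads standard input, it can also be run against a saved process listing.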
Other possible causes for this problem are listed in the section “Connection Timeouts and Other WAN Problems”.

An easy way to determine if a target system has the SD daemon installed and running is to type:

    /usr/sbin/swlist -l depot @ <one or more target hostnames>

which will attempt to contact each target to get a list of registered depots. Those targets which have the SD daemon installed will report either:
or
For more information on daemon activity, see the daemon logfile, /var/adm/sw/swagentd.log.

There are a number of things that can cause denial of access to SD objects.
Generally, when you are denied access to an SD object, the system tells you that you do not have the required access permission for the object. Sometimes it may be unclear which object is not accessible. For example, when using swcopy to copy a product from system A to a depot, a number of ACLs may be checked:
If any of these access permissions is absent, the whole operation is disallowed, and the error message becomes critical to understanding the cause. To see more about what type of security or access problems exist, see the daemon log file on the target system: /var/adm/sw/swagentd.log

The default SD ACLs make it fairly easy to administer ACLs, but do not always give the desired level of access control. When restricting access, especially by removing the any_other "read" permission, it is fairly easy to restrict access in unexpected ways. Remember that "host" entries are required for any destination systems for swcopy and swinstall operations. Review Chapter 7 “Modifying IPD or Catalog Contents” for a full discussion of the access tests performed by SD for each operation.

Since ACLs in SD are stored in the file system as plain text files, it may be tempting to edit them with a conventional editor. This can lead to unexpected corruption of the ACL. Most cases of this corruption simply result in a message indicating the corruption, but adding entries to the ACL file without updating the num_entries value can cause unreported problems whose only symptom is denial of access. A common failure occurs, for instance, when a user entry is inserted, pushing the "any_other" entry down beyond the num_entries limit; the SD ACL manager then never reads the any_other entry, and users who depend on it are denied access. The best guard against this is to always use the swacl command to manipulate ACLs.

The default /var/adm/sw/security/secrets file shipped with SD contains a single entry:
The "-sdu-" should be replaced by a different default secret, or the entire entry eliminated if you wish to explicitly name all hosts from which controllers can be run. See Chapter 7 “Modifying IPD or Catalog Contents” for a thorough discussion of the secrets file.

The controller (for swinstall, swcopy, etc.) looks up the secret for the system on which it runs and passes it in an encrypted form to its agent. The agent receiving a request from the controller looks up the secret for the host from which the call comes, encrypts it, and compares the encryption to that provided by the controller. If the two secrets do not match, access is denied. If you have problems with this mechanism, make sure that all systems have matching entries. You can also revert to the original secrets file (/etc/newconfig/sd/secrets on 9.x and /usr/newconfig/var/adm/sw/security/secrets on 10.x) on all hosts, or simply copy a single secrets file to all hosts.

There is a problem in using cp, tar, cpio, dd, and other commands to copy images of depots for use on other systems. The problem is that depot and product ACLs in the image have built-in knowledge of the host on which the depot originated. In particular, the ACL's default realm will be wrong and local users will be confused with users on the originating host. For example, attempts to add local users to the access list will, in fact, grant access to remote users. Other problems can also arise. Since there is no way to alter the default realm of an ACL from that set when it is created, this is an intractable problem.

Another common problem with such images occurs when they are imported to systems which cannot resolve all the hostnames (see resolver(4) and nslookup(1)) which exist in the ACLs.

If your purpose is to create a "staged" installation, use swcopy to propagate the depot. This will create new ACLs, based on local templates, for each instance of the depot.
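A staged copy along these lines might look like the sketch below. It assumes the usual swcopy form of a source depot given with -s, a software selection, and a target depot after @. The host names, the depot path, and the stage_depot helper with its SWCOPY override are hypothetical and exist only for illustration.

```shell
# Copy every product from one source depot to a depot on each staging
# host, so that each copy receives ACLs from its own host's templates.
# SWCOPY is an illustrative override; it defaults to the real command.
stage_depot() {
    source_depot=$1      # for example hostA:/var/spool/sw (hypothetical)
    target_path=$2       # depot directory to use on each staging host
    shift 2
    for host in "$@"; do
        # '*' selects all software in the source depot; it is quoted so
        # the shell does not expand it against local file names.
        ${SWCOPY:-/usr/sbin/swcopy} -s "$source_depot" '*' @ "$host:$target_path" ||
            echo "swcopy to $host failed" >&2
    done
}

# Example call (hypothetical hosts):
#   stage_depot hostA:/var/spool/sw /var/spool/sw hostB hostC
```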
If the sole intent of a depot is for such image distribution, you may wish to set the swpackage.create_target_acls option to false to prevent ACL creation on the depot and products during the swpackage operation. This is the option used to create tape and CD-ROM images. ACL-less depots and products grant the local superuser all privileges, while all other users and systems have read access. Note that when you copy or install this ACL-less depot with swcopy or swinstall, the copies (installations) are automatically protected by ACLs based on templates on the destination host.

When using swinstall or swcopy in an environment where network bandwidth is the "bottleneck," the file transfer rate between source and target(s) can become very slow. The compress_files=true option compresses files transferred from a source depot to a target. This can reduce network usage by approximately 50%; the exact amount of compression depends on the type of files. Binary files compress less than 50%, while text files generally compress more. The greatest throughput improvements are seen when transfers are across a slow network (approximately 50 kbyte/sec or less) and the source depot server is serving only a few target hosts at a time.
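To try compression for a single operation without changing the defaults file, the option can be passed on the command line with -x. The sketch below is only an illustration: the copy_with_compression helper, its SWCOPY override, and the depot names in the example call are hypothetical.

```shell
# Run one swcopy with compress_files set for that run only.
# SWCOPY is an illustrative override; it defaults to the real command.
copy_with_compression() {
    compress=$1          # "true" or "false"
    source_depot=$2      # source depot, for example hostA:/var/spool/sw
    target_depot=$3      # target depot, for example hostB:/var/spool/sw
    # '*' selects all software in the source depot.
    ${SWCOPY:-/usr/sbin/swcopy} -x compress_files="$compress" \
        -s "$source_depot" '*' @ "$target_depot"
}

# Example call (hypothetical depots):
#   copy_with_compression true hostA:/var/spool/sw hostB:/var/spool/sw
```

Timing one run with each value, for instance with the shell's time facility where it is available, gives a rough idea of whether compression pays off on a given network.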
If it is not clear that this option will help in your situation, compare the throughput of a few install or copy tasks (both with and without compression) before changing this option value.

Low-throughput, wide-area networks can cause SD to encounter time-out problems when establishing and maintaining network connections with remote agents on other systems. If you see the following messages:
or
and you have verified that the system is up and the daemon program (swagentd) is running on it, it may be that network delays are causing the connection to time out. Increase the time-out value used by SD when performing Remote Procedure Calls (RPCs) by specifying a higher value for the rpc_timeout option, either on the command line or in the defaults file. RPC time-out values range from 0 to 9, with 9 being the longest time-out. The SD default RPC time-out value is 5. Note that these values do not represent any specific time units. See Appendix A “Default Options and Keywords” for more information on the rpc_timeout option.

Increasing the rpc_timeout can also help in situations where the target agents in an install or copy session are timing out when trying to contact the source agent. This problem is indicated by the following error messages in the agent log file:
Another factor that can affect RPC timeouts on a slow network is the choice of network protocol. SD supports both UDP- and TCP-based communication (the default is UDP). TCP communication can be more reliable on a WAN because it is connection-based. If you are still having time-out problems on a slow-throughput WAN, you can tell SD to use the TCP-based communication instead, via the rpc_binding_info option:

    rpc_binding_info=ncacn_ip_tcp:[2121]

As with any controller option, you can specify this option on the command line using the -x option or by editing the defaults file. Note that the daemon program (swagentd) listens for both UDP- and TCP-based RPCs by default. See Appendix A “Default Options and Keywords” for more information on the rpc_binding_info option.

A final WAN-related issue may arise when using the interactive GUI. During the analysis and execution phases of an interactive SD session, each target agent is periodically "polled" for up-to-date status information. The polling_interval option can be used to control the number of seconds that elapse between successive status polls of a given target system. On networks where even this minor data transfer is a problem, you can increase this polling interval, thus decreasing the frequency of polling and reducing an interactive session's overall demands on the network. See Appendix A “Default Options and Keywords” for more information on the polling_interval option.

Your installation or copy operation runs out of space even though the disk space analysis succeeded. Upon further checking, you find that the results of the disk space analysis differ from the actual space available. Possible causes of this problem:
A swpackage operation may fail because of the incorrect use of the end keyword in the Product Specification File (PSF). The end keyword marks the end of a depot, vendor, product, subproduct or fileset specification in a PSF. It requires no value and is optional. However, if you use it and it is incorrectly placed, the specification will fail. If you use it, check that every object specification (especially the last one) has an end keyword.

If you want to shorten (truncate) the SD daemon logfile because it is getting too long, follow this procedure: If the daemon is currently running, DO NOT remove its logfile. The running daemon will continue to log messages to its logfile even after you've removed it, causing any subsequent information to be lost. Also, the disk space used by the logfile will not be freed as long as the daemon is running. Instead, truncate the logfile by typing (as root):

    echo > /var/adm/sw/swagentd.log

which replaces the data that was there with an empty string.

If you inadvertently remove the daemon logfile while the daemon is running, you must kill and restart the daemon if you want to see subsequent daemon log messages and free up the disk space used by the logfile. You can stop (kill) a daemon by typing:

    /usr/sbin/swagentd -k

You can also kill and restart a currently running daemon by typing:

    /usr/sbin/swagentd -r

If you are trying to access a tape depot and see the following error message in the daemon logfile, it means that the tape is either corrupt or is not in SD format.
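The logfile truncation described earlier can be wrapped in a small helper so that the file is emptied in place rather than removed. This is a minimal sketch: the truncate_log name is hypothetical, and it uses the shell's ": >" redirection, which leaves the file completely empty, as a stand-in for the echo command shown in the text.

```shell
# Empty a logfile in place. The file itself is kept, so a running daemon
# that has it open is not left writing to a removed file.
truncate_log() {
    logfile=${1:-/var/adm/sw/swagentd.log}
    if [ ! -f "$logfile" ]; then
        echo "no such file: $logfile" >&2
        return 1
    fi
    # ':' produces no output, so the redirection simply truncates the file.
    : > "$logfile"
}

# Typical use (as root):
#   truncate_log
```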
If an installation fails part way through the install, SD lets you easily restart the operation (either by just re-executing the same command from the command line, or by recalling the session file swinstall.last that was automatically saved for you).

By default, SD checkpoints to the fileset level, meaning that the operation will start transferring files with the last fileset to be attempted. By setting the reinstall_files option to false, the distribution and installation of files will begin with the file that was last attempted. SD does not support checkpointing below the file level. Also, all checkpointing can be overridden by setting both the reinstall and reinstall_files options to true. See Appendix A “Default Options and Keywords” for more information on these options.
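A restart from the saved session might be scripted as in the sketch below. It assumes that swinstall recalls a session file with -S, that the automatically saved file is $HOME/.sw/sessions/swinstall.last, and that an option given with -x on the command line takes precedence over the value stored in the session file. The resume_install helper and its SWINSTALL override are hypothetical.

```shell
# Re-run the most recent swinstall from its saved session file, asking
# SD to resume at the last file attempted rather than the last fileset.
# SWINSTALL is an illustrative override; it defaults to the real command.
resume_install() {
    session=${1:-$HOME/.sw/sessions/swinstall.last}   # assumed location
    ${SWINSTALL:-/usr/sbin/swinstall} -S "$session" -x reinstall_files=false
}

# Typical use:
#   resume_install
```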