LTSP - Troubleshooting guide

Jim McQuillan (jam@ltsp.org)

v1.03 03 January 2001

1. Copyright

Copyright 2001 James A. McQuillan. Permission to distribute and modify this document is granted under the GNU General Public License - Version 2, June 1991. For the full text of the License, you can view it at: www.ltsp.org/license.txt.

2. Troubleshooting

2.1 Places to look for error messages:

If there is a problem booting the workstation, there are several places you can look for errors:

  1. /var/log/messages - Lots of useful messages appear here.
  2. /var/log/secure - If tcpwrappers are enabled, you may see some messages here.
  3. /var/log/xdm-error.log - If you get past the kernel booting, and X has a problem showing the login screen, messages may be logged here.
  4. The workstation console (I know this seems obvious, but...)
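
As a rough sketch of how the log files above might be searched in one pass, the following shell function prints recent lines that mention the daemons involved in booting a workstation. The function name scan_boot_logs, the list of daemon names in the pattern, and the 20-line limit are assumptions made for illustration, not part of LTSP.

```shell
# Sketch: show recent log lines that mention boot-related daemons.
# scan_boot_logs is a hypothetical helper; adjust the pattern to taste.
scan_boot_logs() {
    pattern='bootpd|dhcpd|tftpd|mountd|portmap|xdm'
    for f in "$@"; do
        [ -r "$f" ] || continue      # skip logs that do not exist on this system
        echo "== $f"
        grep -E "$pattern" "$f" | tail -n 20
    done
}

# Example (run as root on the server):
# scan_boot_logs /var/log/messages /var/log/secure /var/log/xdm-error.log
```

Running it right after a failed workstation boot keeps the relevant messages near the end of the output.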

2.2 The workstation doesn't boot

Failed to detect IRQ line

The full error is:

NE*0000 ethercard probe at 0x300 failed to detect IRQ line
which will cause another error after the kernel has been loaded:
IP-config NO network devices available
Root-NFS NO NFS server available  giving up
VFS: Unable to mount root fs via NFS, trying floppy
VFS: cannot open root device 02:00
PANIC VFS unable to mount root fs on 02:00
This usually indicates an IRQ conflict between the network card and another device in the system. Try removing the other cards, leaving only the network and video cards.

It displays 'Searching for server':

   NE2000 base 0x0300, addr XX:XX:XX:XX:XX:XX
   Searching for server (BOOTP)...
   <sleep>
   <sleep> 
This indicates that the workstation is unable to find a bootp server. Check the following:
  1. Cabling - Check the link lights on both the workstation's network card and the hub. They both MUST be lit. The server must also be connected to the hub, and the link lights on both the server's network card and the hub MUST be lit as well.
  2. Bootpd daemon (if using bootp) - Make sure the bootpd process is configured in the /etc/inetd.conf file on the server.
  3. /etc/bootptab file - When the bootrom probes for the network card, it will display the MAC address of the card. Make sure that address exists in the /etc/bootptab file on the server.
  4. bootpd may be inhibited from running, due to tcpwrappers. If you have enabled tcpwrappers, by setting up the /etc/hosts.allow and /etc/hosts.deny files, then you need to make sure you have the following entry for bootpd:
    bootpd:    0.0.0.0
    
    The 0.0.0.0 entry is the address in the bootp request. This address is used during the request, because the workstation doesn't have its address yet.
  5. dhcpd (if using DHCP) - Make sure the dhcpd process is running on the server. This can be enabled with ntsysv.
  6. /etc/dhcpd.conf - Make sure the MAC address of the card appears in the /etc/dhcpd.conf file.
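
The MAC address checks in the list above can be sketched as a small shell function. It prints the name of each file in which the given address appears, ignoring colons and letter case, since the two files may write the address differently. The function name mac_listed and the sample address are hypothetical.

```shell
# Sketch: report which config files mention a given MAC address.
# mac_listed is a hypothetical helper, not an LTSP tool.
mac_listed() {
    mac=$1; shift
    # Normalise the address: drop colons and lower-case the hex digits.
    want=$(printf '%s' "$mac" | tr -d ':' | tr 'A-F' 'a-f')
    for f in "$@"; do
        [ -r "$f" ] || continue
        if tr -d ':' < "$f" | tr 'A-F' 'a-f' | grep -qF "$want"; then
            echo "$f"
        fi
    done
}

# Example, using the address shown by the bootrom:
# mac_listed 00:A0:C9:12:34:56 /etc/bootptab /etc/dhcpd.conf
```

If nothing is printed, the address is not in either file and the server has no reason to answer the request.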

It bombs with a "Loading /tftpboot/lts/vmlinuz.ne2000... Unable to load file"

Check the following:

  1. This could be a problem with entries in the /etc/bootptab or /etc/dhcpd.conf file. Make sure the following entries are set:
    1. hd=/tftpboot/lts
    2. bf=kernel, where kernel is the name of the kernel that matches the type of network card in the workstation, such as vmlinuz.ne2000 or vmlinuz.rtl8139.
  2. Make sure the kernel actually exists in the /tftpboot/lts directory.
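
A minimal sketch of the second check, assuming the directory and kernel names used in this guide. The helper name check_kernel is made up for illustration.

```shell
# Sketch: confirm that a kernel image exists and is readable.
# check_kernel is a hypothetical helper.
check_kernel() {
    dir=$1
    kernel=$2
    if [ -r "$dir/$kernel" ]; then
        echo "ok: $dir/$kernel"
    else
        echo "missing or unreadable: $dir/$kernel"
        return 1
    fi
}

# Example:
# check_kernel /tftpboot/lts vmlinuz.ne2000
```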

Tftp appears to be trying to download the kernel, but it just doesn't complete

  1. The screen shows:
    My IP 192.168.0.4, Server IP 192.168.0.254, GW IP 192.168.0.254
    Loading /tftpboot/lts/vmlinuz.ne2000... <sleep>
    <sleep>
    <sleep>
    
    tftpd may be inhibited from running, due to tcpwrappers. If you have enabled tcpwrappers, by setting up the /etc/hosts.allow and /etc/hosts.deny files, then you need to make sure you have the following entry for tftpd:
    in.tftpd:  192.168.0.
    
    This will allow tftp access to all workstations in the 192.168.0.0 class-C.
  2. The screen shows My IP in one class-C network but it shows Server IP in a different Class-C network.

    This can happen if you have multiple Ethernet interfaces installed on the server, or if you have aliased IP addresses.

    You can overcome this problem by adding a next-server 192.168.0.254; parameter to the /etc/dhcpd.conf file.

Problem with NFS mounting root filesystem, errno=13

Errno 13 followed by Panic. This indicates a permission problem for NFS. It may be caused by the following:

  1. The workstation name and IP address MUST be specified in /etc/hosts or in the DNS tables.
  2. The /etc/exports file MUST contain an entry for the /tftpboot/lts/ltsroot directory.
    /tftpboot/lts/ltsroot        192.168.0.0/255.255.255.0(ro,no_root_squash)
    
    The IP address MUST match the local network.
  3. After modifying the exports file, you must either run exportfs -ra to tell the kernel that the exports file has changed, or restart NFS and portmapper using the following commands:
    /etc/rc.d/init.d/nfs     stop
    /etc/rc.d/init.d/portmap stop
    /etc/rc.d/init.d/portmap start
    /etc/rc.d/init.d/nfs     start
    
  4. Make sure NFS is running on the server. Run ntsysv on the server and make sure that 'nfs' is checked. If it isn't, then enable it and reboot the server to bring it up.
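
The exports check above can be sketched as follows. The function prints any uncommented line in the exports file whose first field is the given directory, and returns a non-zero status if there is none. The name export_listed is hypothetical.

```shell
# Sketch: look for an uncommented exports entry for a directory.
# export_listed is a hypothetical helper; the file defaults to /etc/exports.
export_listed() {
    dir=$1
    file=${2:-/etc/exports}
    # Print lines whose first field is the directory; fail if none match.
    awk -v d="$dir" '$1 == d { found = 1; print } END { exit !found }' "$file"
}

# Example:
# export_listed /tftpboot/lts/ltsroot || echo "no export entry found"
```

This only inspects the file. After editing it, exportfs -ra is still needed before the change takes effect.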

RPC call returned error 111

During boot, the workstation displays:

Looking up port of RPC 100003/2 on 192.168.0.254
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.0.254
Root-NFS: Unable to get mountd port number from server, using default
mount: RPC call returned error 111
RPC: task of released request still queued!
RPC: (task is on xprt_pending)
Root-NFS: Server returned error -111 while mounting /tftpboot/lts/ltsroot
VFS: Unable to mount root fs via NFS, trying floppy.
VFS: Cannot open root device 02:00
Kernel panic: VFS: Unable to mount root fs on 02:00
This is most likely caused by tcpwrappers being enabled. If you are using tcpwrappers, you will need to add an entry to the /etc/hosts.allow file, such as this:
portmap:      192.168.0.
This will allow portmapper access to all workstations in the 192.168.0.0 class-C.
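
Since tcpwrappers comes up for bootpd, in.tftpd and portmap in this guide, a quick check of all three entries may save time. The sketch below is a simplification: it only looks for lines that begin with the daemon name, so it will not notice ALL: entries or lines that list several daemons. The name wrapper_entries is hypothetical.

```shell
# Sketch: report whether hosts.allow has a line for each daemon.
# wrapper_entries is a hypothetical helper; it ignores ALL: and daemon lists.
wrapper_entries() {
    file=${1:-/etc/hosts.allow}
    for d in bootpd in.tftpd portmap; do
        if grep -Eq "^[[:space:]]*$d[[:space:]]*:" "$file"; then
            echo "$d: entry present"
        else
            echo "$d: NO entry"
        fi
    done
}

# Example:
# wrapper_entries /etc/hosts.allow
```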

INIT: cannot execute "/etc/rc.local"

During boot, the workstation displays:

INIT: cannot execute "/etc/rc.local"
INIT: Entering runlevel: 5
/tmp/start_ws: /tmp/start_ws: No such file or directory
/tmp/start_ws: /tmp/start_ws: No such file or directory
/tmp/start_ws: /tmp/start_ws: No such file or directory
/tmp/start_ws: /tmp/start_ws: No such file or directory
/tmp/start_ws: /tmp/start_ws: No such file or directory
This is probably caused by incorrect permissions on the /tftpboot/lts/ltsroot/etc/rc.local script. There is a bug in lts_core-1.02 that sets up the rc.local script with the incorrect permissions. It should be "-rwxr-xr-x". You can either change the permissions by doing:
chmod 0755 /tftpboot/lts/ltsroot/etc/rc.local
Or upgrade to lts_core-1.03.
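
If you would rather test before changing anything, a sketch along these lines checks whether the script is executable and only then applies the chmod shown above. The function name ensure_executable is made up for illustration.

```shell
# Sketch: make a script executable only if it is not already.
# ensure_executable is a hypothetical helper.
ensure_executable() {
    f=$1
    if [ -x "$f" ]; then
        echo "already executable: $f"
    else
        chmod 0755 "$f" && echo "fixed: $f"
    fi
}

# Example:
# ensure_executable /tftpboot/lts/ltsroot/etc/rc.local
```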

Workstation stops at 'Freeing unused kernel memory 44k'

This is caused by problems with the Glibc that ships with Redhat 7.0. You need to upgrade to at least Glibc 2.2-5. It's available on Redhat's ftp site. Make sure you install the i386 version. Once you have upgraded glibc, you will need to re-install LTSP so that it picks up the correct libs.

2.3 Problems starting X-Windows

XDMCP fatal error: Manager unwilling Host unwilling

The full error is:

Fatal server error:
XDMCP fatal error: Manager unwilling Host unwilling

when reporting a problem related to a server crash, please send
the full server output, not just the last messages

INIT: Id "2" respawning too fast: disabled for 5 minutes
INIT: no more processes left in this runlevel
This is usually caused by an entry missing from the /etc/X11/xdm/Xaccess file. This file controls which machines can connect to the server via XDM. The trick is to add a line that starts with an asterisk '*'. On Redhat 6.0, this line already existed, but on Redhat 6.1, the line was commented out. Look for a line that looks like:
# *                                     #any host can get a login window
and remove the hash '#' sign at the beginning of the line. Then, you need to restart xdm by sending the SIGHUP signal to it.
killall -HUP xdm
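
To confirm whether the line is active, a sketch like the following may help. It succeeds only if the file has an uncommented line consisting of an asterisk, optionally followed by a comment, so the separate "* CHOOSER BROADCAST" style of entry does not count. The name xaccess_allows_any is hypothetical.

```shell
# Sketch: succeed if Xaccess has an active "*" line (any host may log in).
# xaccess_allows_any is a hypothetical helper.
xaccess_allows_any() {
    file=${1:-/etc/X11/xdm/Xaccess}
    # An asterisk alone on the line, optionally followed by a comment.
    grep -Eq '^[[:space:]]*\*[[:space:]]*(#.*)?$' "$file"
}

# Example:
# xaccess_allows_any || echo "the '*' line is missing or commented out"
```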

X tries to start, but displays funny, or not at all

There are lots of things that can go wrong with X. Here are some of them:

  1. Is your font server properly running on port 7100? Double check the setup of the /etc/rc.d/init.d/xfs file.
  2. If you are running more than 4 clients, you'll need to increase the max clients setting in the /etc/X11/fs/config file.
  3. It can be helpful to run a shell while in runlevel 5. Change the /tftpboot/lts/ltsroot/etc/inittab file so that the shell line looks like:
    1:35:respawn:/bin/sh
    
    Then, after starting the workstation, you should be able to press CTRL-ALT-F1 to get to the screen where the X startup messages should appear. CTRL-ALT-F2 should get you back to the X-Windows screen.
  4. Your "Modeline" entries may be wrong for your Monitor/Adapter combination. If you want to change them, do it in /tftpboot/lts/ltsroot/etc/rc.local file. This is the script that runs to generate the XF86Config file as the workstation boots. If you can connect the monitor to the server and run Xconfigurator, you might be able to copy the relevant portions of the /etc/X11/XF86Config file it generates into the rc.local script.

X starts successfully, but the login window never appears

This indicates a possible problem with the workstation communicating with xdm on the server. Check the following:

  1. If running Redhat 6.2, make sure you commented out the 'DisplayManager.requestPort' entry in the /etc/X11/xdm/xdm-config file.
  2. Check the XDM_SERVER entry in the /tftpboot/lts/ltsroot/etc/lts.conf file. It must be set to the address of the server.
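
The XDM_SERVER check can be sketched as a one-line extraction. This assumes the entry is written as "XDM_SERVER = address" on a line of its own, and the function name xdm_server is hypothetical.

```shell
# Sketch: print the value of XDM_SERVER from lts.conf.
# xdm_server is a hypothetical helper; it assumes "XDM_SERVER = value" lines.
xdm_server() {
    file=${1:-/tftpboot/lts/ltsroot/etc/lts.conf}
    sed -n 's/^[[:space:]]*XDM_SERVER[[:space:]]*=[[:space:]]*//p' "$file"
}

# Example:
# xdm_server
```

Compare the printed address with the address of the interface that the workstations are connected to on the server.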

2.4 Problems on the server

X starts on the server, but fails, even though it used to work properly

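After changing the /etc/rc.d/init.d/xfs file (for example, so that the font server listens on port 7100), you also need to change the server's /etc/X11/XF86Config file so that its font server setting matches.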

fh_verify errors on server while invoking X on the workstation

The following messages appear on the server:

fh_verify: dev/tty2 permission failure, acc=8, error=30
fh_verify: dev/tty0 permission failure, acc=8, error=30
This happens because the X server is trying to change the ownership and group of the /dev/tty2 and /dev/tty0 device nodes. The root filesystem is mounted read-only, so it can't change the ownership. The X server doesn't really need to change it, because the ownership is already correct. There are updated X servers on the ltsp.org download page that will correct this problem.

No graphical login on the server

If the server is set for runlevel 5, it is supposed to bring up the graphical login prompt when the server is booted.

The installation procedure of LTS Version 1.0 mistakenly commented out a line in the /etc/X11/xdm/Xservers file. The line should look like:

:0 local /usr/X11R6/bin/X
Make sure the entry is NOT commented out.

This has been fixed in LTS version 1.01.

2.5 If all else fails:

If you still can't get the workstation going, please use the discuss mailing list at ltsp.org to report your problems. There are several people who can help and I try to keep a close eye on that list.