The Beowulf software environment is implemented as an add-on to commercially available, royalty-free base Linux distributions. These distributions include all of the software needed for a networked workstation: the kernel, Linux utilities, the GNU software suite, and many add-on packages. Initially we used the very popular Slackware distribution. We have since migrated to the Red Hat distribution, which has a better package management and upgrade system.
The Beowulf distribution includes several programming environments (all developed elsewhere) and development libraries as individually installable packages. PVM, MPI, and BSP are all available. System V-style IPC and POSIX threads (pthreads) are also supported. A considerable amount of work has gone into improving the network subsystem of the kernel and implementing fast network device support. Most of these changes have been incorporated into the kernel source tree, although many device driver updates are still in active development or tuning.
In the Beowulf scheme, as is common in NOW clusters, every node runs its own copy of the kernel, and nodes are generally sovereign and autonomous at the kernel level. However, in the interest of presenting a more uniform system image to both users and applications, we have extended the Linux kernel to allow a loose ensemble of nodes to participate in a number of global namespaces. A guiding principle of these extensions is to add little to kernel size or complexity and, most importantly, to have negligible impact on individual processor performance.
Several distributed application programming environments are available on Beowulf. The most commonly used are PVM and MPI (we use the LAM implementation of MPI); BSP is also available and in use.
The Linux kernel provides a VFS-like interface into the virtual memory system. This makes it simpler to add transparent distributed back-ends to implicitly managed namespaces. Page-based systems can be created that allow the entire memory of a cluster to be accessed transparently, or nearly so.
An additional environment being added to the Beowulf packages is page-based Network Virtual Memory (NVM), also known as Distributed Shared Memory (DSM). Page-based distributed shared memory uses the virtual memory hardware of the processor and a software-enforced ownership and consistency policy to give the illusion of a memory region shared among processes running an application.
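To make the ownership and consistency idea concrete, here is a minimal sketch of a single-writer, multiple-reader, write-invalidate page protocol. It is a toy in-process simulation, not the protocol of any package named on this page: a real DSM relies on the processor's page protection and a page-fault handler, whereas this sketch checks access rights explicitly on every read and write. The names `ToyDSM`, `Node`, and `Access`, and the 4096-byte page size, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    NONE = 0   # page not present locally; any access "faults"
    READ = 1   # read-only copy; a write "faults"
    WRITE = 2  # exclusive copy; reads and writes proceed locally


PAGE_SIZE = 4096  # assumed page size


@dataclass
class Node:
    """One cluster node: per-page access rights and local page copies."""
    name: str
    access: dict = field(default_factory=dict)  # page number -> Access
    frames: dict = field(default_factory=dict)  # page number -> bytearray


class ToyDSM:
    """Hypothetical write-invalidate protocol with one owner per page."""

    def __init__(self, node_names, num_pages):
        self.nodes = {n: Node(n) for n in node_names}
        self.owner = {}   # page number -> name of the owning node
        self.faults = 0   # simulated page faults serviced so far
        first = node_names[0]
        for page in range(num_pages):
            # The first node starts out owning every (zero-filled) page.
            self.owner[page] = first
            self.nodes[first].access[page] = Access.WRITE
            self.nodes[first].frames[page] = bytearray(PAGE_SIZE)

    def _fault(self, node, page, want):
        """Stand-in for the page-fault handler of a real DSM."""
        self.faults += 1
        owner = self.nodes[self.owner[page]]
        data = bytearray(owner.frames[page])  # fetch the current contents
        if want is Access.READ:
            # Reader gets a copy; the owner is downgraded to read-only.
            owner.access[page] = Access.READ
        else:
            # Writer invalidates every other copy and becomes the owner.
            for other in self.nodes.values():
                if other is not node:
                    other.access[page] = Access.NONE
                    other.frames.pop(page, None)
            self.owner[page] = node.name
        node.frames[page] = data
        node.access[page] = want

    def read(self, name, addr):
        node = self.nodes[name]
        page, offset = divmod(addr, PAGE_SIZE)
        if node.access.get(page, Access.NONE) is Access.NONE:
            self._fault(node, page, Access.READ)
        return node.frames[page][offset]

    def write(self, name, addr, value):
        node = self.nodes[name]
        page, offset = divmod(addr, PAGE_SIZE)
        if node.access.get(page, Access.NONE) is not Access.WRITE:
            self._fault(node, page, Access.WRITE)
        node.frames[page][offset] = value


dsm = ToyDSM(["n0", "n1", "n2"], num_pages=2)
dsm.write("n1", 10, 42)                 # n1 faults and takes ownership of page 0
value_seen_by_n2 = dsm.read("n2", 10)   # n2 faults and gets a read-only copy
dsm.write("n2", 10, 7)                  # n2 faults again and invalidates n1's copy
value_seen_by_n1 = dsm.read("n1", 10)   # n1 faults and refetches the page from n2
```

Each node sees the most recent write because a writer must hold the only valid copy of a page before modifying it. The cost is one simulated fault, which in a real system means a network round trip, whenever a page changes hands, so access patterns that bounce a page between nodes are expensive.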
The initial Beowulf DSM implementation was based on the ZOUNDS (Zero Overhead Unified Network DSM System) system from Sarnoff. Our own distributed shared memory package, implemented by Jason Crawford at GSFC, will soon be available (June 1997).
We have also evaluated a commercial DSM package, TreadMarks from Rice University.
Beowulf systems are a good idea because technology advances have changed the economics of computing. The tradeoff now favors clusters of commodity technology over higher-performance but lower-volume systems. A significant part of putting together a near-optimal system is "technology tracking." Here are a few topics we actively follow:
Beowulf cluster software is being developed in as many places as Beowulf clusters are being built. Below are pointers I am aware of:
In addition to the above software, additional Linux-related software has been developed at CESDIS.