The Beowulf Parallel Workstation architecture comprises 16 processor nodes interconnected by multiple parallel Ethernet channels and includes a keyboard and dual high-resolution screens. Each processor node combines an Intel x86 processor with memory, disk storage, and network interfaces. The initial Beowulf prototype, previously described in [6], used the Intel 80486 DX4 (100 MHz) processor connected by a VESA local bus to 16 MBytes of memory and a single 520 MByte IDE disk drive. Dual 10 Mbps Ethernet channels provided system connectivity. The entire system is housed in a single half-height rack, as shown in Figure 1.
Figure 1: Beowulf Parallel Workstation
The Beowulf philosophy is to provide a general structure that can track the rapid evolution of commodity technology, providing capability growth while minimizing the need for changes to underlying software. This approach has been followed in the implementation of the recently completed Beowulf Demonstration system. The new system retains the general Beowulf architecture described above but incorporates new components that are incremental enhancements of those making up the prototype. The Beowulf Demonstration system processor is the new Pentium (100 MHz), connected by a PCI bus to 32 MBytes of main memory and 1.2 GBytes of disk. As will be shown, the most important difference between the prototype and demonstration systems is that the latter employs the new Fast Ethernet technology, which has only recently become available in the commodity market. This network technology has a peak performance of 100 Mbps, 10 times that of the regular Ethernet used by the prototype system. Although Fast Ethernet is substantially more expensive than regular Ethernet, its improved bandwidth was required to achieve a balanced system architecture.
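To make the bandwidth comparison concrete, the following minimal sketch computes peak (not sustained) network figures from the numbers quoted above. It rests on several assumptions that are not taken from the measurements in this paper: 1 MByte is treated as 10^6 bytes, protocol overhead is ignored, and the demonstration system is modeled with a single Fast Ethernet channel per node as a lower bound, since the channel count is passed as a parameter rather than stated here. The function names are hypothetical.

```python
# Illustrative peak-rate arithmetic only; real sustained throughput is lower.
# Assumptions: 1 MByte = 10**6 bytes, no protocol overhead, and the number of
# channels per node is a free parameter (hypothetical for the new system).

def peak_mbytes_per_s(channel_mbps, channels=1):
    """Peak aggregate rate in MBytes/s for `channels` ganged Ethernet links."""
    return channel_mbps * channels / 8.0

def seconds_to_move(mbytes, channel_mbps, channels=1):
    """Idealized time to move `mbytes` of data at the peak aggregate rate."""
    return mbytes / peak_mbytes_per_s(channel_mbps, channels)

# Prototype: 16 MBytes of node memory over dual 10 Mbps channels.
prototype_s = seconds_to_move(16, 10, channels=2)

# Demonstration system: 32 MBytes over one 100 Mbps channel (assumed count).
demo_s = seconds_to_move(32, 100, channels=1)

print(f"prototype peak: {peak_mbytes_per_s(10, 2):.2f} MBytes/s, "
      f"memory image in {prototype_s:.2f} s")
print(f"demonstration peak: {peak_mbytes_per_s(100, 1):.2f} MBytes/s, "
      f"memory image in {demo_s:.2f} s")
```

Under these idealized assumptions, one Fast Ethernet channel moves a memory image twice the size of the prototype's in less than half the time needed by the prototype's two ganged channels. This is one way to read the claim that the faster network keeps the system balanced as processor and memory capacity grow.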
A major objective of Beowulf is to provide rapid access to disk storage. Two elements of the Beowulf architecture affect the movement of data from spinning bits to pixels (one of Beowulf's primary uses is scientific data visualization): the rate at which data moves between disk and memory, and the rate at which data moves between the memories of separate processor subsystems. The Beowulf prototype was the target of empirical studies to characterize its principal attributes. Of primary importance was the sustainable performance achieved using multiple Ethernets in parallel. The ability to gang Ethernet channels was a key factor in enhancing interprocessor communication with low-cost technology.
To determine the scaling properties of parallel Ethernets (10 Mbps), a set of experiments was conducted whose essential findings are captured in Figures 2 and 3. The results of the same experiments on the new Beowulf Demonstration system are presented in Figures 4 and 5. The experimental method and results of both sets of experiments are discussed in Sections 3 and 4, while the implications for distributed computing are presented in Section 5.
Figure 2: Beowulf Prototype Network Throughput
Figure 3: Beowulf Prototype File Transfers (2 channels, total of 7 local and remote files)