A Design Study of Alternative Network Topologies for the Beowulf Parallel Workstation




Introduction

The potential for integrating mass-market commodity subsystems in desk-side ensembles to achieve high performance computing at low cost has been demonstrated by the Beowulf Parallel Workstation for single-user environments. Beowulf is a class of experimental scientific workstations incorporating up to 16 PC-technology derived processing nodes, integrated by means of multiple commodity networks. Once limited to undemanding consumer tasks, PC-based system technology is now performance-competitive with workstation-class computers while excelling in price-performance. Thus a Pile-of-PCs or PopC ("pop-see") approach is emerging to complement the Cluster of Workstations (COW) or Network of Workstations (NOW) [3] distributed computing approaches using more costly subsystems, such as the Princeton SHRIMP project [1], which incorporates custom communication hardware. Beowulf has been used to explore this new point in the design space for scientific workstations by exploiting a degree of parallelism rarely encountered within the constraints of a dedicated end-user terminal. It has been shown that a 16-node distributed system composed exclusively of PC-market derived subsystems can provide peak performance in excess of 1 GOPS and ten times the disk capacity and bandwidth routinely provided by high-end scientific workstations at comparable cost. This capability has been realized through a balanced system structure and enhancements to the Linux operating system, providing an interactive user interface in many respects similar to that typically found in conventional environments. Key factors determining the viability of this approach are the achievable interprocessor network bandwidth and the system support software for implementing effective parallel disk I/O. This paper addresses the first of these two issues. It presents new findings that quantify the interconnect design tradeoff space and demonstrate the potential importance of alternate network topologies, even in the context of modest system configurations such as those implicit in the Beowulf approach.

Even in the context of a single-user workstation, employing multiple processors benefits both execution performance and disk access bandwidth. An additional consequence is the large disk capacity achieved at low cost through multiple commodity disk drives. This enables, for example, very large scientific data sets to be temporarily buffered locally by the parallel workstation throughout a session of data browsing and visualization, avoiding repeated accesses to remote file servers. User response time is significantly reduced, as is the burden on shared LAN and remote file server resources. But these advantages come at the expense of requiring an internal interconnection network of sufficient capacity to meet the demand of interprocessor data transfers, a problem not encountered on uniprocessor workstations. The experimental Beowulf workstation has been used to explore the feasibility of employing multiple cost-effective commodity networks in parallel to satisfy these internal data transfer rate requirements. User-transparent access to multiple parallel Ethernet networks across the processor nodes was achieved by "channel bonding" techniques developed through enhancements to the Linux operating system. In previous work it was shown that up to 3 networks could be ganged to achieve significant scaling of sustained throughput [9], validating the channel bonding method. Additional studies were performed with the new 100 Mbps Fast Ethernet, demonstrating effective utilization of dual channels [11]. However, Fast Ethernet is only beginning to become cost effective, with the sustained performance-to-cost ratio of the new technology now approaching that of the older 10baseT interconnects. In either case, each network connected all nodes within Beowulf.
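The idea of ganging several networks behind one logical interface can be illustrated with a small sketch. The text above does not state how packets are scheduled across the bonded channels, so the round-robin policy used here is an assumption, and the function name and the channel labels (`eth0`, `eth1`, `eth2`) are hypothetical. It is not the Beowulf kernel code.

```python
from itertools import cycle


def stripe_round_robin(packets, channels):
    """Assign packets to channels in strict rotation.

    ASSUMPTION: round-robin striping; the paper section does not
    specify the scheduling policy used by channel bonding.
    Returns a dict mapping each channel to the packets it carries.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    assignment = {channel: [] for channel in channels}
    # cycle() repeats the channel list for as long as packets remain.
    for packet, channel in zip(packets, cycle(channels)):
        assignment[channel].append(packet)
    return assignment


# Six packets spread over three hypothetical Ethernet channels.
plan = stripe_round_robin(range(6), ["eth0", "eth1", "eth2"])
print(plan)
```

With three channels each one carries a third of the packets, which is the sense in which throughput can scale with the number of ganged networks.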

These commodity networks are essentially multidrop media and are well suited to system configurations in which each network channel connects all nodes. However, it was recognized that the potential for higher performance exists through more complex topologies, at possibly somewhat greater cost. The weakness of any multidrop network is that only one communication packet may be transferred at a time. Channel bonding permits the number of packets transferred simultaneously to equal, at least in principle, the number of parallel channels employed. But the number of channels is limited by both cost and the number of network controllers that any processor node can practically support. An alternative approach for increasing the peak aggregate bandwidth with comparable resources is network segmentation. A given channel is divided into a number of non-overlapping sub-channels. Each sub-channel can support independent packet transfers unless nodes on separate sub-channels must interact. A peak throughput gain of a factor of 4 is theoretically possible for Beowulf using segmentation, at the cost of fully interconnected non-blocking switches. Through more complex topologies, still using the basic Ethernet technology, much of the possible sustained throughput gain might be realized without resorting to the extreme requirement of a fully connected switch for each channel.
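The peak-throughput argument can be worked through numerically. The sketch below assumes a 16-node system whose channel is split into 4 equal segments of 4 consecutively numbered nodes; the segment layout and all function names are illustrative assumptions, not taken from the Beowulf implementation. It counts the upper bound on simultaneous packet transfers and the fraction of node pairs that can communicate without leaving their segment.

```python
from itertools import combinations


def peak_concurrent_transfers(channels, segments_per_channel=1):
    """Upper bound on packets in flight at once.

    Each multidrop segment carries one packet at a time, so the bound
    is channels * segments (assumes all traffic stays within segments).
    """
    if channels < 1 or segments_per_channel < 1:
        raise ValueError("need at least one channel and one segment")
    return channels * segments_per_channel


def segment_of(node, nodes, segments):
    """Segment index of a node, ASSUMING equal-sized contiguous segments."""
    if nodes % segments != 0:
        raise ValueError("this sketch assumes equal-sized segments")
    return node // (nodes // segments)


def intra_segment_fraction(nodes, segments):
    """Fraction of unordered node pairs that share a segment."""
    pairs = list(combinations(range(nodes), 2))
    local = sum(
        1 for a, b in pairs
        if segment_of(a, nodes, segments) == segment_of(b, nodes, segments)
    )
    return local / len(pairs)


print(peak_concurrent_transfers(1))       # one unsegmented channel: 1
print(peak_concurrent_transfers(3))       # three bonded channels: 3
print(peak_concurrent_transfers(1, 4))    # one channel, 4 segments: 4
print(intra_segment_fraction(16, 4))      # 0.2
```

Under these assumptions only 24 of the 120 node pairs share a segment, so most uniformly distributed traffic would have to cross segments. This is why the factor of 4 is a peak figure and why the means of connecting segments matters.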

An initial study [7] exposed the potential of this opportunity but did not provide a clear characterization of the tradeoffs involved. It was shown that, under a range of workload demands, measurable and sometimes significant gains can be achieved using 4-way segmentation of Ethernet channels arranged in specific topologies. Motivated by these preliminary results, an empirical study was undertaken to characterize and quantify the design tradeoff space for these more complex topologies and to compare the performance achieved with that of the original channel bonded approach. Both hardware switching among segments and software routing mechanisms were evaluated under synthetic packet generation workloads and parallel file copying applications.

This paper presents the findings of these studies, providing a basis for a meaningful design methodology for Beowulf-class systems. The following section specifies the basic architectural attributes of Beowulf parallel workstations. Section 3 discusses the Linux-based software environment provided on Beowulf, including the enhancements made to support interprocessor communication. The experimental methods employed to characterize the system's internal communications and the results of the experiments are provided in Section 4. These findings are analyzed and their implications discussed in Section 5. Section 6 concludes with a summary of the overall findings and their importance for future directions.






Chance Reschke
Mon Nov 4 13:04:09 EST 1996