A Design Study of Alternative Network Topologies for the Beowulf Parallel Workstation




Conclusions

 

Small ensembles of commodity PC-derived processing subsystems have been shown to be performance competitive with scientific workstations at lower cost while providing dramatically improved secondary storage capacity and bandwidth. Sustained performance for distributed computation is sensitive to the interprocessor communications network employed. This paper has presented findings of empirical studies conducted with the experimental Beowulf parallel workstation to characterize the design space of interconnect topologies that may be implemented using strictly mass market network technology.

The original networking strategy for Beowulf was to use multiple Ethernet networks in parallel, each connecting all the nodes within the system. Both 10 Mbps and the new 100 Mbps Fast Ethernet were employed in separate Beowulf systems. The parallel networks were managed through a technique called channel bonding that uniformly distributed packets among the interconnects in a manner transparent to the user code.
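
As an illustration of the idea, the following minimal sketch distributes packets uniformly over parallel channels. It assumes a strict round-robin policy, which is only one way to achieve a uniform distribution; the function name `bond_round_robin` and the interface names `eth0` and `eth1` are hypothetical and are not taken from the Beowulf implementation.

```python
from itertools import cycle


def bond_round_robin(packets, channels):
    """Pair each packet with a channel, rotating through the channels in order.

    Round-robin is an assumed policy used here for illustration only.
    """
    rotation = cycle(channels)
    return [(next(rotation), packet) for packet in packets]


# Hypothetical example: six packets spread over two parallel networks.
assignment = bond_round_robin(range(6), ["eth0", "eth1"])

# Group the packets by the channel that carries them.
per_channel = {}
for channel, packet in assignment:
    per_channel.setdefault(channel, []).append(packet)
```

In this sketch the caller sees a single stream of packets, while each network carries an equal share of them, which mirrors the transparency to user code described above.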

Synthetic programs that generated a controlled network demand were used to compare the original Beowulf topologies to two new segmented topologies. Both segmented topologies used eight separate Ethernet segments arranged as if the nodes formed a two-dimensional grid, with 4 segments connecting the rows and 4 segments connecting the columns. In the Software Routed topology, traffic between nodes that do not share a segment uses an intermediate node as a router. In the Switched topology, the four horizontal and four vertical segments were connected by two 4-port switches. Routing in each of these topologies was statically set.
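
The static routing rule of the Software Routed topology can be sketched as follows. This is a minimal sketch under stated assumptions: a 4 x 4 grid of nodes is inferred from the four row and four column segments, nodes are identified by hypothetical `(row, column)` pairs, and the choice of the intermediate node (source row, destination column) is an assumed convention rather than the one used in the experiments.

```python
GRID = 4  # assumed grid dimension, inferred from 4 row + 4 column segments

# Hypothetical node identifiers: (row, column) pairs.
NODES = [(row, col) for row in range(GRID) for col in range(GRID)]


def segments(node):
    """Return the two Ethernet segments a node is attached to."""
    row, col = node
    return {("row", row), ("col", col)}


def route(src, dst):
    """Return the nodes visited from src to dst under a static rule.

    Nodes sharing a segment talk directly. Otherwise the packet is relayed
    by the node in the source's row and the destination's column (an
    assumed choice; the node at the other grid corner would also work).
    """
    if src == dst:
        return [src]
    if segments(src) & segments(dst):
        return [src, dst]
    via = (src[0], dst[1])
    return [src, via, dst]


# Count ordered node pairs that can communicate without a software router.
direct_pairs = sum(
    1 for a in NODES for b in NODES if a != b and len(route(a, b)) == 2
)
```

Under this assumed layout each node shares a segment with 6 of the other 15 nodes, so most node pairs depend on the relaying node, which is where the software latency discussed in this section enters.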

As anticipated, sustained throughput was highly sensitive to packet size and traffic demand. However, the relationship was not always simple. As traffic increased, sustained throughput increased to the point where contention became an important source of performance degradation. As packet size increased, in general there was a tendency towards higher throughput until saturation for the given channel configuration was achieved.

In almost every situation evaluated, the new segmented topologies demonstrated superior performance to the dual network channel bonded technique using the same number of communicating producer/consumer process pairs. Two interesting situations were encountered in the study: software versus switched routing under light traffic, and channel bonded switched versus single-net switched routing under heavy traffic. Software routing yielded performance near that of the hardware switch based systems except in light traffic conditions, where the software latency became a dominant factor. In the case of large traffic demand on the switched topology, the additional load caused by packet replication inherent to routing for the channel bonded configurations actually reduced the effective aggregate throughput.

The importance of this study is its immediate application to real-world systems. The data in this paper can be used as a direct guide to configuring the hardware and software of interprocessor communications in a Beowulf class system. While the experiments were performed for 10 Mbps technology, the results relate directly to the higher peak bandwidth Fast Ethernet, although the switch costs for this emerging technology are still high. For example, an important intermediate step will be to use Fast Ethernet not with switches but with the lower cost repeaters, and to control traffic through software routing. It can be anticipated that many of the performance attributes will be retained.

Current work explores the implications of network topology in the Beowulf context as driven by end-user applications. A range of problems taken from the Earth and space sciences community is providing well understood benchmarks with many different characteristics. It is anticipated that while some problems may exhibit sensitivities to the network choices, many others will be less dramatically impacted than the rather severe tests shown in this paper.






Chance Reschke
Mon Nov 4 13:04:09 EST 1996