PCODE: an efficient and reliable collective communication protocol for unreliable broadcast domains

Jehoshua Bruck*, Danny Dolev, Ching Tien Ho, Rimon Orni, Ray Strong

*Corresponding author for this work

Research output: Contribution to journal; Conference article; Peer-reviewed

Abstract

Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, perform poorly in collective communication. For example, a broadcast implemented with TCP/IP (a point-to-point protocol) over a LAN is inefficient because it does not exploit the fact that the LAN is a broadcast medium. We have observed that the main difference between a distributed computing paradigm and a message-passing parallel computing paradigm is that in a distributed environment the activity of every processor is independent, while in a parallel environment the collection of the user-communication layers in the processors can be modeled as a single global program. We have formalized the requirements by defining the notion of a correct global program. This notion provides a precise specification of the interface between the transport layer and the user-communication layer. We have developed PCODE, a new communication protocol that is driven by a global program, and proved its correctness. We have implemented the PCODE protocol on a collection of IBM RS/6000 workstations and on a collection of Silicon Graphics Indigo workstations, both communicating via UDP broadcast. Our experimental results indicate that PCODE can outperform the current point-to-point approach (TCP) by as much as an order of magnitude on a cluster of 16 workstations.
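The broadcast-medium argument can be made concrete with a simple frame-counting model. This is an illustrative sketch under stated assumptions, not part of PCODE: it assumes a single shared LAN segment on which only one frame is on the wire at a time, and it ignores acknowledgments, retransmissions, and protocol overhead, all of which a reliable protocol such as PCODE must still handle. The function names are hypothetical.

```python
import math


def point_to_point_frames(p: int) -> int:
    # Hypothetical model: every non-root node must receive its own unicast
    # copy, whether the root sends them one by one or they are relayed
    # along a tree, so p - 1 frames cross the shared medium.
    return max(p - 1, 0)


def tree_rounds(p: int) -> int:
    # Communication rounds of a binomial-tree broadcast over point-to-point
    # links; on a single shared segment the sends within a round still
    # serialize, so the frame count above is unchanged.
    return math.ceil(math.log2(p)) if p > 1 else 0


def lan_broadcast_frames(p: int) -> int:
    # A single link-layer broadcast frame (e.g. one UDP broadcast datagram)
    # reaches every node on the segment.
    return 1 if p > 1 else 0


if __name__ == "__main__":
    p = 16
    print(f"p={p}: point-to-point frames={point_to_point_frames(p)}, "
          f"tree rounds={tree_rounds(p)}, "
          f"broadcast frames={lan_broadcast_frames(p)}")
```

For a 16-workstation cluster, this idealized model gives 15 unicast frames versus a single broadcast frame. The measured gains reported above also depend on factors the model omits, such as the cost of making the broadcast reliable.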

Original language: English
Pages (from-to): 130-139
Number of pages: 10
Journal: IEEE Symposium on Parallel and Distributed Processing - Proceedings
State: Published - 1995
Externally published: Yes
Event: Proceedings of the IEEE 9th International Parallel Processing Symposium - Santa Barbara, CA, USA
Duration: 25 Apr 1995 to 28 Apr 1995