HOME PAGE FOR VMI21

[NCSA]
Virtual Machine Interface 2.1

Comparison of MPI Broadcast using Binomial Tree Algorithm vs. Multicast


The VMI 2.0 final release contains an experimental multicast framework that provides unreliable multicast over InfiniBand and TCP interconnects. The InfiniBand device uses the unreliable datagram (UD) multicast service provided by the InfiniBand hardware. Since the maximum UD packet is around 2K, the InfiniBand device handles multicast messages larger than 2K by sending the message in chunks of 2K or less and reassembling them at the receiver's end.
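
The chunk-and-reassemble step described above can be illustrated with a small sketch. This is not VMI's wire format: the 8-byte header (message id, chunk index, chunk count), the exact 2048-byte limit, and the names `fragment` and `Reassembler` are assumptions made for illustration only.

```python
import struct

MAX_UD_PACKET = 2048             # assumed limit; the text only says "around 2K"
HEADER = struct.Struct("!IHH")   # hypothetical header: message id, chunk index, chunk count


def fragment(msg_id, data, max_packet=MAX_UD_PACKET):
    """Split data into packets no larger than max_packet, header included."""
    body = max_packet - HEADER.size
    count = max(1, -(-len(data) // body))  # ceiling division; empty message -> 1 packet
    return [HEADER.pack(msg_id, i, count) + data[i * body:(i + 1) * body]
            for i in range(count)]


class Reassembler:
    """Collects chunks at the receiver; tolerates reordering and duplicates."""

    def __init__(self):
        self.partial = {}  # msg_id -> {chunk index: payload}

    def feed(self, packet):
        """Return the full message once every chunk has arrived, else None."""
        msg_id, index, count = HEADER.unpack_from(packet)
        chunks = self.partial.setdefault(msg_id, {})
        chunks[index] = packet[HEADER.size:]
        if len(chunks) < count:
            return None
        del self.partial[msg_id]
        return b"".join(chunks[i] for i in range(count))
```

Because UD delivery is unreliable, a receiver in this sketch that never sees one of the chunks simply never completes the message; recovering from that case is left to a higher layer.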

The TCP device exploits IP multicast to provide unreliable multicast to the VMI core, allowing multicast of messages of up to 64K (the unreliable datagram size).
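
For readers unfamiliar with IP multicast, the following is a minimal sketch of how a sender and receiver are typically set up with the standard socket API. It is not taken from the VMI TCP device; the group address, port, and function names are hypothetical. The size check reflects the IPv4 datagram limit of 65535 bytes minus the 20-byte IP and 8-byte UDP headers.

```python
import socket
import struct

GROUP = "239.1.2.3"                 # hypothetical administratively scoped group
PORT = 5007                         # hypothetical port
MAX_UDP_PAYLOAD = 65535 - 20 - 8    # IPv4 datagram limit minus IP and UDP headers


def membership_request(group):
    """Build an ip_mreq value: group address followed by INADDR_ANY."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def open_sender(ttl=1):
    """UDP socket whose multicast packets stay within ttl hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock


def open_receiver(group=GROUP, port=PORT):
    """UDP socket bound to port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock


def multicast(sock, payload, group=GROUP, port=PORT):
    """Send one datagram to the group; larger messages need fragmentation."""
    if len(payload) > MAX_UDP_PAYLOAD:
        raise ValueError("payload exceeds a single UDP datagram")
    return sock.sendto(payload, (group, port))
```

A receiver would call `open_receiver()` and then `recvfrom(65535)`; a sender would call `multicast(open_sender(), data)`. Delivery is best effort in both directions.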

Since the multicast support exported by VMI is unreliable, MPICH-VMI, which sits atop VMI, implements a reliability layer in order to exploit multicast for collective operations such as Broadcast. This reliability layer adopts a receiver-side, NACK-based approach: each multicast receiver starts with an initial timeout value. When the timeout expires, the receiver sends a message to the sender over the reliable channel, requesting that it re-send the multicast message over the reliable channel. The timeout value is adaptive, however: if the receiver does obtain the anticipated message over the multicast channel after the timeout has expired, the timeout value is increased.
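
The adaptive timeout rule can be sketched as follows. This is an illustration of the behaviour described above, not MPICH-VMI code: the class name, the millisecond units, the doubling factor, and the upper bound are all assumptions, and the arrival time of the multicast copy is passed in as a number rather than measured with a real clock.

```python
class NackReceiver:
    """Receiver-side NACK logic with an adaptive timeout (times in ms)."""

    def __init__(self, initial_timeout=10, growth=2, max_timeout=1000):
        self.timeout = initial_timeout   # current wait before sending a NACK
        self.growth = growth             # assumed multiplicative increase
        self.max_timeout = max_timeout   # assumed upper bound

    def receive(self, multicast_delay, send_nack):
        """Handle one broadcast.

        multicast_delay is the time until the multicast copy arrives,
        or None if it was lost. send_nack is called when the timeout
        expires first. Returns "multicast" or "nack".
        """
        if multicast_delay is not None and multicast_delay <= self.timeout:
            return "multicast"           # arrived in time, nothing to do
        send_nack()                      # ask for a re-send over the reliable channel
        if multicast_delay is not None:
            # The multicast copy did arrive, only late: the timeout was
            # too short, so wait longer next time.
            self.timeout = min(self.timeout * self.growth, self.max_timeout)
        return "nack"
```

In this sketch a genuinely lost packet leaves the timeout unchanged, since only a late multicast arrival indicates that the receiver gave up too early.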

Here are some graphs comparing MPI broadcast using the default binomial tree algorithm with a multicast-based approach.
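
As background for the comparison, a binomial tree broadcast needs about log2(P) communication rounds for P processes, whereas a multicast reaches all receivers with one send. The sketch below computes one common formulation of the binomial schedule; it is not MPICH's exact send ordering, and the function name is hypothetical.

```python
def binomial_schedule(size, root=0):
    """Return the rounds of a binomial tree broadcast.

    Each round is a list of (sender, receiver) rank pairs. In every
    round, each rank that already holds the data forwards it to one
    rank that does not, so the number of holders doubles.
    """
    rounds = []
    have = 1  # number of relative ranks that hold the data so far
    while have < size:
        pairs = []
        for rel in range(have):
            peer = rel + have
            if peer < size:
                # Shift relative ranks so that the tree is rooted at root.
                pairs.append(((rel + root) % size, (peer + root) % size))
        rounds.append(pairs)
        have *= 2
    return rounds
```

For 8 processes this yields 3 rounds, and every non-root rank receives the message exactly once.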

MPI Broadcast over VMI's Infiniband device

MPI Broadcast over VMI's TCP device (using IP multicast)


