HOME PAGE FOR VMI21

[NCSA]
Virtual Machine Interface 2.1

VMI and MPICH-VMI Bandwidth


The VMI bandwidth data graphed below for each kind of interconnect was measured using the bandwidth benchmark installed in the benchmarks directory of the VMI install tree. bandwidth measures the throughput of an interconnect for a given message size s using the VMI messaging interface by starting a clock and streaming n messages, each of size s, to its peer bandwidth process running on a different host. The peer process, after receiving n messages, sends back an acknowledgement. After the acknowledgement is received, bandwidth stops the clock and computes the time spent streaming n*s bytes of data. The n*s bytes streamed are divided by this time to get the throughput. For each bandwidth value plotted on the graphs below for a given message size, a total of 1024 messages were streamed over the interconnect. bandwidth can use both the stream and RDMA messaging interfaces of VMI for communication. For the graphs plotted below, the stream interface was used for messages smaller than 16k, and RDMA was used for messages over 16k.
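
The measurement loop described above can be sketched in a few lines. This is only an illustration, not the bandwidth benchmark itself: it streams over a local socket pair instead of the VMI messaging interface, the function names are made up for this sketch, and it assumes 1 MB = 10^6 bytes, which the page does not specify.

```python
import socket
import threading
import time


def throughput_mb_per_sec(n, s, elapsed):
    """Bytes streamed (n messages of s bytes) per second, in MB/sec.

    Assumption: 1 MB is taken as 10**6 bytes.
    """
    return (n * s) / elapsed / 1e6


def recv_exact(sock, nbytes):
    """Read and discard exactly nbytes from sock."""
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        remaining -= len(chunk)


def peer(sock, n, s):
    """Receive n messages of s bytes each, then send an acknowledgement."""
    recv_exact(sock, n * s)
    sock.sendall(b"A")


def measure_bandwidth(n, s):
    """Stream n messages of s bytes to a peer thread and time the exchange."""
    sender, receiver = socket.socketpair()
    worker = threading.Thread(target=peer, args=(receiver, n, s))
    worker.start()
    payload = b"x" * s
    start = time.perf_counter()        # start the clock
    for _ in range(n):
        sender.sendall(payload)        # stream n messages of size s
    recv_exact(sender, 1)              # wait for the acknowledgement
    elapsed = time.perf_counter() - start  # stop the clock
    worker.join()
    sender.close()
    receiver.close()
    return throughput_mb_per_sec(n, s, elapsed)
```

Calling measure_bandwidth(1024, 4096) mirrors one data point of the tables below (1024 messages of a given size), although the number it returns reflects the local socket pair rather than any of the interconnects measured here.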

The MPICH-VMI bandwidth data was measured using bandwidth_mpi, installed in the tools directory of the MPICH-VMI install tree. bandwidth_mpi measures the bandwidth the same way as bandwidth, except that it uses MPI's send/recv interface to communicate data. bandwidth_mpi can run in both blocking (MPI_Send/MPI_Recv) and non-blocking (MPI_Isend/MPI_Irecv) modes to measure the bandwidth.

If you look at the bandwidth curves for MPICH-VMI, you will notice that they show a small knee at the 16k message size before ascending again. This knee occurs because MPICH-VMI, by default, switches from the eager protocol to the rendezvous protocol for sending and receiving data at 16k (the message length over which rendezvous is used can also be set by the user on the command line). In the eager protocol, the sender sends the data immediately. If the destination has not posted a receive, the messaging layer allocates some space to store the message. In the rendezvous protocol, however, the sender sends the data only after the receiver has posted a corresponding receive. Every rendezvous send/recv therefore requires a handshake between the sender and the receiver, which adds some overhead. The knees in the graphs are caused by this overhead.

The higher the latency of the interconnect, the greater the overhead. That is why Infiniband, with the smallest latency among the interconnects used, is affected the least by this overhead, while TCP and Myrinet, with relatively higher latencies, show a greater effect. The adverse effect of latency on bandwidth would have been worse if rendezvous had been implemented using stream instead of RDMA in MPICH-VMI. Unlike stream, which is used to implement the eager protocol, RDMA is more efficient for large messages, since it avoids a memory copy by depositing data directly into the receive buffer of the receiver. Using RDMA, therefore, tends to mitigate the adverse effect of latency on bandwidth in the rendezvous protocol.
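
A toy cost model can make the relationship between latency and the size of the knee concrete. The sketch below is not MPICH-VMI code. It assumes that an eager message costs only its transfer time, that a rendezvous message additionally pays one handshake round trip (twice the one-way latency), and that rendezvous applies from 16384 bytes upward, which is a reading of the tables on this page. The latency and peak bandwidth values in the usage note are hypothetical.

```python
EAGER_LIMIT = 16 * 1024  # assumed switch point, in bytes


def effective_bandwidth(size, latency, peak_bw, eager_limit=EAGER_LIMIT):
    """Bytes per second delivered for one message under the toy model.

    size     message size in bytes
    latency  one-way latency in seconds (hypothetical input)
    peak_bw  peak transfer rate in bytes per second (hypothetical input)
    """
    transfer = size / peak_bw
    # Rendezvous: sender and receiver handshake before the data moves.
    handshake = 2 * latency if size >= eager_limit else 0.0
    return size / (transfer + handshake)


def knee_ratio(latency, peak_bw):
    """Bandwidth at 16384 bytes relative to bandwidth at 8192 bytes."""
    at_knee = effective_bandwidth(16384, latency, peak_bw)
    before_knee = effective_bandwidth(8192, latency, peak_bw)
    return at_knee / before_knee
```

With the same peak rate, knee_ratio(10e-6, 250e6) is closer to 1 than knee_ratio(50e-6, 250e6): the higher-latency link loses a larger fraction of its bandwidth at the switch point. In both cases the handshake cost is amortized as the message size grows, so the modelled bandwidth climbs back toward the peak rate.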

IA-32 Bandwidth Measurements


To measure bandwidth between IA-32 machines, we used 2 Dell PowerEdge 1750 servers with 2 hyperthreaded Intel Xeon 3.06 GHz processors (3 GB RAM, 512k L2 cache) running Linux 2.4.20-30smp kernels.

For Myrinet, both machines had a LANai 10 on a 64-bit PCI-X 100 MHz bus with the gm-2.0.11 driver.

Msg Size (Bytes) VMI Bandwidth (MB/sec) MPI Blocking (MB/sec) MPI Non-blocking (MB/sec)
1 0.0845 0.163 0.085
2 0.1678 0.328 0.341
4 0.3376 0.657 0.682
8 0.6766 1.315 1.364
16 1.3502 2.629 2.729
32 2.6968 5.24 5.454
64 5.3951 10.43 10.828
128 10.65 20.678 21.217
256 21.123 40.803 41.578
512 41.264 78.144 79.749
1024 79.382 145.538 148.45
2048 139.97 233.416 231.835
4096 176.83 241.168 241.426
8192 205.23 244.18 244.588
16384 235.49 159.145 217.583
32768 235.9 193.7 231.558
65536 235.67 217.001 239.167
131072 235.7 231.389 243.169
262144 235.79 239.016 245.212
524288 235.81 242.706 246.253
1048576 235.83 244.717 246.776
2097152 235.84 245.999 247.042
4194304 236.3 246.587 247.143
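
The size of the knee can be read directly from a table in this layout. The sketch below parses a few rows copied from the Myrinet table above and compares the 16384-byte row with the 8192-byte row. The column labels vmi, blocking and nonblocking are names chosen for this example.

```python
# Rows copied from the IA-32 Myrinet table:
# size, VMI, MPI blocking, MPI non-blocking (MB/sec).
ROWS = """\
4096 176.83 241.168 241.426
8192 205.23 244.18 244.588
16384 235.49 159.145 217.583
32768 235.9 193.7 231.558
65536 235.67 217.001 239.167
"""


def parse_rows(text):
    """Map message size to a dict of the three bandwidth columns."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip rows that do not have all three measurements
        size = int(fields[0])
        vmi, blocking, nonblocking = (float(f) for f in fields[1:])
        table[size] = {
            "vmi": vmi,
            "blocking": blocking,
            "nonblocking": nonblocking,
        }
    return table


def knee_ratio(table, column, before=8192, at=16384):
    """Bandwidth at the switch point relative to the row just before it."""
    return table[at][column] / table[before][column]


table = parse_rows(ROWS)
blocking_ratio = knee_ratio(table, "blocking")
nonblocking_ratio = knee_ratio(table, "nonblocking")
```

For these rows the blocking mode keeps roughly 65% of its 8192-byte bandwidth at 16384 bytes, while the non-blocking mode keeps roughly 89%.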

For Infiniband, both machines had an InfiniHost MT23108 HCA (a1 silicon) on a 64-bit PCI-X 133 MHz bus with thca-3.1.

Msg Size (Bytes) VMI Bandwidth (MB/sec) MPI Blocking (MB/sec) MPI Non-blocking (MB/sec)
1 0.3492 0.278
2 0.6766 0.572
4 1.3922 1.108
8 2.7463 2.286
16 5.5325 4.502
32 10.997 8.714
64 20.959 17.719
128 42.756 34.287
256 85.513 66.564
512 0.5306 131.12
1024 267.18 219.557
2048 419.93 348.717
4096 565.87 495.654
8192 662.24 596.254
16384 732.09 371.603
32768 820.48 520.971
65536 836.45 651.237
131072 844.29 744.406
262144 848.13 800.506
524288 850.18 832.605
1048576 851.1 849.369
2097152 851.53 850.529
4194304 679.62 856.3

For Ethernet, a Broadcom Gigabit Ethernet NIC with the bcm5700 driver was used.

Msg Size (Bytes) VMI Bandwidth (MB/sec) MPI Blocking (MB/sec) MPI Non-blocking (MB/sec)
1 0.0203 0.007 0.028
2 0.0445 0.083 0.05
4 0.0866 0.165 0.1
8 0.1734 0.258 0.2
16 0.3497 0.66 0.401
32 0.6935 1.303 0.796
64 1.3801 2.629 6.304
128 3.043 5.239 0.593
256 25.037 10.224 22.993
512 47.758 14.41 45.102
1024 92.538 37.541 83.858
2048 109.05 71.416 104.601
4096 114.24 94.666 107.392
8192 104.77 106.604 109.259
16384 116.5 40.453 106.017
32768 117.18 61.655 109.143
65536 117.35 81.002 110.241
131072 117.47 97.749 110.598
262144 117.51 108.212 110.526
524288 117.28 115.119 110.787
1048576 117.3 118.646 110.855
2097152 117.46 120.778 110.79
4194304 117.4 121.899 87.899

IA-64 Bandwidth Measurements


To measure bandwidth between IA-64 machines over Myrinet, we used 2 Intel Itanium 2 duals with 1.3 GHz processors (4 GB RAM) running a SuSE SLES8 system with a 2.4.21 kernel. The Infiniband, GigE and shared memory measurements were done with 2 Intel Itanium 2 duals with 900 MHz processors (2 GB RAM) running a 2.4.21 kernel.

For Infiniband, both machines had an InfiniHost MT23108 HCA (a1 silicon) on a 64-bit PCI-X 133 MHz bus with thca-3.1.

Msg Size (Bytes) VMI Bandwidth (MB/sec) MPI Blocking (MB/sec) MPI Non-blocking (MB/sec)
1 0.2749 0.126
2 0.0022 0.249
4 0.0034 0.504
8 2.2197 1.008
16 0.0142 2.056
32 8.8894 4.116
64 0.0568 8.16
128 34.738 16.233
256 0.2273 32.258
512 131.93 64.598
1024 237.66 126.576
2048 376.9 245.42
4096 506.84 426.363
8192 596.1 604.712
16384 646.97 286.409
32768 699.71 413.549
65536 711.33 530.729
131072 717.33 617.617
262144 720.28 672.495
524288 721.83 703.915
1048576 722.59 720.775
2097152 722.65 725.168
4194304 694.7 730.602

For Myrinet, both machines had a LANai 10 on a 64-bit PCI-X 100 MHz bus with the gm-2.0.8 driver.

Msg Size (Bytes) VMI Bandwidth (MB/sec) MPI Blocking (MB/sec) MPI Non-blocking (MB/sec)
1 0.1047 0.132 0.114
2 0.2092 0.27 0.295
4 0.4289 0.554 0.573
8 0.8232 1.096 1.149
16 1.6982 2.23 2.31
32 3.398 4.475 4.656
64 6.7794 8.901 9.258
128 13.467 15.448 18.591
256 26.407 34.317 35.571
512 51.713 64.598 67.431
1024 92.723 117.783 118.097
2048 173.39 209.235 220.787
4096 179.62 215.195 220.571
8192 206.5 228.413 227.688
16384 213.58 133.565 196.538
32768 219.16 166.038 214.736
65536 205.01 181.863 219.142
131072 213.43 207.196 214.132
262144 211.57 212.335 219.345
524288 211.82 216.781 220.946
1048576 211.12 219.384 222.382
2097152 212.41 220.236 222.128
4194304 211.49 221.971 222.305

For Gigabit Ethernet, a Broadcom Gigabit Ethernet NIC with the bcm5700 driver was used.

Msg Size (Bytes) VMI Bandwidth (MB/sec) MPI Blocking (MB/sec) MPI Non-blocking (MB/sec)
1 0.0179 0.03 0.054
2 0.0359 0.06 0.112
4 0.0716 0.118 0.228
8 0.1438 0.248 0.454
16 0.284 0.485 1.01
32 0.5787 0.98 1.982
64 1.1192 1.989 3.887
128 2.2474 4.013 7.685
256 15.324 8.192 14.836
512 26.46 16.447 28.399
1024 46.743 28.274 52.083
2048 74.498 53.326 80.965
4096 108.42 83.692 114.003
8192 110.33 115.197 115.916
16384 111.26 47.818 109.828
32768 111.71 69.08 115.087
65536 111.97 86.34 110.566
131072 112.09 91.262 102.668
262144 105.74 104.601 114.484
524288 112.09 107.393 115.548
1048576 108.86 113.449 112.563
2097152 112.21 115.11 111.775
4194304 111.67 115.923 112.334

Shared memory.

Msg Size (Bytes) MPI Blocking (MB/sec) MPI Non-blocking (MB/sec)
1 1.12
2 1.852
4 4.106
8 8.501
16 16.857
32 31.678
64 69.705
128 127.492
256 246.667
512 421.406
1024 662.701
2048 878.586
4096 1013.322
8192 1120.962
16384 869.504
32768 1012.076
65536 1109.202
131072 1171.698
262144 1211.436
524288 1191.547
1048576 689.272
2097152 569.463
4194304 542.517
