We are currently using our Trapeze API and firmware in a normal IP device driver on Digital UNIX and FreeBSD. The results reported here were obtained using an unmodified copy of Netperf v2.1pl2.
All of the results reported, except those highlighted in red as zero-copy results, were obtained with unmodified socket and protocol layers. Our zero-copy modifications to the network stack avoid copyout() at the socket layer on receives by means of a locally implemented page-renaming scheme, which requires no semantic changes to the socket interface. To avoid touching the data in the zero-copy case, we rely on the Myrinet hardware checksums and skip TCP checksums. As noted above, entries obtained using these optimizations are clearly marked in red.
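As a rough illustration of why a software checksum forces the CPU to touch every byte of the payload, the sketch below computes the 16-bit one's-complement Internet checksum in the style of RFC 1071. It is not the Digital UNIX or FreeBSD kernel code; the function name `internet_checksum` is our own, and the pseudo-header that a real TCP checksum also covers is omitted.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071 (illustrative only)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Every byte of the payload is read here; this is the pass
        # that relying on a hardware checksum would avoid.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))
```

Because the loop visits the whole buffer, its cost grows linearly with message size, which is why skipping it matters most for large transfers.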
| Machine -> Machine | Protocol | OS | Result |
|---|---|---|---|
| AS500/266+PWS500au+PWS500au -> PWS500au | TCP | Digital UNIX 4.0c with local zero-copy mods to soreceive() and tcp_input() | 732 Mbits/sec |
| AS500/266 -> PWS500au | TCP | Digital UNIX 4.0c with local zero-copy mods to soreceive(), sosend(), tcp_input() and tcp_output() | 690 Mbits/sec |
| PWS500au -> PWS500au | UDP | Digital UNIX 4.0d | 644 Mbits/sec |
| PWS500au -> PWS500au | TCP | Digital UNIX 4.0d | 525 Mbits/sec |
| AS500/266 -> P.Pro @200Mhz | UDP | Digital UNIX 4.0/FreeBSD-2.2.2 | 434 Mbits/sec |
| P.Pro @200Mhz -> P.Pro @200Mhz | UDP | FreeBSD 2.2.2 | 378 Mbits/sec |
| P.Pro @200Mhz -> P.Pro @200Mhz | TCP | FreeBSD 2.2.2 | 309 Mbits/sec |
The results listed above were obtained with Netperf v2.1pl2 and the current development version of the driver and Trapeze API, with the traffic going through an 8-port Myrinet switch (M2F-SW8) carrying other active connections. The Myrinet interface cards were 32-bit M2F-PCI32s. All tests were 60 seconds in duration. UDP measurements were taken at the receiver. The following kernel tuning parameters were changed from their defaults:
tcp_sendspace = 65536, tcp_recvspace = 65536, udpcksum = 0

| Machine | Protocol | OS | 1b | 64b | 1K | 2K | 4K | 8K |
|---|---|---|---|---|---|---|---|---|
| PWS500au -> PWS500au | UDP | Digital UNIX 4.0d | 67 us | 68 us | 125 us | 152 us | 202 us | 298 us |
| PWS500au -> PWS500au | TCP | Digital UNIX 4.0d | 60 us | 63 us | 123 us | 150 us | 206 us | 314 us |
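One way to read the latency table above is to separate the fixed per-message cost from the cost of each additional byte. The sketch below does this with simple arithmetic on the table's own numbers. It assumes that the column labels 1K through 8K mean 1024 through 8192 bytes, and it makes no assumption about whether the latencies are one-way or round-trip, so the derived figures are only a rough guide. The names `SIZES`, `LATENCY_US`, and `incremental_cost` are ours.

```python
# Latencies in microseconds, copied from the table above
# (PWS500au -> PWS500au, Digital UNIX 4.0d).
SIZES = [1, 64, 1024, 2048, 4096, 8192]  # assumption: 1K = 1024 bytes
LATENCY_US = {
    "UDP": [67, 68, 125, 152, 202, 298],
    "TCP": [60, 63, 123, 150, 206, 314],
}


def incremental_cost(sizes, latencies):
    """Return (us per extra byte, equivalent Mbits/sec) between the
    smallest and largest message sizes in the table."""
    d_bytes = sizes[-1] - sizes[0]
    d_us = latencies[-1] - latencies[0]
    us_per_byte = d_us / d_bytes
    mbits_per_sec = 8 / us_per_byte  # bits per microsecond equals Mbits/sec
    return us_per_byte, mbits_per_sec


if __name__ == "__main__":
    for proto, lat in LATENCY_US.items():
        us_per_byte, rate = incremental_cost(SIZES, lat)
        print(f"{proto}: {us_per_byte:.4f} us/byte, about {rate:.0f} Mbits/sec incremental")
```

The 1-byte column approximates the fixed per-message overhead, while the slope reflects the per-byte costs such as copying and moving data across the bus.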
Andrew Gallatin wrote the "TCP/IP" Myrinet device driver for Digital UNIX, now distributed through the Myricom site. This driver is an extension of the mmap driver developed by Dirk Grunwald's group at the University of Colorado. The TCP/IP driver is still being tested and refined. However, we're happy to report that, according to the Netperf benchmark, IP over Myrinet on a DEC Alpha running Digital UNIX is one of the fastest IP implementations currently available.
The results listed below were obtained with Netperf v2.1pl2,
version 3.09d of the driver, with the 2 machines directly connected.
The Myrinet interface cards were 32-bit M2F-PCI32s. The PWS500au was on
loan from our local DEC sales representative. All tests were 60
seconds in duration. The following kernel tuning parameters were changed from
their defaults:
| Machine | LANai | MyriAPI Layer | Result |
|---|---|---|---|
| AS500/266 to Digital PWS 500au | LANai4.1 | TCP (Digital UNIX 4.0/4.0c) | 300.74 Mbits/sec (receiver) |
| DEC AlphaStation 500/266 | LANai4.1 | UDP (Digital UNIX 4.0), Checksums OFF | 484.56 Mbits/sec (transmitter) |
| Digital PWS 500au | LANai4.1 | UDP (Digital UNIX 4.0c), Checksums OFF | 477.93 Mbits/sec (receiver) |
| DEC AlphaStation 500/266 | LANai4.1 | UDP (Digital UNIX 4.0), Checksums ON | 404.89 Mbits/sec (transmitter) |
| Digital PWS 500au | LANai4.1 | UDP (Digital UNIX 4.0c), Checksums ON | 404.65 Mbits/sec (receiver) |
The results listed below were obtained with Netperf v2.1pl1, version 3.09c of the driver, with the traffic going through an 8 port Myrinet switch (M2F-SW8) with other active connections. The Myrinet interface cards were 32-bit M2F-PCI32s. All tests were 60 seconds in duration. The following kernel tuning parameters were changed from their defaults:
tcp_sendspace = 65536, tcp_recvspace = 65536, udpcksum = 0
| Machine | LANai | MyriAPI Layer | Result |
|---|---|---|---|
| DEC AlphaStation 500/266 | LANai4.1 | TCP (Digital UNIX 4.0) | 271.35 Mbits/sec |
| DEC AlphaStation 500/266 | LANai4.1 | UDP (Digital UNIX 4.0) | 287.79 Mbits/sec (receiver) |
| DEC AlphaStation 500/266 | LANai4.1 | UDP (Digital UNIX 4.0) | 468.50 Mbits/sec (transmitter) |
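In BSD-derived stacks, tcp_sendspace and tcp_recvspace set the system-wide default TCP socket buffer sizes. An application can request comparable buffers on a single socket with the standard SO_SNDBUF and SO_RCVBUF socket options. The sketch below shows that per-socket form; it is not how the measurements above were configured, and the helper names `make_tuned_socket` and `effective_buffers` are ours. The kernel may round, double, or cap the requested value, so the code reads the sizes back rather than assuming them.

```python
import socket

BUF_BYTES = 65536  # same value as the tcp_sendspace / tcp_recvspace settings above


def make_tuned_socket(buf_bytes: int = BUF_BYTES) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of buf_bytes."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
    return s


def effective_buffers(s: socket.socket) -> tuple:
    """Return the (send, receive) buffer sizes the kernel actually granted."""
    snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


if __name__ == "__main__":
    sock = make_tuned_socket()
    print("granted buffers (send, receive):", effective_buffers(sock))
    sock.close()
```

Setting the options before connecting matters for the receive side, since the TCP window scale is negotiated when the connection is established.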
According to profiling information gathered with DCPI, the bottlenecks in the current setup are copyin/copyout overhead, communication with the LANai, IP checksumming, and byte-swapping. Given a faster PCI bus implementation than the one provided by the 21171 "Alcor" core logic chipset used in our AlphaStations, we would expect performance to increase. Because so much time is spent calculating checksums and byte-swapping, we would also expect improved performance from other Alcor-based machines with higher CPU clock rates.
Andrew Gallatin of Duke's ARI group and Anne Hutton of the Atomic-2 project are collaborating on a combined FreeBSD and NetBSD driver. This extends Andrew's earlier port of the Hebrew University BSD/OS driver to support FreeBSD/i386 in addition to NetBSD/alpha.
The FreeBSD/NetBSD driver has been tested under FreeBSD-2.2.{1,2,5}-RELEASE as well as NetBSD/alpha 1.3 and is available for download. This driver, running on a P6, can receive UDP traffic sent by a DEC Alpha at a sustained rate of 356 Mbits/sec, as measured by Netperf. Other performance figures:
| Machine | LANai | MyriAPI Layer | Result |
|---|---|---|---|
| Pentium Pro @200Mhz, 256k cache, Asus P/I-XP6NP5 | LANai4.1 | TCP (FreeBSD 2.2.2) | 275.45 Mbits/sec |
| Pentium Pro @200Mhz, 256k cache, Asus P/I-XP6NP5 | LANai4.1 | UDP (FreeBSD 2.2.2), with UDP checksums | 270.70 Mbits/sec (receiver) / 274.63 Mbits/sec (transmitter) |
| Pentium Pro @200Mhz, 256k cache, Asus P/I-XP6NP5 | LANai4.1 | UDP (FreeBSD 2.2.2), without UDP checksums | 280.28 Mbits/sec (receiver) / 282.38 Mbits/sec (transmitter) |
Last modified: Wed Dec 16 16:17:54 EST