
Trapeze Messaging and NetRPC

Trapeze messages are short (128-byte) control messages with optional attached payloads typically containing application data not interpreted by the messaging system, e.g., file blocks, virtual memory pages, or TCP segments. The data structures in NIC memory include two message rings, one for sending and one for receiving. Each message ring is a circular producer/consumer array of 128-byte control message buffers and related state, shown in Figure 1. The host attaches a payload buffer to a message by placing its DMA address in a designated field of the control message header.
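The ring described above can be sketched as a circular producer/consumer array of fixed-size control buffers, with an attached payload recorded as a DMA address in the message header. This is an illustrative sketch only: the names (`msg_ring`, `ctrl_msg`, `ring_send`) and the ring depth are assumptions, not the actual Trapeze data structures, and real NIC rings carry additional per-slot state.

```c
/* Hypothetical sketch of a Trapeze-style message ring: a circular
 * producer/consumer array of 128-byte control message buffers.
 * Names and sizes are illustrative, not from the actual system. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTRL_MSG_SIZE 128   /* control messages are 128 bytes */
#define RING_SLOTS     64   /* assumed ring depth */

typedef struct {
    uint8_t  data[CTRL_MSG_SIZE - sizeof(uint64_t)];
    uint64_t payload_dma;   /* DMA address of optional payload (0 = none) */
} ctrl_msg;

typedef struct {
    ctrl_msg slots[RING_SLOTS];
    unsigned head;          /* producer index */
    unsigned tail;          /* consumer index */
} msg_ring;

/* Produce a control message, attaching a payload buffer by recording
 * its DMA address in the header. Returns 0, or -1 if the ring is full. */
int ring_send(msg_ring *r, const void *ctrl, size_t len, uint64_t payload_dma)
{
    if ((r->head + 1) % RING_SLOTS == r->tail)
        return -1;                      /* ring full */
    ctrl_msg *m = &r->slots[r->head];
    memcpy(m->data, ctrl, len < sizeof m->data ? len : sizeof m->data);
    m->payload_dma = payload_dma;       /* attach payload, if any */
    r->head = (r->head + 1) % RING_SLOTS;
    return 0;
}

/* Consume the next control message. Returns 0, or -1 if the ring is empty. */
int ring_recv(msg_ring *r, ctrl_msg *out)
{
    if (r->tail == r->head)
        return -1;                      /* ring empty */
    *out = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SLOTS;
    return 0;
}
```

In the real system the consumer side of the send ring is the NIC itself, which initiates DMA on the payload address; here both ends run on the host purely to illustrate the producer/consumer discipline.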

The Trapeze messaging system has several features useful for high-speed network storage access.

The NetRPC package, based on Trapeze, is derived from the original RPC package for the Global Memory Service (gms_net), which was extended to use Trapeze with zero-copy block handling and support for asynchronous prefetching at high bandwidth [1].

To complement the zero-copy features of Trapeze, the socket layer, TCP/IP driver, and NetRPC share a common pool of aligned network payload buffers allocated from the virtual memory page frame pool. Since FreeBSD exchanges file block buffers between the virtual memory page pool and the file cache, this allows unified buffering among the network, file, and VM systems. For example, NetRPC can send any virtual memory page or cached file block out to the network by attaching it as a payload to an outgoing message. Similarly, every incoming payload is deposited in an aligned physical frame that can be mapped into a user process or hashed into the file cache or VM page cache. This unified buffering also enables the socket layer to reduce copying by remapping pages, which significantly reduces overheads for TCP streams [7].
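The key property of the shared pool is that every buffer is a page-aligned physical frame, so the same frame can serve as a file block, a VM page, or a network payload without copying. A minimal user-level sketch of such a pool follows; the names (`pool_init`, `pool_alloc`) and the pool size are assumptions for illustration, and the real kernel allocates frames from the VM page pool rather than with `posix_memalign`.

```c
/* Minimal sketch of a shared pool of page-aligned payload buffers,
 * illustrating the unified-buffering idea. Not the actual FreeBSD
 * or Trapeze implementation; names and sizing are assumptions. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096
#define POOL_SIZE 32

typedef struct pbuf {
    struct pbuf *next;
} pbuf;

static pbuf *free_list;

/* Pre-allocate page-aligned frames so any buffer in the pool can be
 * attached to an outgoing message or mapped into a cache directly. */
int pool_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        void *p;
        if (posix_memalign(&p, PAGE_SIZE, PAGE_SIZE) != 0)
            return -1;
        ((pbuf *)p)->next = free_list;
        free_list = p;
    }
    return 0;
}

void *pool_alloc(void)          /* grab an aligned frame, or NULL */
{
    pbuf *p = free_list;
    if (p != NULL)
        free_list = p->next;
    return p;
}

void pool_free(void *p)         /* return a frame to the shared pool */
{
    ((pbuf *)p)->next = free_list;
    free_list = p;
}
```

Because the network, file, and VM layers all draw from one pool of identically shaped frames, an incoming payload lands in a buffer that any of the three subsystems can adopt in place.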

Figure 2: Adaptive message pipelining policy and resulting pipeline transfers.


High-bandwidth network I/O requires support for asynchronous block operations for prefetching or write-behind. NFS clients typically support this asynchrony by handing off outgoing RPC calls to a system I/O daemon that can wait for RPC replies, allowing the user process that originated the request to continue. NetRPC supports a lower-overhead alternative using nonblocking RPC, in which the calling thread or process supplies a continuation procedure to be executed -- typically from the receiver interrupt handler -- when the reply arrives. The issuing thread may block at a later time, e.g., if it references a page that is marked in the I/O cache for a pending prefetch. In this case, the thread sleeps and is awakened directly from the receiver interrupt handler. Nonblocking RPCs are a simple extension of kernel facilities already in place for asynchronous I/O on disks; each network I/O operation applies to a buffer in the I/O cache, which acts as a convenient point for synchronizing with the operation or retrieving its status.
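The nonblocking RPC interface described above can be sketched as follows. The caller registers a continuation procedure when it issues the call and returns immediately; when the reply arrives, the receive interrupt handler runs the continuation. The function and type names here (`rpc_issue_nonblocking`, `continuation_fn`) are hypothetical stand-ins for the NetRPC interface, and the interrupt handler is simulated by a direct call.

```c
/* Hedged sketch of nonblocking RPC with a continuation: the caller
 * supplies a callback that the receive interrupt handler executes
 * when the reply arrives. Names are illustrative, not from NetRPC. */
#include <assert.h>
#include <stddef.h>

typedef void (*continuation_fn)(void *arg, const void *reply);

typedef struct {
    continuation_fn cont;   /* run from the receive interrupt handler */
    void           *arg;    /* caller context, e.g. an I/O cache buffer */
    int             pending;
} rpc_call;

/* Issue the call and return immediately; the caller keeps running. */
void rpc_issue_nonblocking(rpc_call *c, continuation_fn cont, void *arg)
{
    c->cont = cont;
    c->arg = arg;
    c->pending = 1;
    /* ...hand the control message and payload to the NIC here... */
}

/* Stands in for the receive interrupt handler delivering the reply. */
void rpc_reply_arrived(rpc_call *c, const void *reply)
{
    c->pending = 0;
    c->cont(c->arg, reply);     /* continuation runs in interrupt context */
}

/* Example continuation: mark a prefetched I/O cache buffer valid, so a
 * thread sleeping on the buffer can be awakened from the handler. */
void mark_valid(void *arg, const void *reply)
{
    (void)reply;
    *(int *)arg = 1;
}
```

This mirrors the text's point that the I/O cache buffer is the synchronization point: the continuation updates the buffer's state, and a thread that later references the buffer either finds it valid or sleeps until the handler wakes it.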

Jeff Chase