
The Slice Block I/O Service

The Slice block I/O service is built from a collection of PCs, each with a handful of disks and a high-speed network interface. We call this approach to network storage ``PCAD'' (PC-attached disks), denoting a middle point between network-attached disks (NASD) and conventional file servers (server-attached disks, or SAD). The CMU NASD group has determined that SAD can add up to 80% to the cost of disk capacity [9]; notably, the cost of the CPU, memory, and network interface in a PCAD node is comparable to the price differential between IDE and SCSI storage today. Our current IDE PCAD nodes serve 88 GB of storage on four IBM DeskStar 22GXP drives at a cost under $60/GB, including a PC tower, a separate Promise Ultra/33 IDE channel for each drive, and a Myrinet NIC and switch port. With the right software, a collection of PCAD nodes can act as a unified network storage volume with incrementally scalable bandwidth and capacity, at a per-gigabyte cost equivalent to a medium-range raw SCSI disk system.[*] Moreover, the PCAD nodes feature a 450 MHz Pentium-III CPU and 256MB of DRAM, and are sharable on the network.
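The configuration figures above can be checked with a little arithmetic; the drive count, total capacity, and per-gigabyte price bound come from the text, and the rest is derived:

```python
# Illustrative arithmetic for the quoted PCAD node configuration:
# 88 GB across four drives at under $60/GB.
total_gb = 88
drives = 4
price_per_gb = 60                      # stated upper bound, in $/GB

gb_per_drive = total_gb / drives       # capacity of each DeskStar drive
max_node_cost = total_gb * price_per_gb  # implied cost ceiling per node

print(f"{gb_per_drive:.0f} GB/drive, node cost under ${max_node_cost}")
# → 22 GB/drive, node cost under $5280
```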

The chief drawback of the PCAD architecture is that the I/O bus in the storage nodes limits the number of disks that can be used effectively on each node, presenting a fundamental obstacle to lowering the price per gigabyte without also compromising the bandwidth per unit of capacity. Our current IDE/PCAD configurations use a single 32-bit 33 MHz PCI bus, which is capable of streaming data between the network and disks at 40 MB/s. Thus the PCAD/IDE network storage service delivers only about 30% of the bandwidth per gigabyte of capacity of the SCSI disks of equivalent cost. Even so, the bus bandwidth limitation is a cost issue rather than a fundamental limit to performance, since bandwidth can be expanded by adding more I/O nodes, and bus latencies are insignificant where disk accesses are involved.
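The 30% figure follows from the numbers in the text: a node streams about 40 MB/s across 88 GB of capacity, and the stated ratio implies the per-gigabyte bandwidth of the equal-cost SCSI configuration. A quick illustrative calculation:

```python
# Bandwidth-per-capacity comparison from the figures in the text.
pci_stream_mb_s = 40.0     # streaming rate of the 32-bit/33 MHz PCI bus
node_capacity_gb = 88.0    # capacity of one IDE/PCAD node

pcad_bw_per_gb = pci_stream_mb_s / node_capacity_gb
# If PCAD delivers ~30% of SCSI's per-GB bandwidth, SCSI's is implied:
scsi_bw_per_gb = pcad_bw_per_gb / 0.30

print(f"PCAD: {pcad_bw_per_gb:.2f} MB/s per GB; "
      f"implied SCSI: {scsi_bw_per_gb:.2f} MB/s per GB")
# → PCAD: 0.45 MB/s per GB; implied SCSI: 1.52 MB/s per GB
```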

In our Slice prototype, the block I/O servers run FreeBSD 4.0 kernels supplemented with a loadable module that maps incoming requests to collections of files in dedicated local file systems. Slice includes features that enable I/O servers to act as caches over NFS file servers, including tertiary storage servers or Internet gateways supporting the NFS protocol [2]. In other respects, the block I/O protocol is compatible with NASD, which is emerging as a promising storage architecture that would eliminate the I/O server bus bottleneck.
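To make the request-mapping idea concrete, the following is a hypothetical sketch of how an I/O server might translate incoming block requests into files in a dedicated local file system. All names, the chunking scheme, and the block size are invented for illustration; Slice's actual in-kernel mapping module is not shown here.

```python
# Hypothetical mapping from logical block IDs to (file, offset) pairs
# in a dedicated local file system, as the text describes in outline.
import os

BLOCKS_PER_FILE = 1024     # assumed: how many blocks share one file
BLOCK_SIZE = 8192          # assumed block size in bytes

def block_to_location(volume_root: str, block_id: int):
    """Map a logical block ID to a (file path, byte offset) pair."""
    file_no = block_id // BLOCKS_PER_FILE
    offset = (block_id % BLOCKS_PER_FILE) * BLOCK_SIZE
    path = os.path.join(volume_root, f"chunk{file_no:06d}")
    return path, offset

# Example: block 2050 falls in the third chunk file, 2 blocks in.
print(block_to_location("/slice/vol0", 2050))
# → ('/slice/vol0/chunk000002', 16384)
```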

One benefit of PCAD I/O nodes is that they support flexible use of network memory as a shared high-speed I/O cache integrated with the storage service. Trapeze was originally designed as a messaging substrate for the Global Memory Service (GMS) [6], which supports remote paging and cooperative caching [5] of file blocks and virtual memory pages, unified at a low level of the operating system kernel. The GMS work showed significant benefits from using network memory as a fast temporary backing store for virtual memory or scratch files. The Slice block I/O service retains this emphasis on network memory performance, and its basic protocol and mechanisms for moving, caching, and locating blocks are derived from GMS. We are investigating techniques for using the I/O server CPU to actively manage server memory as a prefetch buffer, using speculative prediction or hinting directives from clients [13].
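The caching discipline that GMS-style cooperative caching implies can be sketched as a simple lookup order: local cache first, then network memory on a peer node, then disk. This is a hedged illustration with invented names and dict-backed stores, not Slice's kernel implementation:

```python
# Sketch of a cooperative-caching read path in the GMS style described
# above: local memory, then remote network memory, then disk. The data
# structures and function names here are invented for illustration.
def read_block(block_id, local_cache, network_memory, disk):
    """Return (data, source) for a block, caching it locally on a miss."""
    if block_id in local_cache:
        return local_cache[block_id], "local"
    if block_id in network_memory:
        data = network_memory[block_id]
        local_cache[block_id] = data       # promote into the local cache
        return data, "network memory"
    data = disk[block_id]                  # slowest path: a disk access
    local_cache[block_id] = data
    return data, "disk"

local, remote, disk = {}, {7: b"B7"}, {7: b"B7", 9: b"B9"}
print(read_block(7, local, remote, disk))  # → (b'B7', 'network memory')
print(read_block(7, local, remote, disk))  # → (b'B7', 'local')
```

A prefetch buffer of the kind the text mentions would populate `network_memory` ahead of demand, so that client misses resolve at network speed rather than disk speed.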


Jeff Chase
8/4/1999