
   
Modeling and Optimizing I/O Throughput of Multiple Disks on a Bus

R. D. Barve, E. A. M. Shriver, P. Gibbons, B. Hillyer, Y. Matias, and J. S. Vitter. "Modeling and Optimizing I/O Throughput of Multiple Disks on a Bus," short paper in Joint International Conference on Measurement and Modeling of Computer Systems (Sigmetrics '98/Performance '98), 264-265. A longer version is currently under submission. Our modeling and optimization work forms the basis of two patent applications that are currently pending.

Full text (gzip-compressed postscript)

Slides for talk (gzip-compressed postscript)

For a wide variety of computational tasks, disk I/O continues to be a serious obstacle to high performance. To meet demanding I/O requirements, systems are designed to use multiple disk drives that share one or more I/O ports to form a disk farm or RAID array. This paper focuses on systems that use multiple disks per SCSI bus. We measured the performance of concurrent random I/Os for three types of SCSI disk drives and three types of computers. The measurements enable us to study bus-related phenomena that impair performance. We describe these phenomena and present a new I/O performance model that incorporates bus effects to predict the average throughput achieved by concurrent random I/Os that share a SCSI bus. This model, although relatively simple, predicts performance on these platforms to within 11% for fixed I/O sizes in the range 16-128 KB. We then describe a technique to improve the I/O throughput. This technique increases the percentage of disk head positioning time that is overlapped with data transfers, and increases the percentage of transfers that occur at bus bandwidth rather than at disk-head bandwidth. Our technique is most effective for large I/Os and high concurrency, an important performance region for large-scale computing. In that region, it improves throughput by 10-20% over the naive method for random workloads.
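
The abstract does not spell out the model itself, so the following is only a toy illustration of why overlapping positioning time with transfers and moving data at bus bandwidth can raise throughput. It is not the paper's model. The function names, the parameters (positioning time, disk-head bandwidth, bus bandwidth), and the two idealized bounding cases are all assumptions made for this sketch.

```python
# Toy bounds on throughput for concurrent I/Os sharing one bus.
# NOT the model from the paper: every name and formula here is a
# simplifying assumption used only to illustrate the idea of overlap.

def serialized_throughput(size_kb, t_pos_ms, disk_mb_s):
    """MB/s when requests are serviced one at a time with no overlap.

    Each request pays its positioning time, then moves its data at
    disk-head bandwidth while it occupies the bus.
    """
    size_mb = size_kb / 1024
    service_s = t_pos_ms / 1000 + size_mb / disk_mb_s
    return size_mb / service_s


def overlapped_throughput(size_kb, t_pos_ms, disk_mb_s, bus_mb_s, n_disks):
    """MB/s under idealized full overlap across n_disks drives.

    Assumption: each drive positions and reads into its own buffer in
    parallel with the others, and buffered data crosses the bus at bus
    bandwidth. One round of n_disks requests then takes as long as the
    slower of (a) one drive's service time and (b) the bus time needed
    to move all n_disks requests.
    """
    size_mb = size_kb / 1024
    disk_s = t_pos_ms / 1000 + size_mb / disk_mb_s
    bus_s = n_disks * size_mb / bus_mb_s
    return n_disks * size_mb / max(disk_s, bus_s)


if __name__ == "__main__":
    # Hypothetical parameter values chosen for illustration only.
    for n in (1, 2, 4):
        mb_s = overlapped_throughput(128, 10.0, 10.0, 20.0, n)
        print(f"{n} disk(s): {mb_s:.2f} MB/s")
```

With a single drive the two functions agree, because there is nothing to overlap with. As drives are added, the overlapped bound grows until the bus becomes the bottleneck, which is the regime in which bus effects matter most.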


Jeff Vitter
2008-04-02