Experiments with a prototype evaluate alternatives for controller policies in
the context of NFS server benchmarking using a configurable synthetic workload
generator. The prototype obtains saturation throughput (peak rate) measures
for NFS server configurations under various workloads, extending the widely
practiced NFSOPS standard measure for file server performance. We show
how the automated controller can plan experiments to obtain peak rate measures
with a target confidence level and accuracy at low cost. Obtaining the peak
rate for a given workload and server configuration is a key building block for
systematic mapping of the performance behavior (response surface) across a
space of workloads and configurations. We illustrate how the controller can
employ established principles of response surface methodology to prune the
multi-dimensional sample space, and obtain peak rate measures more efficiently
by seeding the search from nearby points in the response surface.

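The peak-rate search described above can be illustrated with a minimal sketch. It is not the prototype's controller: it assumes that "peak rate" means the highest offered load whose mean response time stays under a latency limit, that a hypothetical callback `run_trial(load)` runs one benchmark trial and returns its mean response time, and that a normal-approximation confidence interval (z = 1.96 for roughly 95%) is adequate for deciding when enough trials have been run. All function and parameter names are illustrative.

```python
import math
import statistics
from typing import Callable, List


def mean_within_accuracy(samples: List[float], accuracy: float, z: float = 1.96) -> bool:
    """True if the confidence-interval half-width is within `accuracy` of the mean.

    Assumption: normal approximation; a t-quantile would be more careful
    for very small sample counts.
    """
    if len(samples) < 2:
        return False
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return half_width <= accuracy * abs(mean)


def measure(run_trial: Callable[[float], float], load: float,
            accuracy: float = 0.05, max_trials: int = 20) -> float:
    """Repeat trials at one load until the mean is accurate enough or the budget is spent."""
    samples: List[float] = []
    while len(samples) < max_trials:
        samples.append(run_trial(load))
        if len(samples) >= 3 and mean_within_accuracy(samples, accuracy):
            break
    return statistics.mean(samples)


def find_peak_rate(run_trial: Callable[[float], float], latency_limit: float,
                   seed: float = 100.0, step: float = 1.25,
                   tolerance: float = 0.02, max_load: float = 1e7) -> float:
    """Bracket the saturation point starting from `seed`, then bisect.

    A seed taken from a nearby, already measured point in the response
    surface keeps the bracketing phase short.
    """
    def saturated(load: float) -> bool:
        return measure(run_trial, load) > latency_limit

    if saturated(seed):
        # Seed is too high: step down until an unsaturated load is found.
        hi, lo = seed, seed / step
        while lo > 1e-6 and saturated(lo):
            hi, lo = lo, lo / step
    else:
        # Seed is sustainable: step up until the server saturates.
        lo, hi = seed, seed * step
        while not saturated(hi):
            lo = hi
            if lo >= max_load:
                return lo  # never saturated within the allowed range
            hi *= step

    # Bisect until the bracket is narrower than the relative tolerance.
    while (hi - lo) > tolerance * hi:
        mid = (lo + hi) / 2.0
        if saturated(mid):
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    # Synthetic stand-in for a server: response time grows as load nears capacity.
    def fake_trial(load: float, capacity: float = 1000.0) -> float:
        return 1e9 if load >= capacity else 1.0 / (capacity - load)

    print(find_peak_rate(fake_trial, latency_limit=0.01, seed=880.0))
```

With the synthetic server in the usage example, a seed close to the true saturation point needs fewer probed load levels than a distant seed, which is the cost saving that seeding from nearby points aims for.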
This work outlines fundamental performance properties of transport
offload and other techniques for low-overhead I/O in terms of
four key ratios that capture the CPU-intensity of the application
and the relative speeds of the host, NIC device, and network path.
The study also reflects the role of offload as an enabler for direct
data placement, which eliminates some communication overheads
rather than merely shifting them to the NIC. The analysis applies to
Internet services, streaming data, and other scenarios in which end-to-end
throughput is limited by network bandwidth or processing
overhead rather than latency.

The two CPUs of the Alteon NIC raise an open question: can the performance of user-level protocols be improved by taking advantage of a multi-CPU NIC? To answer it, we parallelize and pipeline the basic EMP protocol. Such parallelization raises a number of intrinsic issues, and we explore different parallelization and pipelining schemes to enhance the performance of the basic EMP protocol. The performance results indicate that parallelizing the receive path of the protocol can deliver 964 Mbps of bandwidth, close to the maximum achievable on Gigabit Ethernet. To the best of our knowledge, this is the first work in the literature to exploit the capabilities of multi-CPU NICs to improve the performance of user-level protocols. The results demonstrate significant potential for designing scalable, high-performance clusters with Gigabit Ethernet.