CPS 210
Operating Systems

Project proposals are due on February 20. A "proposal" is a short (4-page) writeup in an e-mail message to me defining your goals and a few dated milestones. In some cases we may approve joint projects with CPS 214 or CPS 296, provided that the project is sufficiently meaty and combines elements that are relevant to both CPS 210 and one of the other courses. Projects may be conducted in groups. I prefer pairs, but I will accept a group as large as three people.

Good project opportunities exist in any area that is of interest to you and your group. Some simple options are to extend Nachos in some substantial way (e.g., add a file system), add some useful piece of functionality to Linux or FreeBSD, or download software relating to a research paper and reproduce or extend the results. These kinds of projects are nice because they are well-defined and you can get started early, but they aren't likely to yield research results if that is your goal.

Here are some ideas for projects that can contribute to our existing research efforts. Any of these could lead to publishable results if executed well.

OS event visualization

We have a FreeBSD kernel facility (netlog) that enables logging of internal OS kernel events to a server on a high-bandwidth network, timestamped at a granularity of nanoseconds. An associated tool suite parses the standard log format for display by programs such as gnuplot or Mathematica. By instrumenting selected points in the kernel, one can create pictures of kernel activity to illustrate internal algorithms and performance behavior (I included some of these pictures in the slides posted for 2/11). The trick here is to exercise some creativity in determining where and how to instrument the kernel and how to display the resulting data. This is a good way to learn something about kernel internals and the dynamic behavior of OS kernels. It would be easy to port netlog to Linux if that is your interest.
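As a rough illustration of the kind of post-processing involved, here is a minimal Python sketch that turns a stream of timestamped events into per-event gap data suitable for plotting. The record layout used here (one `timestamp_ns event_name` pair per line) and the event names are assumptions made up for the example; they are not the actual netlog log format.

```python
from collections import defaultdict

def parse_events(lines):
    """Parse hypothetical 'timestamp_ns event_name' records into sorted (int, str) tuples."""
    events = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ts, name = line.split(maxsplit=1)
        events.append((int(ts), name))
    return sorted(events)

def inter_event_gaps(events):
    """Map each event name to the gaps (in ns) between its successive occurrences."""
    last_seen = {}
    gaps = defaultdict(list)
    for ts, name in events:
        if name in last_seen:
            gaps[name].append(ts - last_seen[name])
        last_seen[name] = ts
    return dict(gaps)

# Made-up sample records, deliberately out of order.
sample = ["1000 disk_start", "1500 intr", "4000 disk_start", "# comment", "", "2500 intr"]
events = parse_events(sample)
gaps = inter_event_gaps(events)
```

Each list in `gaps` could then be written out one value per line and handed to gnuplot as a histogram or time series.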

Secure file sharing across administrative domains

I have some ideas for secure, efficient, and transparent sharing of files across organizations using HTTP and encryption-based access control. See me or send e-mail if you are interested. This is a good project to combine with CPS 214. This would involve working with file service protocols and a Web proxy.

File server performance studies with fstress

Fstress is a configurable, scalable file server benchmark based on the NFS protocol. An excellent project would be to use fstress as a basis for comparisons of Linux-based vs. FreeBSD-based NFS servers, and/or to identify performance properties of various file service implementations. The ISSG lab has a Linux-based NAS server with 1.6 TB of storage across 24 drives, and a FibreChannel link to an 8-disk volume on Duke's IBM Shark server in the North Building. Also, IBM has committed to donate an implementation of GPFS, a cluster file system for Linux. I can think of a variety of interesting studies based on fstress to compare different file systems (ext2fs, JFS, GPFS) under different types of NFS service loads. One option is to study the latency tolerance of file services when the "disk" is remote (e.g., a FibreChannel or iSCSI volume); there is a great deal of interest in this issue with the rise of IP-based (iSCSI) storage access. Another option is to explore the performance differences between different RAID layouts.

Another set of possible projects would extend fstress itself by studying some "interesting" file system workload, e.g., SPECmail or Web server access traces, and extracting fstress parameter settings to model that workload synthetically.
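One simple first step in modeling a workload is to measure its operation mix. The Python sketch below computes the fraction of each operation type in a trace. The trace representation (a plain list of operation names) is a hypothetical stand-in, and the sketch does not use fstress's actual parameter names or file formats.

```python
from collections import Counter

def operation_mix(ops):
    """Return the fraction of the trace accounted for by each operation type."""
    counts = Counter(ops)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}

# Made-up four-operation trace for illustration.
mix = operation_mix(["read", "read", "write", "getattr"])
```

A real study would also need distributions of file sizes, directory shapes, and access locality, which a simple mix like this does not capture.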

Direct Access File System

We have a public-domain implementation of a subset of the new DAFS file system, as a result of a collaboration with Network Appliance and Margo Seltzer's group at Harvard (this is the basis for our paper in Usenix this June). We also have a Network Appliance filer supporting a prerelease of NetApp's DAFS server software, and Emulex GN9000 NICs supporting a VI host interface over TCP/IP. These pieces could be the start of a variety of interesting projects. This is very "hands on" stuff for somebody interested in high-performance network I/O. We need somebody to subject the GN9000 cards to VI messaging loads to understand the behavior of VI messaging and RDMA over TCP/IP; this is the first commercial implementation and there is a great deal of interest in these cards. Another project is to test the DAFS client against the DAFS filer, and compare DAFS performance against NFS using some of the latest systems in the ISSG lab, possibly by adding a DAFS module to fstress as a basis for server scalability studies. This involves some messy poking around in someone else's code (Richard Kisley's MS project), but if you take ownership of the code you can contribute significantly to the success of our ongoing research efforts with DAFS. Another possibility is to extend the DAFS client to enable it to transparently access multiple DAFS servers, while presenting a seamless view of a single name space to the client (this is similar to the Slice project).

Anypoint

Anypoint is a transport protocol extension and switch architecture to spread service load across a cluster of servers using content-based routing. An Anypoint switch examines each request as it arrives, redirects it to a server that is appropriate for handling that request, and merges the responses back into a single stream for the client, maintaining the client's view of a unified server of arbitrary power. One good project would implement a simple, toy, synthetic service using SOAP/HTTP or RMI, glue it to an Anypoint transport, implement an Anypoint Service Routing Module (SRM) for the routing policy, and use this as a basis for experimentation with Ken Yocum's Anypoint prototype.
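To make the redirect-and-merge idea concrete, here is a toy Python sketch. It is not the Anypoint SRM interface or Ken Yocum's prototype: the hash-based routing policy, the function names, and the `(sequence number, payload)` response format are all assumptions chosen for illustration.

```python
import zlib

def route(request_key, servers):
    """Pick a server from the request content (here, a stable hash of a key)."""
    return servers[zlib.crc32(request_key.encode()) % len(servers)]

def merge_in_order(responses):
    """Yield payloads in sequence order from (seq, payload) pairs that may arrive out of order."""
    pending = {}
    next_seq = 0
    for seq, payload in responses:
        pending[seq] = payload
        # Release every response that is now contiguous with what was already sent.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1

servers = ["s0", "s1", "s2"]
chosen = route("/users/42", servers)
merged = list(merge_in_order([(1, "b"), (0, "a"), (2, "c")]))
```

The client sees one ordered stream even though the responses came from different servers at different times. A real routing policy would inspect the request itself rather than hash a key.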

ModelNet

ModelNet is software to allow Internet emulation in a cluster. There are a couple of ModelNet-related projects involving light kernel hacking in FreeBSD, and some experimentation. Somebody could do a detailed performance study of dummynet, a link emulation system that underlies ModelNet, built above the low-level Internet Protocol code in FreeBSD. This would involve probing the limits of FreeBSD networking performance, and generating and displaying data about where it breaks. Another option is to instrument the ModelNet code to propagate status information about load or to log selected events using the netlog system outlined above. This could be the basis for dynamic balancing of network emulation responsibilities across ModelNet core nodes, and/or for visualizing the flow of data through the emulated internetwork.
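For intuition about what a link emulator does to each packet, here is a simplified Python model of a single emulated link with a fixed bandwidth, propagation delay, and bounded queue. It is only a sketch under stated assumptions, not dummynet's implementation: the function name is made up, and the queue limit here counts the packet being transmitted as well as those waiting.

```python
def emulate_pipe(packets, bandwidth_bps, delay_s, queue_slots):
    """Return delivery times for (arrival_s, size_bytes) packets; None marks a drop.

    Packets must be sorted by arrival time. The link is FIFO.
    """
    in_flight = []   # transmit-finish times of packets queued or being sent
    link_free = 0.0  # time at which the link finishes its current backlog
    deliveries = []
    for arrival, size in packets:
        # Forget packets that have finished transmitting by this arrival.
        in_flight = [t for t in in_flight if t > arrival]
        if len(in_flight) >= queue_slots:
            deliveries.append(None)  # queue full: drop the packet
            continue
        start = max(arrival, link_free)
        finish = start + size * 8 / bandwidth_bps  # serialization time
        link_free = finish
        in_flight.append(finish)
        deliveries.append(finish + delay_s)        # plus propagation delay
    return deliveries

# 8000 bit/s link, so each 1000-byte packet takes 1 s to serialize.
result = emulate_pipe([(0, 1000), (0, 1000), (0, 1000), (5, 1000)],
                      bandwidth_bps=8000, delay_s=0.5, queue_slots=2)
```

With these parameters the third packet arrives while two are still outstanding and is dropped. Comparing a model like this against measured delivery times is one way to see where the real emulator starts to deviate under load.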

Your Idea Here

This page may continue to grow.