Towards Efficient and Reliable Data Center Systems
Cloud computing plays a critical role in providing computing resources to many organizations. The relentless growth in demand for cloud services makes reliability and efficiency two primary metrics of interest. However, existing data center system designs fall short of these two goals. Specifically, (1) operating systems incur significant overheads in providing virtualization support to cloud applications; (2) network infrastructure incurs excessive cost; and (3) infrastructure problems are notoriously difficult to debug and mitigate.
I will cover three projects that tackle these problems. The main focus of the talk is Slim, an efficient network stack design for container virtualization. Unlike traditional container networking approaches that rely on packet-based network virtualization, Slim virtualizes the network at a per-connection level, lowering operating system overheads. Slim reduces CPU utilization by 11-66% on popular cloud applications such as Memcached, Nginx, PostgreSQL, and Apache Kafka. I will then briefly touch on CorrOpt, a system that reduces packet corruption loss in data center networks by three to six orders of magnitude, and RAIL, a data center network architecture that reduces the total cost of the network by up to 44%. At the end of the talk, I will discuss future trends in the data center system space.
Danyang Zhuo is a sixth-year Ph.D. student in Computer Science and Engineering at the University of Washington, where he is advised by Tom Anderson and Arvind Krishnamurthy. His work spans computer systems and networking, and his recent work focuses on improving the reliability and efficiency of cloud operating systems and data center network architectures. Before starting at the University of Washington, he received his bachelor's degree in Electrical Engineering from the University of Illinois Urbana-Champaign.