Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories
A key challenge for high-performance distributed storage systems is balancing load in the presence of highly skewed and dynamic workloads. In this talk, I will present Pegasus, a new storage system from our recent work that leverages programmable switch ASICs to balance load across storage servers. Pegasus selectively replicates a small set of popular objects and builds a coherence directory in the switch data plane to track and manage the locations of these replicated objects. This enables Pegasus to achieve load-aware forwarding and dynamic rebalancing while still guaranteeing data coherence and consistency. I will show that the Pegasus design is both effective and practical: under a latency SLO, Pegasus improves the throughput of a distributed in-memory key-value store by more than 10x, and it consumes less than 3.5% of the total switch SRAM.
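To make the idea of a coherence directory with load-aware forwarding more concrete, here is a minimal Python sketch under simplifying assumptions. It is a toy software model, not Pegasus's switch data-plane implementation. The class name `CoherenceDirectory`, its methods, the CRC32 placement of cold keys, and the "outstanding requests" load signal are all hypothetical. In this sketch, reads to a hot key go to the least-loaded server that holds the latest copy. A write shrinks the replica set to the single server that applies it, so stale copies are never selected for reads. Other servers rejoin the set only after they are refreshed.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class CoherenceDirectory:
    """Toy model of an in-switch directory for a small set of hot keys.

    Hypothetical sketch: names and policies are illustrative only.
    """
    num_servers: int
    # hot key -> set of servers currently holding the latest value
    replicas: dict = field(default_factory=dict)
    # outstanding requests per server, used as the load signal (assumption)
    load: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.load:
            self.load = [0] * self.num_servers

    def home(self, key: str) -> int:
        # Cold (non-replicated) keys live on one fixed server chosen by hash.
        return zlib.crc32(key.encode()) % self.num_servers

    def _least_loaded(self, servers) -> int:
        # Break ties by server id so the choice is deterministic.
        return min(servers, key=lambda s: (self.load[s], s))

    def route_read(self, key: str) -> int:
        servers = self.replicas.get(key)
        target = self._least_loaded(servers) if servers else self.home(key)
        self.load[target] += 1
        return target

    def route_write(self, key: str) -> int:
        servers = self.replicas.get(key)
        if not servers:
            target = self.home(key)
        else:
            target = self._least_loaded(servers)
            # Only the server applying this write will hold the newest value,
            # so drop the other replicas until they are refreshed.
            self.replicas[key] = {target}
        self.load[target] += 1
        return target

    def add_replica(self, key: str, server: int) -> None:
        # Call once `server` holds a copy of the latest value of `key`.
        self.replicas.setdefault(key, set()).add(server)

    def complete(self, server: int) -> None:
        # A request finished at `server`; release its load slot.
        self.load[server] -= 1


d = CoherenceDirectory(num_servers=4)
for s in (0, 1, 2):
    d.add_replica("hot", s)
print(d.route_read("hot"))   # 0: all loads equal, tie broken by id
print(d.route_read("hot"))   # 1: server 0 now has one outstanding request
print(d.route_write("hot"))  # 2: least-loaded replica; set shrinks to {2}
```

The sketch ignores concurrency. A real design would also need something like per-key version numbers, so that a replica is re-added to the directory only after it has applied the most recent write. It also omits the switch-resource constraints that motivate tracking only a small set of popular objects.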
Jialin Li is an Assistant Professor in the School of Computing at the National University of Singapore. Before joining NUS, he received his PhD from the University of Washington in 2019 and his bachelor's degree from the University of Michigan in 2012. His current research interests are systems design for reconfigurable hardware and the co-design of distributed systems with data center networks. His research has been recognized with best paper awards at NSDI and OSDI.