Bigger, Better, Faster, Stronger: Designing Networked Storage Systems for Hyper-scale Applications
Hyper-scale applications rely on fast access to vast amounts of data to power their machine learning and analytics algorithms. To support these applications, the next generation of distributed storage systems needs to operate reliably at scale, provide real-time performance, and remain cost-effective by exploiting new storage technologies such as non-volatile memory and fast flash. By rigorously modeling storage-system performance with real-world traces, we can design storage systems that scale to millions of servers.
I present two systems that follow this approach: Copysets and Bandana. Copysets is a replication framework based on combinatorial design theory. In the common scenario of simultaneous server failures, it reduces the probability of data loss by more than a factor of 10,000 compared with random replication. Bandana is a non-volatile memory system for storing deep learning models. It uses supervised learning to dynamically partition storage blocks so that objects accessed at the same time are also stored in the same physical location.
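The intuition behind Copysets can be illustrated with a toy model. Call the set of nodes holding all replicas of a chunk a "copyset". Data is lost when a group of simultaneously failed nodes contains an entire copyset. Random replication with many chunks ends up using almost every possible copyset, while a permutation-based scheme deliberately limits how many distinct copysets exist. The sketch below is illustrative only and makes several assumptions. The function names are hypothetical. The permutation scheme with a "scatter width" parameter is a simplified reading of copyset replication, not the published implementation. Every generated copyset is assumed to end up storing some chunk. Failures are modeled as a uniformly random set of failed nodes.

```python
import itertools
import math
import random


def random_replication(num_nodes, r, num_chunks, rng):
    """Each chunk picks r distinct nodes uniformly at random.
    Returns the distinct copysets (sorted tuples) that end up holding data."""
    return {tuple(sorted(rng.sample(range(num_nodes), r))) for _ in range(num_chunks)}


def copyset_replication(num_nodes, r, scatter_width, rng):
    """Simplified permutation-based scheme (an assumption, not the paper's code):
    generate ceil(S / (r - 1)) random permutations of the nodes and cut each
    one into consecutive groups of r nodes. Leftover nodes are dropped here."""
    num_perms = math.ceil(scatter_width / (r - 1))
    copysets = set()
    for _ in range(num_perms):
        perm = list(range(num_nodes))
        rng.shuffle(perm)
        for i in range(0, num_nodes - r + 1, r):
            copysets.add(tuple(sorted(perm[i:i + r])))
    return copysets


def loss_probability(copysets, num_nodes, failures):
    """Exact probability that a uniformly random set of `failures` failed nodes
    contains every replica of at least one copyset (enumerates all subsets,
    so it is only practical for small toy clusters)."""
    lost = 0
    total = 0
    for failed in itertools.combinations(range(num_nodes), failures):
        failed = set(failed)
        total += 1
        if any(failed.issuperset(cs) for cs in copysets):
            lost += 1
    return lost / total


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy cluster: 12 nodes, 3 replicas, 4 simultaneous failures.
    rand_sets = random_replication(12, 3, 1000, rng)
    cs_sets = copyset_replication(12, 3, 4, rng)
    print("random:  ", len(rand_sets), "copysets, P(loss) =", loss_probability(rand_sets, 12, 4))
    print("copysets:", len(cs_sets), "copysets, P(loss) =", loss_probability(cs_sets, 12, 4))
```

In this toy setting, random replication uses nearly all C(12, 3) = 220 possible copysets, so almost any four failed nodes destroy some chunk. The permutation scheme creates at most 8 copysets. The trade-off is that each loss event, when it does happen, affects more data. The 10,000x figure quoted above refers to the full system and realistic cluster sizes, not to this sketch.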
Asaf Cidon is Vice President of Email Security at Barracuda Networks and completed his PhD at Stanford under Mendel Rosenblum and Sachin Katti. His research focuses on building distributed storage systems that provide reliability and performance guarantees in hyper-scale cloud environments. His work has been adopted by several companies and open-source projects, including Facebook, Apache Ozone, Tibco, and CockroachDB. During his PhD, he founded Sookasa, a cloud storage security startup, and served as its CEO. Sookasa was later acquired by Barracuda Networks.