Automating Memory Management in Data Analytics
Recent years have seen unprecedented growth in the volume, velocity, and variety of the data managed by data analytics platforms. At the same time, the pool of skilled IT staff required to develop and operate datacenters is growing at a much slower pace. This trend motivates strong interest in making data analytics platforms more autonomic, or self-driving. There are, however, several major challenges in this task. Firstly, multiple `one-size' systems need to co-exist and co-operate in order to support a variety of computation needs such as log processing, business predictions, and real-time analysis. Secondly, cluster resources are managed at multiple levels, with complex interactions among the many distributed system components. Lastly, multiple tenants share a cluster, each with specific performance expectations, which restricts the opportunities for optimal use of resources.
We build an integrated management platform, called Thoth, that provides a data-centric view over the data analytics system environment. This platform is used to develop multiple auto-tuning algorithms that help systems meet their performance goals. We specifically focus on memory-based data analytics, considering the growing size of memory in data processing systems and its correspondingly more aggressive use. Our first contribution is a cache manager targeted at multi-tenant cluster setups. It supports a novel fairness model that provides guarantees to tenants on the performance speedups experienced by their workloads. Our second contribution is automatic tuning of the memory management decisions taken at multiple levels during an application's execution. This problem is approached in two ways: (i) a relatively black-box modeling approach assisted by knowledge of system internals, and (ii) an empirically driven white-box approach. The two algorithms significantly improve on state-of-the-art tuning techniques, while exhibiting different trade-offs between convergence guarantees and the speed of optimization.
We expect the work presented here to act as a major step towards building self-driving data processing systems, motivating further work in automating components such as physical design and root cause analysis.