Data analytics frameworks on shared clusters host a large number of diverse workloads submitted by multiple tenants. Modern cluster schedulers incentivize users to share cluster resources by promising fairness and isolation along with high performance and resource utilization. Nevertheless, these guarantees are hard to meet: resource contention among co-located workloads causes significant performance degradation and is one of the key reasons for unpredictable performance and missed workload Service-Level Agreements (SLAs) in data analytics frameworks.
Recent years have seen unprecedented growth in the volume, velocity, and variety of the data managed by data analytics platforms. At the same time, the number of skilled IT staff available to develop and operate datacenters is growing at a much slower pace. This trend has driven strong interest in making data analytics platforms more autonomic/self-driving. There are, however, several major challenges in this task.