NIMBLE Task Scheduling for Serverless Analytics
Serverless platforms offer transparent resource elasticity and fine-grained billing, making them an attractive choice for data analytics. Schedulers in server-centric analytics frameworks typically optimize for job runtime, resource utilization, and isolation via inter-job scheduling policies. We find that serverless analytics instead requires schedulers to optimize for job runtime and cost of execution, which introduces a new task-level scheduling problem. We present NIMBLE, a fine-grained scheduling algorithm that solves this problem. By launching each task at exactly the right time, NIMBLE efficiently pipelines task executions within a job, minimizing execution cost while remaining Pareto-optimal between cost and runtime for arbitrary analytics jobs. To enable NIMBLE scheduling in practice, we build Caerus, a fine-grained task-level scheduler for serverless analytics frameworks. Our evaluation shows that Caerus achieves both optimal cost and optimal runtime for queries across a wide range of analytics workloads.
Hong Zhang is a postdoctoral researcher at RISELab, working with Prof. Ion Stoica. He is broadly interested in job scheduling and network scheduling problems to improve application performance. He received a Google PhD Fellowship in Systems and Networking.