Master's Defense

Diagnosing Performance Stragglers and Dynamically Allocating Executors for Spark

Speaker: Zhiyu Zhang
zyzhang at cs.duke.edu
Date: Thursday, April 13, 2017
Time: 2:00pm - 3:00pm
Location: D344 LSRC, Duke

Abstract

Spark is a useful framework for running machine learning and graph applications. However, its performance is hard to manage and improve. In this paper, we present studies on profiling and resource management for Spark. First, we implement a framework that profiles both hardware and software metrics at task granularity as Spark runs its applications. We use correlation analysis and elastic net regression to provide insight into the root causes of straggling Spark tasks. Second, we implement a scheduler that dynamically manages executors using a max-min allocation policy, which produces a system in which users' applications perform better and users are more willing to share their executor resources.
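
For readers unfamiliar with max-min allocation, the following is a minimal sketch of the general idea and not the scheduler presented in the defense. It assumes each user states an integer executor demand and that a fixed pool of executors is shared; the function name `max_min_allocate` and the example users are hypothetical.

```python
def max_min_allocate(demands, capacity):
    """Integer max-min fair split of `capacity` executors (illustrative only).

    demands:  dict mapping a user name to the number of executors requested.
    capacity: total executors available in the shared pool.
    """
    alloc = {user: 0 for user in demands}
    remaining = capacity
    # Users whose demand has not yet been fully met.
    active = {u for u, d in demands.items() if d > 0}

    while remaining > 0 and active:
        share = remaining // len(active)
        if share == 0:
            # Fewer executors left than active users: give one each,
            # in name order, until the pool is empty.
            for u in sorted(active)[:remaining]:
                alloc[u] += 1
            remaining = 0
            break
        # Offer every active user an equal share, capped at unmet demand.
        for u in sorted(active):
            give = min(share, demands[u] - alloc[u])
            alloc[u] += give
            remaining -= give
        # Satisfied users drop out; their unused share is redistributed.
        active = {u for u in active if alloc[u] < demands[u]}

    return alloc


if __name__ == "__main__":
    # Hypothetical demands from three users sharing 12 executors.
    print(max_min_allocate({"a": 2, "b": 8, "c": 10}, 12))
```

In this sketch a user with a small demand is fully satisfied first, and the executors it does not need are split evenly among the remaining users, so no user can gain executors except by taking them from a user who holds the same number or fewer.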
Advisor(s): Benjamin Lee
Committee: Jun Yang, Debmalya Panigrahi