ActCap: Accelerating MapReduce on Heterogeneous Clusters with Capability-Aware Data Placement

Abstract

As a widely used programming model and implementation for processing large data sets, MapReduce performs poorly on heterogeneous clusters, which, unfortunately, are common in current computing environments. To deal with the problem, this paper: 1) analyzes the causes of performance degradation and identifies the key one as the large volume of inter-node data transfer resulted from even data distribution among nodes of different computing capabilities, and 2) proposes ActCap, a solution that uses a Markov chain based model to do node-capability-aware data placement for the continuously incoming data. ActCap has been incorporated into Hadoop and evaluated on a 24-node heterogeneous cluster by 13 benchmarks. The experimental results show that ActCap can reduce the percentage of inter-node data transfer from 32.9% to 7.7% and gain an average speedup of 49.8% when compared with Hadoop, and achieve an average speedup of 9.8% when compared with Tarazu, the latest related work.

Publication
IEEE Conference on Computer Communications, 2015, pp.1328-1336