In distributed storage systems, the ability to recover from failures is critical for ensuring reliability. To speed up recovery, these systems often distribute the recovery task across multiple disks and reconstruct lost data units in parallel. However, using fine-grained data units to achieve better load balancing scatters each disk's data across more peers, which increases the risk that a simultaneous failure of multiple disks destroys some data unit entirely.
This paper systematically analyzes the recovery load distribution problem and proposes a new data placement algorithm that achieves load balancing without resorting to fine-grained data units. The problem of finding an optimal data placement for recovery load balancing is formally defined and shown to be NP-hard. A greedy data placement algorithm is then presented; experimental results show that it recovers up to 2.4 times faster than conventional placement techniques. Furthermore, the algorithm supports system expansion with low overhead.
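The abstract does not spell out the greedy algorithm, so the sketch below is only an illustrative heuristic built on a common assumption: when a disk fails, each surviving disk's recovery read load is roughly proportional to the number of stripes (redundancy groups) it shares with the failed disk. The function greedy_placement and its parameters (num_stripes, stripe_width, num_disks) are hypothetical names introduced here for illustration, not the paper's actual interface.

```python
from itertools import combinations

def greedy_placement(num_stripes, stripe_width, num_disks):
    """Illustrative greedy heuristic (not the paper's algorithm): place each
    stripe's chunks on `stripe_width` distinct disks while keeping pairwise
    co-location counts, a proxy for recovery load, as even as possible."""
    assert stripe_width <= num_disks
    colocation = {}          # (i, j) with i < j -> number of stripes on both disks
    load = [0] * num_disks   # chunks stored per disk, used as a tie-breaker
    placement = []

    for _ in range(num_stripes):
        chosen = []
        for _ in range(stripe_width):
            # Pick the disk that adds the least to its worst co-location count
            # with the disks already chosen for this stripe; break ties by
            # lower storage utilization to keep capacity balanced as well.
            best = min(
                (d for d in range(num_disks) if d not in chosen),
                key=lambda d: (
                    max((colocation.get((min(d, c), max(d, c)), 0) for c in chosen),
                        default=0),
                    load[d],
                ),
            )
            chosen.append(best)
        # Update co-location counts and per-disk storage for the chosen disks.
        for a, b in combinations(sorted(chosen), 2):
            colocation[(a, b)] = colocation.get((a, b), 0) + 1
        for d in chosen:
            load[d] += 1
        placement.append(chosen)
    return placement

if __name__ == "__main__":
    # Example: 12 stripes of width 3 over 6 disks.
    layout = greedy_placement(num_stripes=12, stripe_width=3, num_disks=6)
    for s, disks in enumerate(layout):
        print(f"stripe {s}: disks {disks}")
```

Balancing these co-location counts spreads a failed disk's reconstruction reads evenly across the survivors without shrinking the data unit size, which is the general intuition the abstract describes; the paper's actual placement algorithm, cost model, and expansion mechanism may differ.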