As computational resources become more efficient and data sizes grow, data movement is fast becoming the dominant cost in computing. Processing-in-Memory is emerging as a key technique for reducing costly data movement, by enabling computation to be executed on compute resources embedded in the memory modules themselves.
This paper presents the Processing-in-Memory (PIM) model, for the design and analysis of parallel algorithms on systems providing processing-in-memory modules. The PIM model focuses on keys aspects of such systems, while abstracting the rest. Namely, the model combines (i) a CPU-side consisting of parallel cores with fast access to a small shared memory of size M words (as in traditional parallel computing), (ii) a PIM-side consisting of P PIM modules, each with a core and a local memory of size Θ(n/P) words for an input of size n (as in traditional distributed computing), and (iii) a network between the two sides. The model combines standard parallel complexity metrics for both shared memory (work and depth) and distributed memory (local work, communication time) computing. A key algorithmic challenge is to achieve load balance among the PIM modules in both their communication and their local work, while minimizing the communication time. We demonstrate how to overcome this challenge for an ordered search structure, presenting a parallel PIM-skiplist data structure that efficiently supports a wide range of batch-parallel queries and updates.