Out-of-core random walk system has recently attracted a lot of attention as an economical way to run billions of walkers over large graphs. However, existing out-of-core random walk systems are all built upon general out-of-core graph processing frameworks, and hence do not take advantage of the unique properties of random walk applications. Different from traditional graph analysis algorithms, the sampling process of random walk can be decoupled from the processing of the walkers. It enables the system to reserve only pre-sample results in memory, which are typically much smaller than the entire edge set. Moreover, in random walk, it is not the number of walkers but the number of steps moved per second that dominates the overall performance. Thus, with independent walkers, there is no need to process all the walkers simultaneously.
In this paper, we present NosWalker, an out-of-core random walk system that replaces the graph oriented scheduling with a decoupled system architecture that provides walker oriented scheduling. NosWalker is able to adaptively generate walkers and flexibly adjust the distribution of reserved pre-sample results in memory. Instead of processing all the walkers at once, NosWalker only tries its best to keep a few walkers able to continuously move forward. Experimental results show that NosWalker can achieve up to two orders of magnitude speedup compared to state-of-the-art out-of-core random walk systems. In particular, NosWalker demonstrates superior performance when the memory capacity can only hold about 10%-50% of the graph data, which can be a common case when the user needs to run billions of walkers over large graphs.