Scaling Asynchronous Graph Query Processing via Partitioned Stateful Traversal Machines

Abstract

Due to the escalating demand to analyze large graphs, many organizations are now collecting billion-level property graph datasets, concurrently executing many complex graph queries against them, and expecting interactive-level response latency. However, such requirements are particularly challenging because of the notoriously irregular data access pattern and complex dependencies between heterogeneous subtasks. Despite the widespread availability of many-core CPUs and high-speed networking in modern datacenters, existing distributed graph query systems struggle with their inherent inefficiencies, resulting in low hardware utilization and poor query performance on these state-of-the-art hardware. To address these challenges, we introduce the Partitioned Stateful Traversal Machine (PSTM), which extends the Gremlin graph traversal machine. PSTM retains the expressive power of the Gremlin query language, enabling it to accommodate a wide range of graph query tasks, including traversal, pattern matching, filtering, and result aggregation. It additionally introduces query memoranda, allowing for more efficient implementation and execution of numerous graph queries in distributed environments. Moreover, PSTM facilitates various system-level optimizations, such as massively parallel execution, overlapping computation with communication, locality-aware data access, and lightweight progress tracking. Building upon PSTM, we develop GraphDance, a distributed graph database featuring an efficient asynchronous PSTM run-time. Our evaluations, conducted on an 8-node cluster, show that GraphDance achieves millisecond-level query latency for complex queries on terabyte-scale graphs, with an average latency reduction of 89.2% across all interactive complex queries in the LDBC SNB benchmark compared to existing distributed graph query systems.

Publication
The 41th IEEE International Conference on Data Engineering