Disk I/O is the major performance bottleneck of existing out-of-core graph processing systems. We observe that the total amount of disk I/O can be reduced by loading more vertices into memory in each pass. Although the task partitioning of a graph processing system is traditionally treated as equivalent to the graph partitioning problem, this equivalence does not hold for many Machine Learning and Data Mining (MLDM) problems: the property of each vertex/edge is a vector of data elements rather than a single value. By dividing each vertex into multiple sub-vertices, more vertices can be loaded into memory in each pass, which reduces the total amount of disk I/O. To exploit this opportunity, we propose a category of 3-D partitioning algorithms that considers this hidden dimension and partitions the property vector as well. The 3-D partitioning algorithm provides a new tradeoff for reducing communication costs and is applicable to both distributed and out-of-core scenarios. Based on it, we build a distributed graph processing system, CUBE, and an out-of-core system, SINGLECUBE. Because network traffic is significantly reduced, CUBE outperforms the state-of-the-art graph-parallel system PowerLyra by up to 4.7X. By greatly reducing the amount of disk I/O, SINGLECUBE significantly outperforms the state-of-the-art out-of-core system GridGraph (up to 4.5X).
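The following is a minimal sketch, not taken from CUBE or SINGLECUBE, of why slicing each vertex's property vector along the hidden dimension lets more vertices fit in one in-memory load; the function name, memory budget, and vector length are illustrative assumptions.

```python
# Illustrative sketch (hypothetical numbers): splitting each vertex's property
# vector into "layers" (the 3rd partitioning dimension) shrinks the per-vertex
# footprint, so each in-memory load covers more vertices and fewer passes over
# the on-disk edges are needed.

def vertices_per_load(memory_budget, vector_len, num_layers):
    """Number of vertices whose sub-vectors fit in one in-memory load.

    memory_budget: property elements that fit in memory at once (assumed)
    vector_len:    length of each vertex's property vector
    num_layers:    slices the vector is cut into along the hidden dimension
    """
    sub_vector_len = vector_len // num_layers   # each sub-vertex holds one slice
    return memory_budget // sub_vector_len

if __name__ == "__main__":
    budget, k = 1_000_000, 128   # hypothetical memory budget and vector length
    for layers in (1, 4, 16):
        n = vertices_per_load(budget, k, layers)
        print(f"{layers:>2} layers -> {n:,} vertices per load")
```

Under these assumed numbers, cutting the 128-element vector into 16 layers lets 16 times as many vertices reside in memory per load, which is the intuition behind the reduced disk I/O in the out-of-core setting.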