NeutronBench is an evaluation framework for GNN training systems, built on NeutronStar.

🔧 Install

Dependencies

  • cmake (>=3.14.2).
  • mpich (>=3.3.3) for inter-process communication.
  • libnuma for NUMA-aware memory allocation.
  • cub for GPU-based graph propagation.
  • libtorch (>=1.7) with GPU support for neural network computation.
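
For reference, on Ubuntu the system dependencies can typically be installed as below. The package names are assumptions and may differ on other distributions; cub is bundled with CUDA Toolkit 11 and later, and a CUDA-enabled libtorch can be downloaded from pytorch.org (pick the build matching your CUDA version):

# Package names follow Ubuntu conventions and may vary by distribution.
sudo apt-get update
sudo apt-get install -y cmake mpich libnuma-dev

# cub ships with CUDA Toolkit >= 11; for older toolkits, fetch it from NVIDIA:
# git clone https://github.com/NVIDIA/cub.git

# Download and unpack a CUDA-enabled libtorch (example version; adjust to your setup).
wget -O libtorch.zip "https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcu118.zip"
unzip libtorch.zip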

Building

First clone the repository and initialize the submodule:

git clone https://github.com/iDC-NEU/NeutronBench.git
cd NeutronBench
git submodule update --init --recursive

# or clone with submodules in a single command
git clone --recurse-submodules https://github.com/iDC-NEU/NeutronBench.git

To build:

mkdir build && cd build
cmake ..
make -j 10
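
If CMake cannot locate libtorch, the usual fix (assuming the build uses find_package(Torch)) is to point CMAKE_PREFIX_PATH at the unpacked distribution; /path/to/libtorch below is a placeholder:

cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
make -j 10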

To run:

# Example run (you need to prepare a dataset first; see the Datasets section below).
./run_nts.sh 1 ./cfgs/gcn_sample_demo.cfg 
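
Given the mpich dependency above, the first argument presumably selects the number of processes; this is an assumption, so check run_nts.sh for the exact usage:

# Assumed usage: ./run_nts.sh <num-processes> <config-file>
./run_nts.sh 2 ./cfgs/gcn_sample_demo.cfg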

📁 Datasets

The datasets used in our evaluation:

| Dataset      | Nodes   | Edges   | #Features | #Labels | Hidden dim |
|--------------|---------|---------|-----------|---------|------------|
| Reddit       | 232.96K | 114.85M | 602       | 41      | 128        |
| OGB-Arxiv    | 169.34K | 2.48M   | 128       | 40      | 128        |
| OGB-Products | 2.45M   | 126.17M | 100       | 47      | 128        |
| OGB-Papers   | 111.06M | 1.6B    | 128       | 172     | 128        |
| Amazon       | 1.57M   | 264.34M | 200       | 107     | 128        |
| LiveJournal  | 4.85M   | 90.55M  | 600       | 60      | 128        |
| Lj-large     | 7.49M   | 232.1M  | 600       | 60      | 128        |
| Lj-links     | 5.2M    | 205.25M | 600       | 60      | 128        |
| Enwiki-links | 13.59M  | 1.37B   | 600       | 60      | 128        |

We provide a Python script to generate the data files:

# create a Python environment
conda create -n neutronbench python=3.9 -y
conda activate neutronbench

# install Python dependencies
pip install -r ./data/requirements.txt

# process the dataset
python ./data/generate_nts_dataset.py --dataset ogbn-arxiv
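
The same script should cover the other OGB datasets in the table above; the identifier below follows OGB's naming and is an assumption, so check the script's accepted --dataset values:

# ogbn-products is the OGB name for OGB-Products (assumed to be accepted here)
python ./data/generate_nts_dataset.py --dataset ogbn-products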

For graph datasets that lack ground-truth attributes, we randomly generate features and labels, and split the data into training (65%), validation (25%), and testing (10%) sets.

We provide a Google Drive link for downloading the Amazon, LiveJournal, Lj-large, Lj-links, and Enwiki-links datasets.

🚀 Experiments

Data partitioning experiments

# partitioning
python ./exp/exp-partition/exp-partition.py

Batch preparation experiments

# batch size
python ./exp/exp-batch-size/exp-batch-size.py

# sample rate
python ./exp/exp-sample-rate/sample-rate.py

Data transferring experiments

# data partitioning
python ./exp/exp-partition/exp-partition.py

# batch size
python ./exp/exp-batch-size/exp-batch-size.py

# different optimizations
python ./exp/exp-diff-optim/exp-diff-optim.py

# hybrid transfer
python ./exp/exp-hybrid-trans/exp-hybrid-trans.py

# pipeline
python ./exp/exp-diff-optim/exp-diff-pipe.py

# GPU cache
python ./exp/exp-gpu-cache/exp-gpu-cache.py
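
To reproduce everything in one pass, a minimal driver that simply runs the scripts listed above in order (paths copied verbatim from this section):

# Run every experiment script in sequence; stop on the first failure.
set -e
for script in \
    ./exp/exp-partition/exp-partition.py \
    ./exp/exp-batch-size/exp-batch-size.py \
    ./exp/exp-sample-rate/sample-rate.py \
    ./exp/exp-diff-optim/exp-diff-optim.py \
    ./exp/exp-hybrid-trans/exp-hybrid-trans.py \
    ./exp/exp-diff-optim/exp-diff-pipe.py \
    ./exp/exp-gpu-cache/exp-gpu-cache.py
do
    python "$script"
done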

📜 Reference

If you find NeutronBench useful or relevant to your research, please cite our paper as follows:

@article{yuan2024comprehensive,
  author       = {Hao Yuan and Yajiong Liu and Yanfeng Zhang and Xin Ai and Qiange Wang and Chaoyi Chen and Yu Gu and Ge Yu},
  title        = {Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective},
  journal      = {Proc. VLDB Endow.},
  volume       = {17},
  number       = {6},
  pages        = {1241--1254},
  year         = {2024},
  url          = {https://www.vldb.org/pvldb/vol17/p1241-yuan.pdf},
}

📬 Contact

For any questions or feedback, feel free to contact Hao Yuan or open an issue in this repository.