We use the Materials Project as the data source for dataset generation.
- Create an account: https://profile.materialsproject.org/
- Visit and generate your API_KEY: https://next-gen.materialsproject.org/api#api-key
- Save your API_KEY as "API_KEY.txt" in this directory
- The following commands assume you are running in `/workspace` in the Docker environment.
- The working directory is `/workspace/data`, and `/workspace/data/dataset/*` will be created at the end.
- Download the Materials Project data.
  - `poetry run python src/dataset_generation/01_download.py`
- Extract structures from the downloaded data.
  - `poetry run python src/dataset_generation/02_extract_structures.py`
- Convert to a conventional cell.
  - `poetry run python src/dataset_generation/03_convert_to_conventional_cell.py`
- Select materials according to each criterion.
  - `poetry run python src/dataset_generation/04_01_select_materials_by_cell.py`
    - For the lim_l6 dataset.
  - `poetry run python src/dataset_generation/04_02_select_materials_by_formula.py`
    - For the ICSG3D dataset.
  - `poetry run python src/dataset_generation/04_03_select_materials_by_list.py`
    - For the YBCO13 dataset.
- Split the dataset into train/validation/test splits.
  - `poetry run python src/dataset_generation/05_split.py`
- Generate the NeSF-style dataset.
  - `poetry run python src/dataset_generation/06_generate_NeSF_dataset.py`
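
The steps above can also be chained from a small driver script. The following is only a sketch, not part of the repository: `read_api_key`, `build_commands`, and `run_pipeline` are hypothetical helper names. It assumes that each script takes no command-line arguments (as in the commands listed above), that all three selection scripts are run one after another, and that it is launched from `/workspace`. How the repository's own scripts read `API_KEY.txt` is not shown in this document, so the key check here is purely an early sanity check.

```python
"""Sketch of a driver that runs the dataset-generation steps in order."""
import subprocess
from pathlib import Path

# Script names are copied from the steps above, in the listed order.
SCRIPT_DIR = Path("src/dataset_generation")
SCRIPTS = [
    "01_download.py",
    "02_extract_structures.py",
    "03_convert_to_conventional_cell.py",
    "04_01_select_materials_by_cell.py",     # lim_l6 dataset
    "04_02_select_materials_by_formula.py",  # ICSG3D dataset
    "04_03_select_materials_by_list.py",     # YBCO13 dataset
    "05_split.py",
    "06_generate_NeSF_dataset.py",
]


def read_api_key(path="API_KEY.txt"):
    """Return the key stored in `path` with surrounding whitespace removed."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty")
    return key


def build_commands():
    """Return one `poetry run python <script>` command per step."""
    return [
        ["poetry", "run", "python", (SCRIPT_DIR / name).as_posix()]
        for name in SCRIPTS
    ]


def run_pipeline(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    if not dry_run:
        # Assumption: failing early on a missing or empty key file is helpful.
        read_api_key()
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check the order before calling `run_pipeline(dry_run=False)` to execute them.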