
Regarding selecting hyperparameters during training #357

Open
SoumyaCYZ opened this issue Feb 4, 2025 · 0 comments
@RaulPPelaez @AdriaPerezCulubret @peastman @stefdoerr @guillemsimeon I'm using the following parameters to train on alanine dipeptide, but after training I'm getting checkpoint files with astronomically large validation losses. Are these hyperparameters fine?
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:00 epoch=1-val_loss=28354817745406260133494784.0000-test_loss=0.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:01 epoch=3-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:03 epoch=5-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:04 epoch=7-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:05 epoch=9-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:07 epoch=11-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:08 epoch=13-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:10 epoch=15-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:11 epoch=17-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
-rw------- 1 cyz218385 cyz21 3.5M Feb 4 20:13 epoch=19-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt
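A val_loss of roughly 2.8e25 that never moves across epochs means the loss diverged almost immediately rather than being a checkpointing glitch. As a quick sanity check, the filenames themselves can be parsed to confirm the losses are flat (a small sketch assuming the `epoch=...-val_loss=...-test_loss=....ckpt` naming pattern shown above; the filename list is illustrative):

```python
import re

# Hypothetical subset of the checkpoint filenames listed above.
ckpts = [
    "epoch=1-val_loss=28354817745406260133494784.0000-test_loss=0.0000.ckpt",
    "epoch=3-val_loss=28354817745406260133494784.0000-test_loss=4108202541056.0000.ckpt",
]

pattern = re.compile(r"epoch=(\d+)-val_loss=([\d.]+)-test_loss=([\d.]+)\.ckpt")

def parse(name):
    """Extract (epoch, val_loss, test_loss) from a checkpoint filename."""
    m = pattern.fullmatch(name)
    epoch, val, test = m.groups()
    return int(epoch), float(val), float(test)

for name in ckpts:
    epoch, val, test = parse(name)
    # A constant val_loss near 2.8e25 indicates divergence in the first epochs,
    # not gradual overfitting.
    print(f"epoch {epoch}: val_loss={val:.3e}, test_loss={test:.3e}")
```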

TRAIN.YAML
batch_size: 256
#inference_batchsize: 256
dataset: Custom
coord_files: "31jan_coords.npy"
embed_files: "31jan_ca_embeddings.npy"
force_files: "31jan_ca_deltaforces.npy"
cutoff_upper: 12.0
cutoff_lower: 3.0
log_dir: /home/chemistry/phd/cyz218385/scratch/aladi_wat/aladi_300k
derivative: true
#distributed_backend: ddp
early_stopping_patience: 30
embedding_dimension: 128

#label:
#- forces
lr: 0.0005
lr_factor: 0.8
lr_min: 1.0e-06
lr_patience: 10
lr_warmup_steps: 0
model: graph-network
neighbor_embedding: false
ngpus: -1
num_epochs: 100
num_layers: 4
num_nodes: 1
num_rbf: 18
num_workers: 8
rbf_type: expnorm
save_interval: 2
seed: 1
test_interval: 2
test_size: 80
trainable_rbf: true
val_size: 20
weight_decay: 0.0
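Losses of this magnitude usually point to the scale of the training targets (or a too-aggressive learning rate against that scale) rather than the architecture choices above. Before tuning the model hyperparameters, it may be worth inspecting the statistics of the delta-force arrays. A minimal diagnostic sketch (the random array here is a stand-in so the snippet runs; in practice one would `np.load` the `31jan_ca_deltaforces.npy` file from the config, and the standardization step is a common mitigation, not something this repo prescribes):

```python
import numpy as np

# Stand-in for np.load("31jan_ca_deltaforces.npy"); shape and scale are assumptions.
forces = np.random.default_rng(0).normal(0.0, 500.0, size=(1000, 22, 3))

# If force magnitudes are huge, an MSE-style loss starts enormous and a
# learning rate of 5e-4 can push the weights into overflow within a few steps.
print("force mean:   ", forces.mean())
print("force std:    ", forces.std())
print("force max |F|:", np.abs(forces).max())

# Common mitigation: standardize the targets before training and
# undo the scaling at inference time.
scale = forces.std()
normalized = forces / scale
assert abs(normalized.std() - 1.0) < 1e-6
```

If the raw standard deviation is orders of magnitude above 1, rescaling the targets (or lowering `lr` and adding `lr_warmup_steps`) is a reasonable first experiment.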
