Orchestrate distribution of long-running jobs on AWS infrastructure
- Run long-running processes on EC2
- Run a command on the server and receive the output locally
- Put into production: (1) run periodically, (2) run permanently (server)
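
The second goal above (run a command on the server, receive the output locally) can be sketched with plain `ssh` driven from Python. This is only an illustration under stated assumptions, not a description of how rxtb is implemented: the function names, the user, the host and the key path are all hypothetical. The `tmux new-session -d -s` wrapper is one common way to keep a long-running command alive after the SSH connection closes.

```python
import shlex
import subprocess


def build_ssh_command(user, host, key_path, remote_command):
    """Return the argv list that runs remote_command on host over ssh."""
    return ["ssh", "-i", key_path, f"{user}@{host}", remote_command]


def wrap_in_tmux(session_name, command):
    """Wrap command so it runs in a detached tmux session on the server."""
    return (
        f"tmux new-session -d -s {shlex.quote(session_name)} "
        f"{shlex.quote(command)}"
    )


def run_remote(user, host, key_path, remote_command):
    """Run the command over ssh and return its stdout as text."""
    argv = build_ssh_command(user, host, key_path, remote_command)
    # check=True raises CalledProcessError if ssh or the remote command fails.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical usage (needs a reachable server, so it is left commented out):
# print(run_remote("ubuntu", "ec2-host", "~/.ssh/key.pem", "uname -a"))
# run_remote("ubuntu", "ec2-host", "~/.ssh/key.pem",
#            wrap_in_tmux("train", "python train.py"))
```

Building the argument list separately from running it keeps the command easy to inspect before anything is sent to the server.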
First, create an rxtb-config.yaml file.
Set up the following sections: provider, project, instance.
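
As a rough sketch of what checking for the three sections could look like once the file has been parsed into a dictionary (for example with a YAML library, which is left out here): the function name and the keys inside each section are assumptions for illustration, since the fields each section accepts are not listed here.

```python
REQUIRED_SECTIONS = ("provider", "project", "instance")


def missing_sections(config):
    """Return the required top-level sections absent from a parsed config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Hypothetical parsed config; the keys inside each section are placeholders.
example = {
    "provider": {"name": "aws"},
    "instance": {"name": "test-iptc"},
}
print(missing_sections(example))  # ['project']
```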
To start an instance:

    rxtb start

    Specified instances:
        name       provider
    --  ---------  ----------
     0  gpu-iptc   aws
     1  test-iptc  aws
    Choose instance to start: 1
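
The selection step shown above amounts to listing the configured instances with an index and mapping the typed number back to an entry. A minimal sketch follows, assuming instances are held as (name, provider) pairs; the function names are hypothetical and this is not taken from the rxtb source.

```python
def format_instances(instances):
    """Render (name, provider) pairs as an indexed plain-text listing."""
    lines = ["Specified instances:"]
    for index, (name, provider) in enumerate(instances):
        lines.append(f"{index:>2}  {name:<10} {provider}")
    return "\n".join(lines)


def choose_instance(instances, answer):
    """Return the instance selected by the index the user typed."""
    index = int(answer)  # raises ValueError for non-numeric input
    if not 0 <= index < len(instances):
        raise ValueError(f"no instance with index {index}")
    return instances[index]


instances = [("gpu-iptc", "aws"), ("test-iptc", "aws")]
print(format_instances(instances))
print(choose_instance(instances, "1"))  # ('test-iptc', 'aws')
```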
Then select

- [x] Create security group
- [x] Create key-pair
- [x] Add key to rxtb-profile
- [x] Add user name prefix to key name to avoid duplicates in multi-person work setups
- [x] Test tmux command for long-running commands
- [x] Run command remotely inside docker container
- [ ] Test if --gpus=all works on gpu instance
- [ ] Somehow save output
- [ ] Test if tensorboard can be run from commands
- [ ] Start/stop instance for improved startup times
- [ ] Start normal instance (non-spot)
- [ ] Parse project like provider, instance and container
- [ ] If no rxtb file in current dir, traverse until .git dir
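
The last checklist item (if no rxtb file is in the current directory, traverse until the .git directory) could be sketched as below. This is one possible reading, not an existing implementation: it assumes the file is the rxtb-config.yaml mentioned earlier, that the search walks upward through parent directories, and that the directory containing `.git` is checked before the search stops. The name `find_config` is hypothetical.

```python
from pathlib import Path

CONFIG_NAME = "rxtb-config.yaml"


def find_config(start):
    """Search start and its parents for the config, stopping at the repo root."""
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
        if (directory / ".git").exists():
            # Reached the repository root without finding a config.
            return None
    # Reached the filesystem root without finding a config or a .git entry.
    return None
```

Calling `find_config(Path.cwd())` from any subdirectory of a project would then return the path to the config, or `None` if there is none at or below the repository root.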