
Deploy Bundle Edge Version #164

Open
moula opened this issue Mar 1, 2023 · 13 comments

Comments

@moula

moula commented Mar 1, 2023

Hello,
I tried to deploy the edge version of the bundle on my data center. Everything installs except slurmrestd. Thanks.

[Screenshot: 2023-03-01 14-56-12]

[Screenshot: 2023-03-01 14-56-51]

@jaimesouza
Contributor

Hello @moula! Thanks for contacting us! We are working to fix that.

@jaimesouza
Contributor

@moula could you try again, please? You can also try the latest on the candidate channel (--channel candidate).

@jaimesouza
Contributor

jaimesouza commented Mar 1, 2023

We have fixed the issue and released a new revision of the slurm charms to edge. Thank you for trying the slurm charms! Let us know if you need any help.

@moula
Author

moula commented Mar 1, 2023

@jaimesouza I'm trying again, I'll get back to you as soon as it's done.
[Screenshot: 2023-03-01 23-16-08]

@moula
Author

moula commented Mar 1, 2023

@jaimesouza it works, but during the deployment I had to reboot the machines manually, which was not the case with version 8.5. Tomorrow I will add Nvidia GPUs and monitoring in order to test it in use. Thanks.

[Screenshot: 2023-03-02 00-04-06]
[Screenshot: 2023-03-01 23-58-29]

@jamesbeedy
Contributor

Hello @moula! Thanks for stopping by!

We have another charm called nvidia-gpu. We will soon be deprecating the built-in infiniband and gpu charm actions and replacing them with individual charms.

To add CUDA drivers to your deployment, you could deploy the nvidia-gpu charm and relate it to the nodes where your GPUs are:

juju deploy nvidia-gpu --channel edge
juju relate nvidia-gpu slurmd

@moula
Author

moula commented Mar 2, 2023

@jamesbeedy Thank you very much.
I am testing it. If you have other how-tos or videos, for example on integration with COS or testing code on GPUs, I'm interested in trying all of that. One note: is the migration of percona-cluster planned?
I know this has been done before: canonical/slurmdbd-operator#9
[Screenshot: 2023-03-02 11-14-18]

@jamesbeedy
Contributor

jamesbeedy commented Mar 2, 2023

Hey @moula,

We will be collaborating with @NucciTheBoss and @dvdgomez from the HPC team at Canonical for some time to revise and refactor the Slurm charms. Any changes you see in the canonical/slurm*-operator forks will eventually be PR'd into the omnivector-solutions/slurm*-operator repos.

@NucciTheBoss
Contributor

Hi there @moula! Yes, I have the migration completed and plan on submitting it upstream to the Omnivector folks soon. I am also tackling the versioning issue mentioned in charmed-hpc/slurmdbd-operator#5, which is where I think you received the original install error that Jaime and James fixed.

@moula
Author

moula commented Mar 2, 2023

Hello @NucciTheBoss. Yes, you are all doing a good job. Thank you so much.

@moula
Author

moula commented Mar 2, 2023

@jamesbeedy Thank you.
During the installation of the GPU drivers, I got the message "Machine needs reboot", and I had to reboot each node 5 times for it to work!
[Screenshot: 2023-03-02 16-12-21]
[Screenshot: 2023-03-02 16-10-59]
[Screenshot: 2023-03-02 16-06-45]

@jamesbeedy
Contributor

Hey @moula, it takes a few minutes after a reboot for that message to disappear. We can clean up the messaging around rebooting by shortening the polling interval used to check whether the machine still needs a reboot. Most likely your first reboot worked 🙂
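For anyone curious, the polling described above could look roughly like the sketch below. This is a hypothetical illustration, not the charms' actual code: the function names and the polling interval are assumptions; the only grounded detail is that Ubuntu signals a pending reboot via the `/var/run/reboot-required` flag file.

```python
# Hypothetical sketch of how a charm might poll the Ubuntu
# "reboot required" flag. Names and interval are assumptions,
# not the slurm charms' real implementation.
import time
from pathlib import Path

# Standard Ubuntu flag file written by update-notifier when a
# package install requires a reboot; removed on the next boot.
REBOOT_FLAG = Path("/var/run/reboot-required")

def needs_reboot(flag: Path = REBOOT_FLAG) -> bool:
    """Return True while the machine still reports a pending reboot."""
    return flag.exists()

def status_message(flag: Path = REBOOT_FLAG) -> str:
    """Status string a charm could surface; clears once the flag is gone."""
    return "Machine needs reboot" if needs_reboot(flag) else "Ready"

def wait_until_ready(flag: Path = REBOOT_FLAG,
                     interval: float = 5.0,
                     max_checks: int = 60) -> bool:
    """Poll every `interval` seconds; a shorter interval makes the
    status message clear sooner after the reboot completes."""
    for _ in range(max_checks):
        if not needs_reboot(flag):
            return True
        time.sleep(interval)
    return False
```

Shortening `interval` is what would make the "Machine needs reboot" status disappear faster after a single reboot, which matches the behavior described above.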

@moula
Author

moula commented Mar 2, 2023

@jamesbeedy keep it up. Thanks!
