Spindle could not connect to session #44
Okay, I have the Spindle tests running, and I think I might not have enough resources, because my tiny cluster hangs on:
# ./runTests
Running: ./run_driver --dependency --push
srun: Requested partition configuration not available now
srun: job 3 queued and waiting for resources
What does Spindle require for resources when testing with Slurm?
Going to try openmpi now.
When I try testing with openmpi:
Spindle Error: Could not identify system job launcher in command line
Running: ./run_driver --dlopen --preload
and then the same error about not being able to connect to a session.
If you were using Spindle with Slurm 20.11+, then I just pushed a fix for running Spindle with that version of Slurm to devel. That issue could have produced the hang you were seeing.
Quick test of a build and I'm seeing:
I'll listen to the message and try out those various options, probably not right now because I'm tired, but will update here with what I find.
okay - so I gave a shot to rebuild and add the
$ docker exec -it slurmdbd bash
[root@slurmdbd /]# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3 normal spindle_ root PD 0:00 2 (PartitionNodeLimit)
So can I ask again: how many concurrent nodes are required for Spindle to run tests with Slurm?
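For anyone reading the squeue output above: the pending job asked for 2 nodes, and Slurm documents PartitionNodeLimit as meaning the job's node count is outside the partition's current limits (or that required nodes are DOWN or DRAINED). A minimal sketch for pulling that information out of squeue output follows. The `pending_jobs` helper is hypothetical, and the sample text is just the two lines pasted in this thread.

```shell
# Sample squeue output copied from this thread.
# On a real cluster, pipe `squeue` into pending_jobs instead.
squeue_output='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3 normal spindle_ root PD 0:00 2 (PartitionNodeLimit)'

# Hypothetical helper: for each pending (PD) job, print the job id,
# the number of nodes it requested, and the reason it is waiting.
pending_jobs() {
    awk 'NR > 1 && $5 == "PD" { print "job " $1 " wants " $7 " node(s), reason " $8 }'
}

printf '%s\n' "$squeue_output" | pending_jobs
```

On the cluster itself, `scontrol show partition normal` and `sinfo -N -l` show the partition's node limits and the state of each node, which is where a mismatch with the requested node count would show up.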
What command do you usually use for pynamic? I can try that instead.
Okay, this looks to work for pynamic, although still no go on adding Spindle.
$ time python config_pynamic.py 30 1250 -e -u 350 1250 -n 150
************************************************
summary of pynamic-sdb-pyMPI executable and 10 shared libraries
Size of aggregate total of shared libraries: 2.5MB
Size of aggregate texts of shared libraries: 6.8MB
Size of aggregate data of shared libraries: 408.4KB
Size of aggregate debug sections of shared libraries: 0B
Size of aggregate symbol tables of shared libraries: 0B
Size of aggregate string table size of shared libraries: 0B
************************************************
real 21m33.556s
user 14m54.538s
sys 3m31.206s
What's happening here is that there's a bug/feature in Slurm 20.11+ that makes it so Spindle can't launch its daemons with Slurm. The "checking slurm version for compatibility... no" message means you're hitting that. There are two autoconf-level options:
You'll probably have to use option 2 here. Or you could downgrade your Slurm version. And I'd usually run pynamic based on the README.md commands in its repo. So something like:
srun pyMPI pynamic_driver.py
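To see which side of the 20.11 boundary a cluster falls on without rerunning configure, the version line printed by `srun --version` (for example `slurm 20.11.8`) can be compared by hand. The sketch below is an illustration only: `slurm_at_least_20_11` is a hypothetical helper, not Spindle's configure check, and it assumes the version line has the `slurm MAJOR.MINOR.PATCH` shape.

```shell
# Hypothetical helper, not part of Spindle or Slurm.
# Takes a version line such as "slurm 20.11.8" and succeeds if it is 20.11 or newer.
slurm_at_least_20_11() {
    version=${1#slurm }      # drop the leading "slurm " if present
    major=${version%%.*}     # e.g. "20"
    rest=${version#*.}       # e.g. "11.8"
    minor=${rest%%.*}        # e.g. "11"
    [ "$major" -gt 20 ] || { [ "$major" -eq 20 ] && [ "$minor" -ge 11 ]; }
}

# On a real cluster: slurm_at_least_20_11 "$(srun --version)"
if slurm_at_least_20_11 "slurm 20.11.8"; then
    echo "20.11 or newer: the daemon launch problem may apply"
else
    echo "older than 20.11"
fi
```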
I'm getting errors in testing and attempted usage that Spindle cannot connect to some session. I'm installing as follows:
And I've tried that with both slurm and openmpi as the "testrm". And then I make the tests:
cd testsuite
make
./runTests
but no matter what I do (using the slurm or openmpi template, both of which I have) I see this error:
I saw this same error in trying to just use spindle so I've gone back to the tests to debug. Note that I do have a /tmp area:
Update: I think it could possibly be that they need to see the same /tmp area - so I'm rebuilding the containers with a shared /tmp area and will report back.
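One way to test the shared /tmp guess before rebuilding anything is to create a marker file locally and ask the launcher whether every node can see it. This is only a sketch: `check_shared_tmp` is a hypothetical helper, and whether Spindle actually needs a shared /tmp is not confirmed anywhere in this thread.

```shell
# Hypothetical helper: create a marker file in /tmp, then run `test -e` on it
# through the launcher prefix given as arguments (e.g. `srun -N 2`).
# With no arguments the check runs locally.
check_shared_tmp() {
    marker="/tmp/shared-tmp-check.$$"
    : > "$marker"
    if "$@" test -e "$marker"; then
        echo "shared"
    else
        echo "not shared"
    fi
    rm -f "$marker"
}

# Local run; on the cluster this would be: check_shared_tmp srun -N 2
check_shared_tmp
```

With `srun`, the check fails if any task cannot see the file, since srun exits with the highest exit code among its tasks.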