-
-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Use spawner processes to speed up spawning #3719
Conversation
SCons' process (and all Python stuff you push into it) may become quite heavy in RAM. It's not cheap for RAM-heavy processes to do fork()s and exec()s. This PoC commit lets each worker of SCons have a private "spawner" process which has a tiny memory footprint. Instead of using subprocess.Popen directly, the exec_subprocess() will request the spawner process private to the current thread to do the spawning. This makes spawning much faster.
pickle.load reads 1 pickle message from the buffer object and returns, so theres' no need to precede the message with size.
SCons/Platform/posix.py
Outdated
if process_spawner: | ||
from SCons.Job import spawner_tls | ||
|
||
def exec_subprocess(l, env): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified only exec_subprocess
, and not exec_popen3
/ _subproc
because they are more complex to wrap (although not impossible), and the "big win" for performance is from this function, which is the most commonly called (AFAIK)
SCons/Platform/__init__.py
Outdated
@@ -54,6 +54,10 @@ | |||
import SCons.Subst | |||
import SCons.Tool | |||
|
|||
# Gets set to True in Main._main() when the '--process-spawner' | |||
# option was given on the command-line for speeding up process spawning. | |||
process_spawner = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is the right spot to place it? It's POSIX only so it should be in posix.py
maybe? I followed #3703 on this one.
Please fix sider issues. |
The concept is not going to change much. I am planning to add a few more features before I think it's ready to merge, though. I'll update the description with tasks. I'll complete them further on this week. |
About
Given these results I'll stick with the plain version. |
I benched the build of Godot:
This amounts to a tiny 0.5% improvement (user+sys time). I wasn't expecting to see much because Godot's SCons doesn't use too much RAM (starts at 60MB, climbs up to 150MB). Impact becomes more obvious as RAM usages grows. |
@@ -264,6 +268,37 @@ def run(self): | |||
|
|||
self.resultsQueue.put((task, ok)) | |||
|
|||
class Spawner: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should move to posix.py
I'm not seeing any compelling results above. |
I don't have an open-source example with heavy memory consumption at hand (if you know of any project that might fit, please let me know and I'll try on it). I was thinking this week to take a more accurate measuring: the problem with a simple Obviously, if that metric shows no notable improvement, then it's not worth it. Though, like I said, for projects with heavy memory consumption, this is proved useful (I can share some results of Godot build where I forced SCons to take, say, 500MB. Difference becomes notable) |
Did you try |
Nope, cool! Will try that, thanks for the tip. |
Okay, I tried with So I tried the following - I added this small snippet:
to SCons' Trying again with Godot. On this branch without On this branch with
that account to I admit this doesn't look notable comparing to the total build time for Godot (which is about 2k CPU seconds) but it is a notable portion of the SCons time (about 15%), as and as SCons' time grows, this part grows as well. With SCons at 700MB~ ,the |
@Jongy - The MongoDB build might be another one to try if you want some additional data points. We too would like to see SCons be faster. |
@acmorrow I tested build of Build commandline I used: Without With |
@bdbaddog What do you think about these results? There is no huge effect on the build time, but that's as far as it could go, since SCons itself is only a small portion of that time. But the improvement in SCons-only time is definitely notable, and like I said this is aimed at projects consuming more RAM, where this change is much more effective. Also you can see in the this comment - when SCons is under stress, this change has positive effects. Waiting for your approval (or disapproval :) before I go on with this. |
Curious how much RAM are you seeing SCons take. |
IIRC it was about 1.5G, but I don't have access to the code at the moment, so can't verify. It's indeed very likely the build doesn't really need those 1.5G - like every other program, it can be optimized. Nonetheless, this idea proposed here had notable effects on the build time, at practically zero cost. |
Just FYI as of Python 3.10 Linux Python's subprocessing module uses vfork() instead of fork() unless the caller specifically disables it. So this pull request may not be needed anymore. |
@grossag - Thanks for they update and keeping an eye on developments related to this PR. :) |
Yup, as noted in the description - tracking ticket was https://bugs.python.org/issue35823 and it got merged in 3.10. This PR is far less relevant now |
@Jongy - so safe to say the recommended fix now is to use python 3.10+? If so, let's close this PR. |
Yes. It will cover the SCons case (that is, the I'll close the PR, I see no need to introduce the extra complexity, given that people can just upgrade to 3.10 |
I'm generally fine with this resolution, too, but... it's not true that everyone can just upgrade to 3.10+. We keep hearing from people who can't "because" (usually an IT/Security thing restricting from using a Python not provided by the Linux distribution in question). We also need to get off using os.spawnve on Windows, but that's got its own trackers... |
@Jongy - One question for you. On the system(s) where the spawner was making a notable difference, how much RAM did the system(s) have? |
16GB. However the physical RAM size doesn't matter, what matters is the amount of pages in the pagetables in the app - which is the RSS of the app divided by page size. Using hugepages, for examples, will very likely reduce the slowdown drastically. RSS I described here. |
SCons tends to grow large in RAM usage. It's expensive for large processes to
fork()
.The best solution would be
vfork()
, but CPython doesn't support it properly (see this issue for latest status).Until CPython has it (and in order to support older versions as well), I added the
--process-spawner
SCons options, which lets eachWorker
create its private "spawner" process. Furtherexec_subprocess()
calls by thatWorker
will send thePopen
requests to the spawner via simple IPC (pickle messages over stdin/stdout), and the spawner willPopen
the requested process,wait
for its result and send it back to theWorker
.Overall it's quite simple, yet for RAM-heavy SCons it adds a huge speedup.
Discussed (with some performance benching) in https://pairlist4.pair.net/pipermail/scons-users/2020-June/008286.html and #3703 (specifically this comment, this one and this)
Tasks:
mutltiprocessing
module--process-wrapper
)Contributor Checklist:
I still have to finish these (I just want an ack from one of you the maintainers that this looks okay)
CHANGES.txt
(and read theREADME.rst
)