-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accept new requests on SVE builders only if idle #81
base: main
Are you sure you want to change the base?
Conversation
Use nextBuild on SVE builders to accept new build requests only if the corresponding builder is not currently building anything. This is needed on SVE bots because any of the 4 available workers can accept requests from any of the 4 SVE builders, and sometimes some builders use multiple workers while others end up with none available. This is aggravated by the fact that the same builders may use more than one workers for several times in a row, causing other builders to starve (clang-aarch64-sve-vls, for instance, did not build anything in the last 3 days). The buildbot documentation (2.5.2.8. Prioritizing Builders) says that builds are started in the builder with the oldest pending request, but it seems this is not working, possibly because the collapse requests feature ends up resetting submittedAt time.
f55eef0
to
e53229f
Compare
buildbot/osuosl/master/master.cfg
Outdated
if b['name'].startswith("clang-aarch64-sve-"): | ||
nextBuild = nextSVEBuild | ||
builders.append(util.BuilderConfig(**b, nextBuild=nextBuild)) | ||
c['builders'] = builders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this, I know it's a pain to setup a whole local infra for these things.
I get the logic here but I think we'll want it to be more general:
- Add an attribute to our builder class
max_simultaneous_builds
orsingle_build_only
if we're being binary about it. - Somehow get to that attribute here and check for it instead of name matching.
Also I was thinking this could be done to accept a number of builds, but I don't know if we have that information in the nextbuild callback. For Linaro's specific purpose right now, checking for 0 or 1 is fine, but as a general feature, "start a new build if active builds is < N" is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. The last commit adds max_simultaneous_builds
to linaro-g3-* builders and use it to set nextBuild
.
It seemed better to me to avoid changing BuilderConfig class, that is in another repository, so I added max_simultaneous_builds
to the dictionaries that are used to create BuilderConfig instances.
Also the context that may be missing here (and worth adding to the PR message) is as I understand it, there is a way to limit how many builds a single worker can run, but not how many builds a builder can be running on a pool of workers.
Here we are already able to make sure that A and B are not building on the worker at the same time.
This PR aims to enable us to prevent Builder A from building on worker 1 and worker 2 at the same time. Starving B of resources. (if the diagrams make any sense, feel free to steal them :) ) |
Limit the number of simultaneous builds using this new attribute, instead of name matching.
@@ -442,6 +442,7 @@ | |||
'tags' : ["clang"], | |||
'workernames' : ["linaro-g3-01", "linaro-g3-02", "linaro-g3-03", "linaro-g3-04"], | |||
'builddir': "clang-aarch64-sve-vla", | |||
'max_simultaneous_builds' : 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there are any documentation comments somewhere that we can add an explanation of this to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly to mention that the "default" is unlimited jobs per builder.
Also I thought I had sent you on a wild goose chase, when I remembered there is the collapse requests feature. Then I remembered that we're already using that (it defaults to True) on the SVE builders. So this is not a possible solution. |
This looks good for Linaro's intent, but let's see if @gkistanova can tell us if we're reinventing the wheel here. |
Are you really want/need to limit a number of builds of particular configuration running on the pool? I mean, if there is an empty build queue for other builders, there is nothing wrong in using the whole pool of workers for a single build configuration, right? In this case there is no reason to keep some of the workers in the pool idle and response tome for that build configuration longer. Maybe instead of limiting, we shall reconsider how the build jobs get scheduled on workers to make sure we actually distribute available workers fairly? Let me think about this. |
It's not really needed, but it's an easy way to avoid a particular configuration from using too many workers at the same time.
Right.
This would indeed be the best solution. I just don't know how to implement it :) |
Buildbot scheduler has been fixed. Now it should handle this case properly. |
I am afraid Linaro's SVE builders are still not getting a fair schedule of the available workers. |
Some of Linaro's SVE builders are suffering from starvation,
because there is currently no way to limit how many builds a given
builder can start simultaneously. As there are 4 SVE builders that
may use any of the 4 G3 workers, sometimes some builders use
multiple workers while others end up with none available. This is
aggravated by the fact that the same builders may use more than one
worker for several times in a row, causing other builders to starve
(clang-aarch64-sve-vls, for instance, has been idle for over 3
days once).
The buildbot documentation (2.5.2.8. Prioritizing Builders) says
that builds are started in the builder with the oldest pending
request, but it seems this is not working, possibly because the
collapse requests feature ends up resetting submittedAt time.
While there is a way to limit how many builds a single worker can
run, there is currently no way to limit how many builds a builder
can be running on a pool of workers.
As shown below, this PR aims to enable us to prevent Builder A from
building on Worker 1 and Worker 2 at the same time. Starving B of
resources.
This is accomplished by introducing the max_simultaneous_builds
setting for builders. Its value specifies the maximum number of
builds that a builder can have simultaneously. This limit is
enforced using nextBuild, from BuilderConfig. With it, it was
possible to limit Linaro's SVE builders to only one build at a time.