Frequent HTTP 502 Bad Gateway errors from gem push for sorbet gem #2074

Open
jez opened this issue Jul 25, 2019 · 8 comments

@jez

jez commented Jul 25, 2019

👋 Hello @hsbt I was chatting with @mame on Slack, thank you for providing such fast responses! Nice to meet you.

Here's some context for the behavior we're seeing:

  • At https://github.com/sorbet/sorbet, we push to master between 5 and 15 times per day.

  • On each master commit, we use gem build to create a new version of the sorbet gem.

  • We actually push 9 *.gem files for each commit:

    • sorbet-static, which is platform dependent (because it contains a compiled C++ binary)
      • sorbet-static-$VERSION-x86_64-linux.gem
      • sorbet-static-$VERSION-universal-darwin-14.gem
      • sorbet-static-$VERSION-universal-darwin-15.gem
      • sorbet-static-$VERSION-universal-darwin-16.gem
      • sorbet-static-$VERSION-universal-darwin-17.gem
      • sorbet-static-$VERSION-universal-darwin-18.gem
      • sorbet-static-$VERSION-universal-darwin-19.gem
    • sorbet-runtime
      • sorbet-runtime-$VERSION.gem
    • sorbet
      • sorbet-$VERSION.gem
  • In the last 30 commits to master, the CI step that runs gem push to release these new versions has failed with an HTTP 502 Bad Gateway error 10 times.

    Basically, we have a 1 out of 3 chance that our gem publishing step will fail.

We'd like to be able to release a version for every master commit, because every time we commit we're either releasing a new feature or fixing a bug that our users would like to be able to start using right away.

Is there some way to investigate the root cause of the failure? I'm happy to help provide any steps to reproduce the error that I can, but unfortunately it's unpredictable.

Here are some build logs from our CI that show you what the failure looks like from Sorbet's perspective:

https://buildkite.com/sorbet/sorbet/builds/4650#b11fc15f-5894-4b40-9c57-75f638f20657/270-285

https://buildkite.com/sorbet/sorbet/builds/4646#b1028573-0073-4625-a579-c7becc1eaa0d/270-275

Thanks again for your time already!

/cc @DarkDimius @elliottt @aisamanra @azdavis

@dwradcliffe
Member

Unfortunately this isn't a surprise to me; I've been aware that we're serving more 502s than I'd like.

Without actually looking, I don't think it's anything specific to your gems, but more likely the rapid succession of pushes.

This is something we need to look at and we just haven't had time to fully solve the problem yet. :(

Ideas for temporary fixes (all terrible): add retries and/or sleep in between the pushes.
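
The stopgap suggested above (retries plus a sleep between pushes) could look roughly like the following Ruby sketch. It is only an illustration: `push_all`, its parameters, and the 10-second default delay are hypothetical names and values chosen here, not anything taken from RubyGems or from Sorbet's CI. The only external command it assumes is `gem push FILE`.

```ruby
# Hypothetical helper: push each gem file in order, pausing between pushes
# and retrying a failed push a fixed number of times.
#
# push  - callable returning a truthy value on success; the default shells
#         out to `gem push FILE`.
# pause - callable used for waiting; injectable so the pacing can be stubbed.
#
# Returns the list of files that still failed after all attempts.
def push_all(gem_files, attempts: 3, delay: 10,
             push: ->(file) { system("gem", "push", file) },
             pause: ->(seconds) { sleep(seconds) })
  failed = []
  gem_files.each_with_index do |file, index|
    pause.call(delay) if index > 0        # space out successive pushes
    ok = false
    attempts.times do |attempt|
      pause.call(delay) if attempt > 0    # wait before each retry
      ok = push.call(file) ? true : false
      break if ok
    end
    failed << file unless ok
  end
  failed
end

# Possible usage in a release script (not executed here):
#   failed = push_all(Dir.glob("*.gem").sort)
#   abort("could not push: #{failed.join(', ')}") unless failed.empty?
```

A fixed delay keeps the sketch simple. Whether spacing the pushes out actually lowers the 502 rate is untested and would need to be checked against real release runs.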

@dwradcliffe
Member

Similar to #1678, but I'm not sure it's the same root cause.

@jez
Author

jez commented Jul 25, 2019

@dwradcliffe thanks for the quick response! I was hesitant to add retry logic in case that would make the problem worse. Do you have suggestions for how best to retry? I'm happy to pick an arbitrary retry strategy but if you'd like something specific I'm happy to implement it.

@rubyFeedback

I think this may not be solely limited to the sorbet gem but might be of some ... hmm... how to call it ... internal behaviour or limitations of the ecosystem managing gems in general?

I sometimes get such gateway errors too, most often when I visit a gem's page on the website and only very rarely when I push something. It seems semi-random, at least to me; it is hard to say when the problem arises. But I think it is not confined to sorbet alone.

To me personally it is not a problem, I push at a later time and it works just fine. For automatic setups perhaps this is a bit more annoying in that they may have to automate re-pushing of gems if they fail, but this can also be managed downstream. The ideal thing would be of course that bad gateway errors could be reduced or removed altogether. I have no good solution as to how though.

dwradcliffe:

Without actually looking, I don't think it's anything specific to your gems, but more likely the rapid succession of pushes.

Yes I think so too.

I remember that when I was cleaning up lots of my old gems, e.g. removing old versions, I ran into numerous gateway problems and the like. Then a rate limit was added. It took me a bit to adjust to it (locally I added a delay to my gem yanks, and I now very rarely hit the rate limit). There is probably some cause for the gateway problems though - perhaps someone can find out one day. :)

jez:

Do you have suggestions for how best to retry? I'm happy to pick an arbitrary retry strategy but if you'd like something specific I'm happy to implement it.

For my gem yank operations, I simply do a lazy sleep() - not very sophisticated. :P

You can probably depend on local logic, e.g. use open-uri, read the homepage of the gem at hand, check for the most recent version via a regex, and then consider re-pushing. Not very pretty perhaps, but I think this should work for the most part (I am not sure I understood the whole problem domain here though; I don't have such large gem pushes, my changes are often very small per gem push and most of the time done manually).
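
The "check what is already published, then consider re-pushing" idea above could be sketched as follows. Note that this swaps the open-uri plus regex approach for a JSON lookup. It assumes the rubygems.org endpoint `/api/v1/versions/GEM.json` returns an array of objects with `number` and `platform` fields; verify that against the RubyGems.org API documentation before relying on it. `published?` and `fetch_versions_json` are hypothetical helper names introduced here.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: this rubygems.org endpoint returns a JSON array of versions.
def fetch_versions_json(gem_name)
  Net::HTTP.get(URI("https://rubygems.org/api/v1/versions/#{gem_name}.json"))
end

# Returns true if the given version (and platform) is already listed.
# fetch is injectable so the lookup can be stubbed without network access.
def published?(gem_name, version, platform: "ruby",
               fetch: ->(name) { fetch_versions_json(name) })
  versions = JSON.parse(fetch.call(gem_name))
  return false unless versions.is_a?(Array)
  versions.any? { |v| v["number"] == version && v["platform"] == platform }
rescue JSON::ParserError
  false   # e.g. a plain-text error body for an unknown gem
end

# Possible usage after a failed push (not executed here; the version is a
# placeholder):
#   unless published?("sorbet-runtime", version)
#     system("gem", "push", "sorbet-runtime-#{version}.gem")
#   end
```

Checking first matters when a push fails on the client side with a 502 but has in fact been accepted by the server, in which case a blind re-push of the same version would be rejected.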

@greysteil
Contributor

We see the 502s on Dependabot Core, too (which is a collection of gems that get released about once a day - not every commit but it's a lot).

We have retry logic already baked in, and it's public: https://github.com/dependabot/dependabot-core/blob/master/Rakefile#L73-L99.

Hope that helps short-term and doesn't make anything worse on the Rubygems side 😬

@dwradcliffe
Member

A quick update here so that you know we are not ignoring this. I rolled out some changes on August 12th that reduced our error rate to about 30% of what it was before. We still have work to do, but it is getting better and hopefully that carries over to your use cases too.

@simi
Member

simi commented Oct 31, 2023

@jez are you still facing problems during gem push?

@jez
Author

jez commented Nov 1, 2023

We have to wrap all of our gem pushes in a retry loop with backoff:

https://buildkite.com/sorbet/sorbet/builds/29287#018b8428-7821-408b-9acd-f343cadecbf2/117-127

You can see in our logs that we still get publish failures.
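
For readers who have not written one, a retry loop with backoff might look like the Ruby sketch below. It is not Sorbet's actual CI script (that lives behind the Buildkite link above). `with_backoff` and its parameters are hypothetical, and the choice of exponential backoff with random jitter is an assumption made for illustration.

```ruby
# Hypothetical helper: run the block until it returns a truthy value,
# waiting longer (with random jitter) after each failure.
#
# The wait before retry N is a random value in [0, min(cap, base * 2**(N-1))).
# pause and rng are injectable so the timing can be stubbed.
def with_backoff(max_attempts: 5, base: 2.0, cap: 60.0,
                 pause: ->(seconds) { sleep(seconds) }, rng: Random.new)
  1.upto(max_attempts) do |attempt|
    return true if yield(attempt)
    break if attempt == max_attempts      # no wait after the final failure
    ceiling = [cap, base * (2**(attempt - 1))].min
    pause.call(rng.rand * ceiling)
  end
  false
end

# Possible usage (not executed here; the file name is a placeholder):
#   ok = with_backoff { system("gem", "push", "sorbet-runtime-#{version}.gem") }
#   abort("gem push failed after retries") unless ok
```

The jitter spreads retries from several concurrent CI jobs apart so they do not all hit the server at the same moment. The cap keeps the worst-case wait bounded.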


5 participants