Optimise 404 responses #362
Comments
I tried to find codepaths that could explain this and couldn't. Most are 404s on identifiers, and those do not go through the [...]. The scenario that currently looks most likely: querying Git or the databases for non-existent values hits a worst-case timing, maybe with some sort of timeout. I also tried reproducing locally by downloading the Linux databases from prod and running Elixir in a container, and I couldn't reproduce it.

Also, we only log wallclock duration. It could be that those threads hang here doing nothing. In that case, avoiding those response times would make close to no difference to the server load (which is what we want to reduce).

It could also be that the user agents are the slow ones. That would explain the weird distribution. TODO: how to debunk that hypothesis? Can Apache log anything other than response wallclock time?
This was all a wrong lead. Oops. I don't remember why I was mistaken though.
For some reason, almost all of our response times above two seconds are for 404 responses. I have investigated quite a bit and haven't found an explanation yet.
Some stats from two weeks' worth of data:
Clearly something is wrong. Pretty graphs agree (log scale, note the 2s spike for 404s which isn't present for 200s):
I say "optimise 404s" but I think the issue is for requests that aren't 200. The difference is subtle because almost all requests that are not 200 are 404s.
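For anyone wanting to redo this kind of breakdown, here is a minimal sketch of counting slow responses per status code from an access log. It is not the script used for the numbers in this issue. It assumes the "combined" log format with the request duration in microseconds (Apache's `%D`) appended as the last field; the function name `slow_share_by_status` and that log layout are assumptions, not what prod necessarily uses.

```python
import re
from collections import defaultdict

# Assumed line shape: combined log format with the duration in microseconds
# (Apache's %D) appended as the final field. Adjust to the real LogFormat.
LINE_RE = re.compile(r'" (?P<status>\d{3}) \S+ .* (?P<usec>\d+)$')


def slow_share_by_status(lines, threshold_s=2.0):
    """Return {status: (total, slow)}; slow means duration > threshold_s."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE_RE.search(line.rstrip())
        if m is None:
            continue  # skip lines that do not match the assumed format
        entry = counts[m["status"]]
        entry[0] += 1
        if int(m["usec"]) / 1_000_000 > threshold_s:
            entry[1] += 1
    return {status: tuple(pair) for status, pair in counts.items()}
```

Since any iterable of lines works, it can be fed an open file directly, e.g. `slow_share_by_status(open("access.log"))` (hypothetical file name). Grouping by status rather than only looking at 404s also makes it possible to check whether the problem affects all non-200 responses.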