Gateway silently dies #2380
Do you have any other logs? The log in the screenshot is not very useful for figuring out the problem here.
I see you're using 0.15. The gateway has seen a lot of bugfixes and changes in 0.16, so please try the release candidate and see if that changes anything for you.
I'll give 0.16 a shot. Here are the tracing logs, for what it's worth.
So I did just experience this again: the gateway just silently stops receiving messages (implementation here). It coincided with uptime healthchecks on other services timing out, so I suspect a possible blip in either the ISP connection or something with the machine the services are running on.
I've increased log levels for twilight_gateway and my own code to see if that adds more info, and have also added more trace logging in my own code to rule out the possibility that something there is holding everything up.
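As an aside, one way to make a stall like this visible instead of silent is to wrap whatever yields the next gateway event in a timeout and log loudly when nothing arrives. This is only a sketch: `next_event` below is a hypothetical stand-in for the real event-receiving call in the bot, the 120-second limit is an arbitrary value comfortably above Discord's ~41 s heartbeat interval, and it assumes tokio (with the time/macros/rt features) and tracing are available.

```rust
use std::time::Duration;

use tokio::time::timeout;

// Hypothetical stand-in for whatever yields the next gateway event in the
// real bot (e.g. the shard's event-receiving future).
async fn next_event() -> Option<String> {
    None
}

#[tokio::main]
async fn main() {
    // Arbitrary stall limit, comfortably above Discord's ~41 s heartbeat
    // interval: total silence for this long strongly suggests something is stuck.
    const STALL_LIMIT: Duration = Duration::from_secs(120);

    loop {
        match timeout(STALL_LIMIT, next_event()).await {
            // Normal path: an event arrived in time; handle it.
            Ok(Some(event)) => tracing::trace!(?event, "received gateway event"),
            // The event source ended; leave the loop.
            Ok(None) => break,
            // Nothing arrived within the limit: log it loudly instead of
            // hanging in silence.
            Err(_elapsed) => {
                tracing::warn!(limit = ?STALL_LIMIT, "no gateway event within limit; shard may be stuck");
            }
        }
    }
}
```

This doesn't fix anything by itself, but it turns a silent death into a timestamped log line, which makes it much easier to correlate with the healthcheck timeouts.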
It sounds to me like the WebSocket closing procedure may have gotten stuck (as the healthchecks timed out). I fear you may have to look at the underlying WebSocket/TCP traffic instead, because Twilight should detect and log missed heartbeats unless the shard is disconnected. When disconnected, the WebSocket connection may still be active (but in the process of closing), and the shard will only reconnect (open a new WebSocket connection) once the previous one is fully terminated. See https://github.com/twilight-rs/twilight/blob/main/twilight-gateway/src/shard.rs#L923-L931 and twilight-gateway/src/shard.rs line 823 at commit 5e4bab5.
Also, what TLS backend are you using (rustls/Windows/OpenSSL), and have you noticed increased CPU usage when the shard gets stuck?
As far as the TLS backend goes (running in Docker scratch containers, so): `twilight-gateway = { version = "0.16.0", features = ["rustls-ring", "rustls-webpki-roots"], default-features = false }`. @Erk- also suggested Redis potentially causing the event loop to stall, so that's another avenue my current trace logging should reveal. When it does stall again and it's not something else causing the loop to stall, would …
Btw, the reason I asked about CPU usage is that you might have run into an infinite loop due to a workaround for discord/discord-api-docs#6011, which I now somewhat suspect is related to this issue.
Maybe, although I would expect the Redis connection to generate some kind of error instead of just stalling.
After quickly Googling around I'm not sure... Capturing the network traffic (e.g. with Wireshark) should provide a clearer picture, as long as you capture the TLS keys, which I found a guide for here: https://users.rust-lang.org/t/support-for-sslkeylogfile-in-https-client-from-reqwest/47616. I can help analyze captured data if you want.
Oh right, setting up Wireshark capturing would require you to patch Twilight... Still, I think that's the easiest way to troubleshoot this.
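For reference, that patch should be small if the build goes through rustls (as the feature flags above suggest): rustls exposes a `key_log` field on `ClientConfig` that can be pointed at `rustls::KeyLogFile`, which writes per-session secrets to whatever file the `SSLKEYLOGFILE` environment variable names. A rough sketch of what the patched connection setup would add, assuming direct access to the `ClientConfig` being built:

```rust
use std::sync::Arc;

use rustls::{ClientConfig, KeyLogFile};

// Sketch: wherever the rustls ClientConfig for the gateway's WebSocket
// connection is built, enable the standard SSLKEYLOGFILE hook so Wireshark
// can decrypt the captured TLS traffic afterwards.
fn with_key_logging(mut config: ClientConfig) -> ClientConfig {
    // KeyLogFile writes per-session secrets to the file named by the
    // SSLKEYLOGFILE environment variable; it is a no-op if the variable is unset.
    config.key_log = Arc::new(KeyLogFile::new());
    config
}
```

Then run the bot with `SSLKEYLOGFILE=/tmp/tls-keys.log` set and point Wireshark's TLS settings at that file to decrypt the capture.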
OK, so an update after adding extra trace statements that turned out not to be necessary for now: it seems we just freeze up, with nothing logged after the preceding log lines.
Hello there! I'm experiencing an issue with my bot where it will randomly die and no longer print heartbeat ack logs.
This is how my gateway is defined: https://github.com/Fyko/run.sh/blob/main/src/main.rs#L85
Is this a problem of my own doing?
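For comparison, the standard shape of a twilight-gateway 0.15 setup (roughly following that series' docs) looks like the sketch below; the token and intents are placeholders, not taken from the linked main.rs:

```rust
use std::env;

use twilight_gateway::{Intents, Shard, ShardId};

#[tokio::main]
async fn main() {
    // Placeholder token/intents; the real values come from the bot's config.
    let token = env::var("DISCORD_TOKEN").expect("DISCORD_TOKEN must be set");
    let intents = Intents::GUILD_MESSAGES | Intents::MESSAGE_CONTENT;

    let mut shard = Shard::new(ShardId::ONE, token, intents);

    loop {
        let event = match shard.next_event().await {
            Ok(event) => event,
            Err(source) => {
                // Non-fatal errors (e.g. reconnects) are logged and skipped;
                // fatal ones end the loop.
                tracing::warn!(?source, "error receiving event");
                if source.is_fatal() {
                    break;
                }
                continue;
            }
        };

        tracing::debug!(?event, "received event");
    }
}
```

If your loop diverges from this shape, for example by awaiting slow work inline before calling `next_event` again, that alone can look like the gateway dying, since the shard only drives heartbeats while it is being polled.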