On current master, we observed that etcd networking seems to fail and only recovers after restarting the node (or perhaps even the entire computer).
One hypothesis is that it is caused by a leak somewhere between the hydra-node and etcd itself (e.g. in our logging or the gRPC interface).
Some trivia
Leaking file descriptors?
On one computer, lsof listed a few hundred open entries matching etcd:
> lsof | grep 'etcd' | wc -l
306
Most of them were TCP connections.
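To tell a socket leak apart from ordinary peer traffic, it may help to break the lsof output down by TCP state; a large and growing number of `CLOSE_WAIT` entries would point at connections that were never closed on our side. The following is a minimal sketch, not something taken from the hydra codebase: `tcp_states` is a hypothetical helper name, and it assumes lsof's default layout in which TCP lines end in `(STATE)`.

```shell
# Summarise TCP connection states from lsof output read on stdin.
# Assumption: TCP lines end with the state in parentheses, e.g. "(ESTABLISHED)".
tcp_states() {
  awk '/TCP/ && $NF ~ /^\(.*\)$/ {
         # strip the surrounding parentheses from the last field
         state = substr($NF, 2, length($NF) - 2)
         count[state]++
       }
       END { for (s in count) print count[s], s }' | sort -rn
}

# Usage (needs lsof and enough privileges to see the etcd process):
#   lsof -nP -a -c etcd -i TCP | tcp_states
```

Running this a few times while the network is degrading should show whether a single state keeps growing.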
Buffer full?
There was this error in the etcd logs:
"message-type":"MsgHeartbeat","msg":"dropped internal Raft message since sending buffer is full (overloaded network)"
Perhaps it's related?
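To judge whether the dropped messages are related, it could be useful to count how often this happens and for which Raft message types. A rough sketch follows; `dropped_by_type` is a hypothetical helper, it relies only on the `"message-type"` and `"msg"` keys visible in the log line above, and it assumes the etcd log lines are available on stdin one per line.

```shell
# Count "sending buffer is full" drops per Raft message type.
# Reads etcd log lines on stdin; output is "<count> <message-type>" per line.
dropped_by_type() {
  grep -F 'dropped internal Raft message since sending buffer is full' |
    grep -o '"message-type":"[^"]*"' |   # keep only the message-type pair
    cut -d'"' -f4 |                      # extract the value, e.g. MsgHeartbeat
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage, with etcd.log as a placeholder for wherever the etcd logs end up:
#   dropped_by_type < etcd.log
```

If only `MsgHeartbeat` shows up and the count grows steadily after a peer goes down, that would support the idea that the send buffer towards the missing peer is never drained.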
hydra-node not stopping when signalled with systemctl stop hydra-node
It spun for a while not stopping, so I restarted my computer instead, and it came back fine.
noonio changed the title from "Etcd networking gets into invalid state after downtime of one peer" to "Etcd networking gets into invalid state after downtime of one peer/after some time" on Mar 10, 2025
A few first changes to help debug the connectivity issues we saw in
the course of #1879.
Note that changing the `msg` key is not a (major) breaking change, as
watching is done using the `msg` prefix and the port parsing in
`matchVersion` is done defensively. The version check is bound to change
anyway (so that it is not done on each message!)
---
* [x] CHANGELOG update not needed
* [x] Documentation update not needed
* [x] Haddocks updated
* [x] No new TODOs introduced