feat: increase outbound q size for pubsub #1217

Merged · 2 commits from feat/pubsub-peer-qsize into master on Sep 10, 2024

Conversation

@chaitanyaprem (Collaborator) commented on Sep 5, 2024

Description

While going through logs from status-desktop running in relay mode, I noticed the following log occurring quite consistently:

DEBUG[09-05|06:35:08.182|go.uber.org/zap/sugar.go:198] dropping message to peer 16Uiu2HAmGAA54bBTE78MYidSy3P7Q9yAWFNTAEReJYD69VRvtL5r: queue full

On quick analysis, I noticed that this is happening towards almost all peers.

In a span of ~24 hours I could see similar logs towards almost 200 peers, which suggests the outbound queue size is too small and needs to be increased:


grep "dropping message to peer" geth* | awk '{print $6}' | sort | uniq  | wc -l
     202
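For context, this log is emitted when a peer's bounded outbound queue is full and the message is dropped rather than blocking the router. Below is a minimal, illustrative sketch of that drop-on-full pattern; the `peerOutbound` type and its fields are hypothetical stand-ins, not the actual go-libp2p-pubsub internals.

```go
package main

import "log"

// peerOutbound is a hypothetical stand-in for the bounded per-peer
// outbound queue that the pubsub router maintains internally.
type peerOutbound struct {
	queue chan []byte
}

// trySend attempts a non-blocking enqueue. When the buffer is full the
// message is dropped and logged, which is the behaviour reported in the
// "dropping message to peer ...: queue full" DEBUG lines above.
func (p *peerOutbound) trySend(peerID string, msg []byte) bool {
	select {
	case p.queue <- msg:
		return true
	default:
		log.Printf("dropping message to peer %s: queue full", peerID)
		return false
	}
}

func main() {
	// Tiny buffer used here only to force drops in the example.
	p := &peerOutbound{queue: make(chan []byte, 2)}
	for i := 0; i < 4; i++ {
		p.trySend("16Uiu2HAm...", []byte("rpc"))
	}
}
```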

Increasing the queue size may help improve message reliability, since fewer messages would be dropped towards peers, be they control messages at the gossipsub layer or application messages.
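For reference, go-libp2p-pubsub exposes a constructor option for this queue. A minimal sketch of the kind of change involved is below, assuming the `WithPeerOutboundQueueSize` option and an illustrative value of 1024 (the exact value and call site chosen by this PR are not shown here).

```go
package main

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

func main() {
	ctx := context.Background()

	// Plain libp2p host with default settings.
	host, err := libp2p.New()
	if err != nil {
		panic(err)
	}

	// Raise the per-peer outbound queue so bursts of gossip/control and
	// application traffic get buffered instead of dropped with
	// "dropping message to peer ...: queue full".
	// 1024 is an illustrative value, not necessarily what this PR uses.
	ps, err := pubsub.NewGossipSub(ctx, host,
		pubsub.WithPeerOutboundQueueSize(1024),
	)
	if err != nil {
		panic(err)
	}

	fmt.Println("gossipsub ready, subscribed topics:", ps.GetTopics())
}
```

A larger buffer trades a little extra memory per connected peer for fewer drops during traffic bursts.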

Changes

Tests

Dogfooding in progress; will update with results.

@chaitanyaprem (Collaborator, Author) commented on Sep 5, 2024

Still noticing these logs for some peers after applying the fix. In the last 2.5 hours, I noticed these logs for 5 peers:

# grep "dropping message to peer" geth* | awk '{print $6}' | sort | uniq  | wc -l
       5

# grep "dropping message to peer" geth* | awk '{print $6}' | sort  | wc -l       
    4759

# grep "dropping message to peer" geth* | awk '{print $6}' | sort | uniq         
16Uiu2HAm2M7xs7cLPc3jamawkEqbr7cUJX11uvY7LxQ6WFUdUKUT:
16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P:
16Uiu2HAm9aDJPkhGxc2SFcEACTFdZ91Q5TJjp76qZEhq9iF59x7R:
16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT:
16Uiu2HAmNTpGnyZ8W1BK2sXEmgSCNWiyDKgRU3NBR2DXST2HzxRU:

Interestingly, so far I have noticed these logs only towards store nodes and one bootnode in the fleet in the past 2 hours. I will keep monitoring for a longer duration to get better results and an idea of what else could be happening.
Update after a few more hours: now I see "queue full" towards other peers as well. Total peers noticed: 18, which includes up to 9 fleet peers.

@chaitanyaprem (Collaborator, Author) commented:

Update after a few days of running: I noticed a significant reduction in the occurrence of this issue after increasing the outgoing queue size.

I could only see this happen towards 14 peers, down from more than 100 peers before. But it is still happening, which needs further investigation that will be tracked via a separate issue.

So I think we are good to go with this PR.
@richard-ramos, need your eyes on this one.

@richard-ramos (Member) left a comment


LGTM!

@chaitanyaprem merged commit bf2b7dc into master on Sep 10, 2024
12 checks passed
@chaitanyaprem deleted the feat/pubsub-peer-qsize branch on September 10, 2024 at 12:42