Simultaneously ping may cause server hangs #2805

Open
ryojikamei opened this issue Jul 30, 2024 · 6 comments

Comments

@ryojikamei

Problem description

When two nodes send messages to each other almost simultaneously, one of the two servers may hang.

Reproduction steps

  1. Git clone the example code: https://github.com/ryojikamei/repro1
  2. cd repro1
  3. node dist/run_2nodes.js
    Run it two or three times and the problem should appear.

Environment

  • OS name, version and architecture: Linux Ubuntu 22.04.1 amd64
  • Node version: 18.20.3
  • Node installation method: n
  • Package name and version: @grpc/grpc-js 1.11.1

Additional context

This example uses a duplex stream, but I recall seeing the same problem with unary calls.
However, I have not been able to prepare reproducible code for the unary case; it is harder to reproduce.

@murgatroid99
Member

I ran your code several times, and the only errors I see are ECONNREFUSED errors when one client tries to connect before the other server starts. Then the client that failed never recovers, but that's just because you're not creating a new call when you try again. Specifically in this line, you only create a new call if one doesn't already exist, but you don't delete the existing one when it fails.
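
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repro repository: `CallHolder` and `startCall` are hypothetical names, and `startCall` stands in for whatever opens the stream on the client. A plain `EventEmitter` plays the role of the stream so the sketch runs without a server; it assumes the real stream signals failure and completion through `error` and `end` events.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical holder for a cached streaming call.
class CallHolder {
  constructor(startCall) {
    this.startCall = startCall; // function that opens a new call
    this.call = null;
  }

  // Return the cached call, creating one only if none is cached.
  get() {
    if (this.call === null) {
      const call = this.startCall();
      // Forget the call once it has finished, so the next get() opens a new one.
      const forget = () => {
        if (this.call === call) this.call = null;
      };
      call.on("error", forget);
      call.on("end", forget);
      this.call = call;
    }
    return this.call;
  }
}

// Stand-in for opening a stream: every invocation returns a distinct object.
let opened = 0;
const holder = new CallHolder(() => {
  opened += 1;
  return new EventEmitter();
});

const first = holder.get();
first.emit("error", new Error("ECONNREFUSED")); // simulate a failed connection
const second = holder.get(); // a new call, because the failed one was dropped
console.log(opened, first !== second); // prints "2 true"
```

Without the `forget` handlers, `get()` would keep returning the failed call, which matches the "never recovers" behavior described above.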

@ryojikamei
Author

run.log

@ryojikamei
Author

ryojikamei commented Jul 31, 2024

Thank you for your response.

> you don't delete the existing one when it fails.

I was under the impression that a channel that failed to communicate would recover automatically. It seems I was wrong and have to recreate the channel manually. Is that correct?

According to the log above, server-7022 received a second ping but did not attempt to return a pong. I have no idea why it behaves this way, but in any case I will try rewriting the code to always recreate the channel when it fails.

@murgatroid99
Member

I did not say that you should recreate the channel. The call is a separate object from the channel. The channel (or more accurately the client that owns a channel) is the object you create here and the call is the object you create here. You should persist the client object for as long as possible, and you need to create a new call every time there is an error. A call represents a single request, and an error indicates that the request is finished.
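
As a rough sketch of that lifetime rule: one client object is kept for the whole program, and each attempt starts a fresh call on it. The names `pingWithRetry`, `fakeClient`, and the `ping` method are hypothetical and exist only for illustration; the `(err, response)` callback shape follows the usual grpc-js unary call convention. The fake client fails its first call so the example runs without a server.

```javascript
// Hypothetical helper: reuse one client, start a new call for every attempt.
function pingWithRetry(client, request, maxAttempts, delayMs = 100) {
  return new Promise((resolve, reject) => {
    const attempt = (n) => {
      // Each invocation of client.ping() is a separate call on the same client.
      client.ping(request, (err, response) => {
        if (!err) return resolve(response);
        // An error means this call is finished; give up or start another one.
        if (n >= maxAttempts) return reject(err);
        setTimeout(() => attempt(n + 1), delayMs);
      });
    };
    attempt(1);
  });
}

// Fake client for illustration: the first call fails, later calls succeed.
const fakeClient = {
  calls: 0,
  ping(request, callback) {
    this.calls += 1;
    const n = this.calls;
    setImmediate(() =>
      n === 1
        ? callback(new Error("UNAVAILABLE"))
        : callback(null, { message: "pong" })
    );
  },
};

const result = pingWithRetry(fakeClient, { message: "ping" }, 3, 10);
result.then((res) => console.log(fakeClient.calls, res.message)); // prints "2 pong"
```

The client is never recreated here; only the call is. A streaming call would follow the same rule, with the retry triggered from the stream's `error` event instead of a callback.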

@ryojikamei
Author

20240801.txt

Thank you very much. Now it works. I had not correctly understood the difference between a call and a channel.
I have been struggling with this issue for two months and could not see my own mistake. My apologies. Please close this issue.

P.S. I searched for documentation on the difference between a call and a channel, but have not found anything so far. I assume the official reference documentation is probably the only source of information. In my native language, at least, I found nothing at all.

@murgatroid99
Member

I suggest starting with the gRPC core concepts document.
