Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selector completes receives on a connection before completing sends if there are errors #376

Open
pnarayanan opened this issue Jun 27, 2016 · 4 comments

Comments

@pnarayanan
Copy link
Contributor

Just noticed during some testing that the ambry selector can complete receive on a connection id when the send is not complete. A quick look at the code indicates that the interest ops are only set to READ when the send is completed, so not sure how this happens. This seems like a bug and needs to be investigated and fixed.

This happened when there were timeout errors and such happening, but interestingly, did not notice any error on the said connection (if not it would not have been able to complete the receive, I think)

Will try to add more details.

2016/06/27 17:08:14.590 INFO [NetworkClient] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Connection checkout succeeded for lva
1-app2039.stg.linkedin.com:Port[15088:PLAINTEXT] with connectionId 0.0.0.0:-1-10.136.152.27:15088_9
2016/06/27 17:08:14.710 INFO [NetworkClient] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Receive completed for connectionId 0.0.0.0:-1-10.136.152.27:15088_9 and checking in the connection back to connection tracker

...

2016/06/27 17:08:16.591 INFO [NetworkClient] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Connection checkout succeeded for lva1-app2039.stg.linkedin.com:Port[15088:PLAINTEXT] with connectionId 0.0.0.0:-1-10.136.152.27:15088_9
2016/06/27 17:08:16.591 INFO [Selector] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] ***Setting NetworkSend threw an exception for connection id: 0.0.0.0:-1-10.136.152.27:15088_9
2016/06/27 17:08:16.592 ERROR [NonBlockingRouter] [RequestResponseHandlerThread-1] [ambry-frontend-nb] [] Aborting, as requestResponseHandlerThread received an unexpected error:
java.lang.IllegalStateException: Attempt to begin a networkSend operation with prior networkSend operation still in progress.
        at com.github.ambry.network.Transmission.setNetworkSend(Transmission.java:65)
@xiahome
Copy link
Contributor

xiahome commented Jun 27, 2016

Still not sure if I completely catch what you are saying.

the interest ops are only set to READ when the send is completed

For the same channel, once a send is completed, shouldn't we only expect response to arrive (hence reading from the channel)?

@pnarayanan
Copy link
Contributor Author

My point with that statement was that interest ops are set to READ only after the send is completed, so there should not be any read happening on a channel while the send is ongoing.

Anyway, I looked into the code further and it seems that I was wrong about the premise that interest ops are set to READ only after the send completed. I see that what we do after a connection is established is this:
key.interestOps(key.interestOps() & ~SelectionKey.OP_CONNECT | SelectionKey.OP_READ);

I am not sure if finishConnect() gets called from the server side, but it seems to me that from the client side, we should not be explicitly setting OP_READ, and that this should be:
key.interestOps(key.interestOps() & ~SelectionKey.OP_CONNECT);

More importantly, when sending, the Selector does this in setNetworkSend:
key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);

Now, since the existing interest ops already has OP_READ (because that was done in finishConnect()), the key's interest ops has both OP_READ and OP_WRITE after setNetworkSend() is called. Ideally you don't expect to receive anything while sending, but perhaps the client should not set itself up for that.

Perhaps setNetworkSend() should do this:
key.interestOps(key.interestOps() & ~SelectionKey.OP_READ | SelectionKey.OP_WRITE);

I tried setting things this way, but the unit tests started failing and I didn't get to dig deeper. It could be that the tests are expecting something unreasonable. This needs to be investigated.

@vgkholla
Copy link
Contributor

vgkholla commented Feb 9, 2017

@pnarayanan @nsivabalan Is this still relevant?

@vgkholla
Copy link
Contributor

vgkholla commented Jun 8, 2017

@pnarayanan @nsivabalan ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants