Posted to dev@zookeeper.apache.org by "Jake McArthur (Jira)" <ji...@apache.org> on 2020/07/21 16:21:00 UTC

[jira] [Created] (ZOOKEEPER-3894) Out-of-order response after session moved

Jake McArthur created ZOOKEEPER-3894:
----------------------------------------

             Summary: Out-of-order response after session moved
                 Key: ZOOKEEPER-3894
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3894
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
            Reporter: Jake McArthur


A bug in NIOServerCnxn can cause a client to fail with an error about out-of-order xids. What actually happens, as I understand it, is:
 # Client attempts to renew its session on slow server S1.
 # The attempt times out.
 # Client attempts to renew its session on server S2.
 # The attempt succeeds. S2 now owns the session.
 # The client sends one or more requests. The responses are large enough that they fill the socket's buffer in S2.
 # The original attempt finally succeeds. S1 now owns the session, but the client is still connected to S2.
 # The client sends an asynchronous request A to S2. Because the session has moved, S2 instructs the NIOServerCnxn to close. This is implemented as an empty sentinel value added to the queue of outgoing buffers.
 # The client sends some read request B to S2, and the response is enqueued behind the sentinel.
 # The doIO method of NIOServerCnxn writes its enqueued buffers to the socket, and then it closes the socket because one of the buffers was the sentinel.
 # Before the client observes that the socket is closed, it receives the response for B, and fails with an error because it expected the response for A.
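
To illustrate the last step, here is a minimal sketch of why the client fails. It assumes the client matches responses to requests strictly in send order. The class PendingRequests and its methods are hypothetical names for illustration only, not the actual ZooKeeper client code.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model of a client that expects responses
// in exactly the order in which the requests were sent.
class PendingRequests {
    // xids of requests that have been sent but not yet answered
    private final Queue<Integer> pendingXids = new ArrayDeque<>();

    // Record that a request with this xid was sent.
    void sent(int xid) {
        pendingXids.add(xid);
    }

    // Match an incoming response against the oldest pending request.
    void received(int replyXid) throws IOException {
        Integer expected = pendingXids.poll();
        if (expected == null) {
            throw new IOException("Unexpected response with xid " + replyXid);
        }
        if (expected != replyXid) {
            // This is the situation in the last step: the response for B
            // arrives while the client is still waiting for A.
            throw new IOException(
                "Xid out of order: got " + replyXid + ", expected " + expected);
        }
    }
}
```

With A sent as xid 1 and B as xid 2, calling received(2) first throws, because the response for A never arrived.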

I think the fix is simply to avoid writing messages that were enqueued after the sentinel.
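
A rough sketch of that idea, under the assumption that outgoing buffers sit in a FIFO queue and that the close marker can be recognized by identity. OutgoingQueue, CLOSE_SENTINEL, drain and shouldClose are hypothetical names, not the real NIOServerCnxn members, and the actual socket write is left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a connection's outgoing buffer queue.
class OutgoingQueue {
    // Stand-in for the empty sentinel buffer; compared by identity.
    static final ByteBuffer CLOSE_SENTINEL = ByteBuffer.allocate(0);

    private final Queue<ByteBuffer> outgoing = new ArrayDeque<>();
    private boolean closeRequested = false;

    void enqueue(ByteBuffer buffer) {
        outgoing.add(buffer);
    }

    // Returns the buffers that should be written to the socket.
    // Anything enqueued after the sentinel is discarded, not written.
    List<ByteBuffer> drain() {
        List<ByteBuffer> toWrite = new ArrayList<>();
        ByteBuffer buffer;
        while ((buffer = outgoing.poll()) != null) {
            if (buffer == CLOSE_SENTINEL) {
                closeRequested = true;
                outgoing.clear(); // drop responses queued behind the sentinel
                break;
            }
            toWrite.add(buffer);
        }
        return toWrite;
    }

    // True once the sentinel has been reached; the caller would then
    // close the socket after writing the buffers returned by drain().
    boolean shouldClose() {
        return closeRequested;
    }
}
```

In the scenario above, the response for B is enqueued behind the sentinel, so drain() would return only the earlier responses and the client would see the connection close instead of an unexpected xid.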



--
This message was sent by Atlassian Jira
(v8.3.4#803005)