Posted to dev@zookeeper.apache.org by "Jake McArthur (Jira)" <ji...@apache.org> on 2020/07/21 16:21:00 UTC
[jira] [Created] (ZOOKEEPER-3894) Out-of-order response after session moved
Jake McArthur created ZOOKEEPER-3894:
----------------------------------------
Summary: Out-of-order response after session moved
Key: ZOOKEEPER-3894
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3894
Project: ZooKeeper
Issue Type: Bug
Components: server
Reporter: Jake McArthur
A bug in NIOServerCnxn can cause a client to fail with an error about out-of-order xids. What actually happens, as I understand it, is:
# Client attempts to renew its session on slow server S1.
# The attempt times out.
# Client attempts to renew its session on server S2.
# The attempt succeeds. S2 now owns the session.
# The client sends one or more requests. The responses are large enough that they fill the socket's buffer in S2.
# The original attempt on S1 finally succeeds. S1 now owns the session, but the client is still connected to S2.
# The client sends an asynchronous request A to S2. Because the session has moved, S2 instructs the NIOServerCnxn to close. This is implemented as an empty sentinel value added to the queue of outgoing buffers.
# The client sends some read request B to S2, and the response is enqueued behind the sentinel.
# The doIO method of NIOServerCnxn writes its enqueued buffers to the socket, and then it closes the socket because one of the buffers was the sentinel.
# Before the client observes that the socket is closed, it receives the response for B, and fails with an error because it expected the response for A.
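The failure in the last step can be illustrated with a minimal sketch of a client that matches each response against the oldest pending request. The class and method names (XidOrderDemo, send, onResponse) and the xid values are hypothetical and only model the behaviour described above; this is not ZooKeeper's actual client code.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a client that expects responses in request order.
// Names are illustrative assumptions, not ZooKeeper's real classes.
final class XidOrderDemo {
    // Xids of requests that were sent and are still awaiting a response.
    private final Deque<Integer> pendingXids = new ArrayDeque<>();

    // Record a request as sent.
    void send(int xid) {
        pendingXids.addLast(xid);
    }

    // Accept a response only if it answers the oldest pending request.
    void onResponse(int xid) throws IOException {
        Integer expected = pendingXids.pollFirst();
        if (expected == null || expected != xid) {
            throw new IOException(
                "xid out of order: got " + xid + ", expected " + expected);
        }
    }

    public static void main(String[] args) {
        XidOrderDemo client = new XidOrderDemo();
        client.send(1); // request A, never answered because the server closes
        client.send(2); // request B, its response is written anyway
        try {
            client.onResponse(2); // response for B arrives first
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the client has no way to tell that A was dropped on purpose, so a response for B looks like a protocol violation rather than a pending disconnect.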
I think the fix is simply to avoid writing messages that were enqueued after the sentinel.
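A rough sketch of that idea, assuming the outgoing buffers sit in a simple FIFO queue and the close sentinel is an empty buffer compared by identity. SentinelDrainDemo, CLOSE, drain, and the stopAtSentinel flag are hypothetical names for illustration, and the "write" is modelled by collecting payloads in a list instead of writing to a socket; the real doIO method is structured differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of draining an outgoing queue that contains a close sentinel.
final class SentinelDrainDemo {
    // Empty buffer used as the close marker (assumption: compared by identity).
    static final ByteBuffer CLOSE = ByteBuffer.allocate(0);

    static ByteBuffer payload(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Queue state from the report: an earlier response, the sentinel, then B.
    static Queue<ByteBuffer> scenario() {
        Queue<ByteBuffer> q = new ArrayDeque<>();
        q.add(payload("earlier"));
        q.add(CLOSE);
        q.add(payload("B"));
        return q;
    }

    // Returns the payloads that would reach the socket before it is closed.
    // stopAtSentinel=false models the reported behaviour, true the proposed fix.
    static List<String> drain(Queue<ByteBuffer> outgoing, boolean stopAtSentinel) {
        List<String> written = new ArrayList<>();
        ByteBuffer bb;
        while ((bb = outgoing.poll()) != null) {
            if (bb == CLOSE) {
                if (stopAtSentinel) {
                    outgoing.clear(); // drop everything queued after the sentinel
                    break;
                }
                continue; // reported behaviour: keep writing, close afterwards
            }
            written.add(StandardCharsets.UTF_8.decode(bb).toString());
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println("reported: " + drain(scenario(), false));
        System.out.println("proposed: " + drain(scenario(), true));
    }
}
```

With the sentinel treated as a hard stop, the response for B never reaches the client, so the client only sees the connection close and can reconnect without hitting the xid check.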
--
This message was sent by Atlassian Jira
(v8.3.4#803005)