Posted to dev@zookeeper.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/25 22:06:00 UTC

[jira] [Commented] (ZOOKEEPER-2775) ZK Client not able to connect with Xid out of order error

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340172#comment-16340172 ] 

ASF GitHub Bot commented on ZOOKEEPER-2775:
-------------------------------------------

GitHub user jiajunwang opened a pull request:

    https://github.com/apache/helix/pull/131

    Bump up ZOOKEEPER version to 3.4.11.

    There is a zk connection related bug (ZOOKEEPER-2775) fixed in 3.4.11. Bump up version to get the fix.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiajunwang/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #131
    
----
commit 22cc4a4d4819b211b30094de7b3b0944cb5c033b
Author: jiajunwang <er...@...>
Date:   2018-01-25T22:04:13Z

    Bump up ZOOKEEPER version to 3.4.11.
    
    There is a zk connection related bug (ZOOKEEPER-2775) fixed in 3.4.11. Bump up version to get the fix.

----


> ZK Client not able to connect with Xid out of order error 
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2775
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2775
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.4.10, 3.5.3, 3.6.0
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Mohammad Arshad
>            Priority: Critical
>             Fix For: 3.4.11, 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-2775-01.patch
>
>
> During a network-unreachable scenario in one of our clusters, we observed the "Xid out of order" and "Nothing in the queue" errors continuously, and the ZK client was ultimately unable to connect to the ZK server.
> *Logs:*
> unexpected error, closing socket connection and attempting reconnect | org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1447) 
> java.io.IOException: Xid out of order. Got Xid 52 with err 0 expected Xid 53 for a packet with details: clientPath:null serverPath:null finished:false header:: 53,101  replyHeader:: 0,0,-4  request:: 12885502275,v{'/app1/controller,'/app1/config/changes},v{},v{'/app1/config/changes}  response:: null
> 	at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:996)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
> unexpected error, closing socket connection and attempting reconnect 
> java.io.IOException: Nothing in the queue, but got 1
> 	at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:983)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:370)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1426)
> 	
> *Analysis:* 
> 1) The first time, the client fails the SASL login because the network is unreachable.
> 2017-03-29 10:03:59,377 | WARN  | [main-SendThread(192.168.130.8:24002)] | SASL configuration failed: javax.security.auth.login.LoginException: Network is unreachable (sendto failed) Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it. | org.apache.zookeeper.ClientCnxn (ClientCnxn.java:1307) 
> 	Here the boolean saslLoginFailed becomes true.
> 2) After some time the network connection recovers and the client logs in successfully, but the boolean saslLoginFailed is still not reset to false.
> 3) SASL negotiation between client and server then starts, and during this time no user requests are sent (the socket channel is closed for writes until SASL negotiation completes).
> 4) The client then processes the server's response to the SASL packet. Because tunnelAuthInProgress() checks the saslLoginFailed boolean, and that boolean is still true, the client assumes the tunneled authentication is finished and tries to process the SASL response as an ordinary packet, which results in the errors above.
> *Solution:* Reset the saslLoginFailed boolean every time before the client logs in.
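
The analysis above can be illustrated with a minimal sketch. This is not the ZooKeeper client code: the class SaslStateSketch, its methods startConnect and finishNegotiation, and the resetBeforeLogin switch are hypothetical names made up for illustration. Only the saslLoginFailed flag and the tunnelAuthInProgress() check come from the report, and their real implementations are more involved than what is modeled here.

```java
/**
 * Hypothetical, simplified model of the client-side SASL state described
 * in the analysis. It is NOT the real org.apache.zookeeper.ClientCnxn code.
 */
class SaslStateSketch {
    // Mirrors the flag named in the report.
    private boolean saslLoginFailed = false;
    // Stand-in for "SASL negotiation with the server has completed".
    private boolean negotiationComplete = false;
    // Assumption for illustration: toggles the proposed fix on or off.
    private final boolean resetBeforeLogin;

    SaslStateSketch(boolean resetBeforeLogin) {
        this.resetBeforeLogin = resetBeforeLogin;
    }

    /** Models one connection attempt; loginSucceeds stands in for the login outcome. */
    void startConnect(boolean loginSucceeds) {
        if (resetBeforeLogin) {
            // Proposed fix: clear the flag before every login attempt.
            saslLoginFailed = false;
        }
        negotiationComplete = false;
        if (!loginSucceeds) {
            // Step 1 of the analysis: the failed login sets the flag.
            saslLoginFailed = true;
        }
    }

    /** Models the end of SASL negotiation. */
    void finishNegotiation() {
        negotiationComplete = true;
    }

    /** Returns true while responses should still be treated as SASL packets. */
    boolean tunnelAuthInProgress() {
        if (saslLoginFailed) {
            // A stale flag makes the client believe no authentication is pending.
            return false;
        }
        return !negotiationComplete;
    }
}
```

With resetBeforeLogin set to false, calling startConnect(false) followed by startConnect(true) leaves tunnelAuthInProgress() returning false even though negotiation has only just begun, which is the state in which a SASL response would be matched against the pending request queue. With resetBeforeLogin set to true, the same sequence returns true until finishNegotiation() is called.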



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)