You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Artyom Kravchenko (JIRA)" <ji...@apache.org> on 2018/03/02 14:09:00 UTC

[jira] [Commented] (ZOOKEEPER-2219) ZooKeeper server should better handle SessionMovedException.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383602#comment-16383602 ] 

Artyom Kravchenko commented on ZOOKEEPER-2219:
----------------------------------------------

Hi 

I have faced with similar issue. I use:

server: Zookeeper server 3.4.11 (need to make cluster of solr instances), Solr Server 7.2.1 

client: CloudSolrClient (org.apache.solr:solr-solrj:7.1.0 and org.apache.zookeeper:zookeeper:3.4.10) 

 

After some time of work client falls off.

Server logs:
{code:java}
2018-02-28 15:04:04,131 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.100.53.1:52830
2018-02-28 15:04:04,133 [myid:2] - INFO  [CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810485 with negotiated timeout 10000 for client /10.100.48.1:52827
2018-02-28 15:04:04,135 [myid:2] - INFO  [CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810486 with negotiated timeout 10000 for client /10.100.80.1:52829
2018-02-28 15:04:04,136 [myid:2] - INFO  [CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810487 with negotiated timeout 10000 for client /10.100.53.1:52830
2018-02-28 15:04:08,085 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:52832
2018-02-28 15:04:08,085 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@883] - Processing ruok command from /127.0.0.1:52832
2018-02-28 15:04:08,086 [myid:2] - INFO  [Thread-281185:NIOServerCnxn@1044] - Closed socket connection for client /127.0.0.1:52832 (no session established for client)
--
--
	at java.lang.Thread.run(Thread.java:748)
2018-02-28 15:04:23,955 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /10.100.80.1:52829 which had sessionid 0x261c838af810486
2018-02-28 15:04:23,955 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x261c838af810487, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:748)
--
--
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:748)
2018-02-28 15:04:23,956 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /10.100.53.1:52830 which had sessionid 0x261c838af810487
2018-02-28 15:04:23,956 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x261c838af810485, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
--
--
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.100.48.1:52840
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.100.80.1:52841
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client attempting to renew session 0x261c838af810487 at /10.100.80.1:52841
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x261c838af810487
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client attempting to renew session 0x261c838af810486 at /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x261c838af810486
--
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client attempting to renew session 0x261c838af810486 at /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 0x261c838af810486
2018-02-28 15:04:36,587 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session 0x261c838af810487 for client /10.100.80.1:52841, probably expired
2018-02-28 15:04:36,587 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session 0x261c838af810486 for client /10.100.53.1:52842, probably expired
--
2018-02-28 15:04:36,587 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session 0x261c838af810486 for client /10.100.53.1:52842, probably expired
2018-02-28 15:04:36,592 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /10.100.80.1:52841 which had sessionid 0x261c838af810487
2018-02-28 15:04:36,592 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket connection for client /10.100.53.1:52842 which had sessionid 0x261c838af810486
2018-02-28 15:04:36,595 [myid:2] - INFO  [CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810488 with negotiated timeout 10000 for client /10.100.48.1:52840
2018-02-28 15:04:38,368 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /127.0.0.1:52844{code}
 

Client logs:
{code:java}
Unable to reconnect to ZooKeeper service, session 0x261c838af810487 has expired

{code}
Could anybody provide some update about this issue

 

 

> ZooKeeper server should better handle SessionMovedException.
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2219
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2219
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: zhihai xu
>            Priority: Major
>
> ZooKeeper server should better handle SessionMovedException.
> We hit the SessionMovedException. the following is the reason for the SessionMovedException we find:
> 1. ZK client tried to connect to Leader L. Network was very slow, so before leader processed the request, client disconnected.
> 2. Client then re-connected to Follower F reusing the same session ID. It was successful.
> 3. The request in step 1 went into leader. Leader processed it and invalidated the connection created in step 2. But client didn't know the connection it used is invalidated.
> 4. Client got SessionMovedException when it used the connection invalidated by leader for any ZooKeeper operation.
> The following are logs: c045dkh is the Leader, c470udy is the Follower and the sessionID is 0x14be28f50f4419d.
> 1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and timeout after 10/3 seconds.
> logs from ZK client 
> {code}
> 2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6670ms for sessionid 0x14be28f50f4419d, closing socket connection and attempting reconnect
> 015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to c045dkh/?.?.?.67:2181, initiating session
> 2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 3336ms for sessionid 0x14be28f50f4419d, closing socket connection and attempting reconnect
> {code}
> 2. ZK client initiated session to Follower successfully at 2015-03-16 10:59:44,688
> logs from ZK client
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to c470udy/?.?.?.65:2181, initiating session
> 2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server c470udy/?.?.?.65:2181, sessionid = 0x14be28f50f4419d, negotiated timeout = 10000
> {code}
> logs from ZK Follower server
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: Revalidating client: 0x14be28f50f4419d
> 2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:42777
> {code}
> 3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is sent by Client at 2015-03-16 10:59:40,735, right after session was established, it close the socket connection/session because client was already disconnected due to timeout.
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:50271
> 2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x14be28f50f4419d due to java.io.IOException: Broken pipe
> 2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /?.?.?.65:50271 which had sessionid 0x14be28f50f4419d
> {code}
> 4. Client got SessionMovedException at 2015-03-16 10:59:45,693
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,693 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,695 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,700 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode = Session moved
> {code}
> 5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the socket connection between ZK client and ZK Follower is closed at 2015-03-16 10:59:45,715 after session termination.
> logs from ZK Leader server:
> {code}
> 2015-03-16 10:59:45,710 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x14be28f50f4419d
> {code}
> logs from ZK Follower server:
> {code}
> 2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /?.?.?.65:42777 which had sessionid 0x14be28f50f4419d
> {code}
> It looks like Zk client is out-of-sync with ZK server.
> My question is how ZK client can recover from this error. It looks like the ZK Client won't be disconnected from Follower until session is closed and any ZooKeeper operation will fail with SessionMovedException before session is closed.
> Also since ZK Leader already closed the socket connection/session to ZK Client at step 3, why it still reject the ZooKeeper operation from client with SessionMovedException. Will it be better to endorse the session/connection between ZK client and ZK Follower? This seems like a bug to me. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)