You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@curator.apache.org by "Jordan Zimmerman (JIRA)" <ji...@apache.org> on 2016/12/21 16:29:58 UTC

[jira] [Commented] (CURATOR-367) Curator may deliver RECONNECTED before LOST in case of session expiry

    [ https://issues.apache.org/jira/browse/CURATOR-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767477#comment-15767477 ] 

Jordan Zimmerman commented on CURATOR-367:
------------------------------------------

I recall we were just talking about this somewhere - either email or a different issue. I'll search around.

> Curator may deliver RECONNECTED before LOST in case of session expiry
> ---------------------------------------------------------------------
>
>                 Key: CURATOR-367
>                 URL: https://issues.apache.org/jira/browse/CURATOR-367
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.1
>            Reporter: Zoltan Szekeres
>
> h2. Behaviour:
> We saw our code blocked at client.blockUntilConnected() after reconnected on session expiry.
> h2. Possible reason:
> After receiving a session expired event ConnectionState first resets the connection then notifies the parent watchers, where the CuratorEvent is created. In this case it seems the execution of the first zookeeper event thread was delayed before calling the parent watchers. Meanwhile a new zookeeper event thread was created due to calling reset and this new thread sent the SyncConnected event earlier than SessionExpired was sent to parent watchers in the first thread. This resulted ConnectionStateListener instances seeing the RECONNECTED before the LOST.
> h2. Logs:
> 2016-11-17T20:23:28.527Z [Thread-0-SendThread(]  INFO              ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.535Z [Thread-0-SendThread(]  INFO              ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.576Z [Thread-0-SendThread(]  INFO              ClientCnxn: Unable to reconnect to ZooKeeper service, session 0xc585ba1e7b6adc2 has expired, closing socket connection
> 2016-11-17T20:23:28.576Z [Thread-0-EventThread]  WARN         ConnectionState: Session expired event received
> 2016-11-17T20:23:28.673Z [Thread-0-EventThread]  INFO               ZooKeeper: Initiating client connection, connectString=_ sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@6ddf3f9d
> 2016-11-17T20:23:28.691Z [Thread-0-SendThread(]  INFO              ClientCnxn: Opening socket connection to server _
> 2016-11-17T20:23:28.693Z [Thread-0-SendThread(]  INFO              ClientCnxn: Socket connection established to _, initiating session
> 2016-11-17T20:23:28.701Z [Thread-0-SendThread(]  INFO              ClientCnxn: Session establishment complete on server _, sessionid = 0x2585ba1e69ffeca, negotiated timeout = 30000
> 2016-11-17T20:23:28.701Z [Thread-0-EventThread]  INFO  ConnectionStateManager: State change: RECONNECTED
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO  ConnectionStateManager: State change: LOST
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO              ClientCnxn: EventThread shut down
> h2. Reproduction:
> I was only able to reproduce the behaviour by adding artificial Thread.sleep in ConnectionState#process before calling the parent watchers if the event is session expired.
> {code:title=ConnectionState#process}
> @Override
> public void process(WatchedEvent event)
> {
> 	if ( LOG_EVENTS )
> 	{
> 		log.debug("ConnectState watcher: " + event);
> 	}
> 	if ( event.getType() == Watcher.Event.EventType.None )
> 	{
> 		boolean wasConnected = isConnected.get();
> 		boolean newIsConnected = checkState(event.getState(), wasConnected);
> 		if ( newIsConnected != wasConnected )
> 		{
> 			isConnected.set(newIsConnected);
> 			connectionStartMs = System.currentTimeMillis();
> 		}
> 	}
> 	if (event.getState() == KeeperState.Expired)
> 	{
> 		System.err.println("::> sleep in ConnectionState#process");
> 		try {
> 			Thread.sleep(1000);
> 		} catch (InterruptedException e) {}
> 	}
> 	for ( Watcher parentWatcher : parentWatchers )
> 	{
> 		TimeTrace timeTrace = new TimeTrace("connection-state-parent-process", tracer.get());
> 		parentWatcher.process(event);
> 		timeTrace.commit();
> 	}
> }
> {code}
> h2. Some ideas for fix:
>  * Add the event handling and calling parent watchers into a synchronized block.
>  * Change the order of handling watched event and calling parent watchers (I'm not aware of the behaviour implications of this).
>  * Move only calling reset to the end of the method "process".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)