Posted to user@zookeeper.apache.org by Yuriy Lopotun <yu...@gmail.com> on 2015/04/23 00:55:38 UTC
Zookeeper-based discovery provider: infinite re-connect loop after
server restart
Hi guys,
In our client-server OSGi application we use the ECF ZooKeeper-based
discovery provider for remote service discovery (based on ZooKeeper
3.3.6).
In standalone mode the plugin opens a dedicated ZooKeeper connection from
the client to each of the servers.
While testing the application's resiliency, we noticed that after we restart
the server, the connection is never re-established. In the server logs I
found the following:
2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Accepted socket connection from /10.36.64.250:53022
2015-04-22 18:20:53,763 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] DEBUG org.apac.zook.serv.NIOServerCnxn - Session establishment request from client /10.36.64.250:53022 client's lastZxid is 0x8
2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Refusing session request for client /10.36.64.250:53022 as it has seen zxid 0x8 our last zxid is 0x7 client must try another server
2015-04-22 18:20:53,764 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2001] INFO org.apac.zook.serv.NIOServerCnxn - Closed socket connection for client /10.36.64.250:53022 (no session established for client)
As far as I understand, this is expected behaviour: because of the restart,
the server cleaned up its DB and reset its transaction id (zxid), so its last
zxid (0x7) is now lower than the last zxid the client has seen (0x8).
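To make the refusal in the server log concrete, here is a minimal Java model
of the comparison the server appears to make, going only by the wording of
the log message. The names ZxidCheckSketch and acceptsSession are
hypothetical, and this is not ZooKeeper's actual source. It also assumes the
client presents the same lastZxid on every retry.

```java
// Hypothetical model of the check implied by the server log message;
// this is NOT ZooKeeper's actual implementation.
final class ZxidCheckSketch {

    // A server can only serve a client that has not seen a newer
    // transaction id than the server itself holds.
    static boolean acceptsSession(long clientLastZxid, long serverLastZxid) {
        return clientLastZxid <= serverLastZxid;
    }

    public static void main(String[] args) {
        long clientLastZxid = 0x8L; // from the log: "it has seen zxid 0x8"
        long serverLastZxid = 0x7L; // from the log: "our last zxid is 0x7"

        // Assumption: the client resends the same lastZxid on each attempt,
        // so with a single server the outcome never changes.
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": accepted="
                    + acceptsSession(clientLastZxid, serverLastZxid));
        }
    }
}
```

With several servers the client could move on to one whose zxid is high
enough. With only one server there is nowhere else to go.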
The problem in this case is that the client session keeps trying to
reconnect to this one and only server, which results in an infinite loop:
2015-04-22 18:21:02,760 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Opening socket connection to server ca-rd-mbernard.miranda.com/10.36.64.250:2001
2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Socket connection established to ca-rd-mbernard.miranda.com/10.36.64.250:2001, initiating session
2015-04-22 18:21:02,761 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] DEBUG org.apac.zook.ClientCnxn - Session establishment request sent on ca-rd-mbernard.miranda.com/10.36.64.250:2001
2015-04-22 18:21:02,762 [pool-2-thread-3-SendThread(ca-rd-mbernard.miranda.com:2001)] INFO org.apac.zook.ClientCnxn - Unable to read additional data from server sessionid 0x14ce32e178c0002, likely server has closed socket, closing socket connection and attempting reconnect
Again, I think this is correct behaviour when there are several servers. But
in our case there is always exactly one.
So I wanted to ask for a suggestion: what do you think we can do in this
case to get an automatic reconnect?
I thought that, when there is only one server, maybe we could close the
connection on such an exception instead of retrying? Maybe this enhancement
has already been made in more recent versions and could be back-ported?
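To illustrate the "close instead of retrying" idea on the application side,
here is a rough Java sketch. It counts consecutive disconnects and, past a
threshold, closes the stuck handle and creates a brand-new one. It assumes a
fresh client handle starts a new session and no longer presents the stale
zxid. The sketch has no ZooKeeper dependency: ReconnectGuard, onConnected,
onDisconnected and the threshold are hypothetical names. In a real client the
two callbacks would presumably be driven by a Watcher receiving
KeeperState.Disconnected and KeeperState.SyncConnected events, and the
Supplier would construct a new ZooKeeper instance.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of ZooKeeper or ECF: replaces a client
// handle after too many consecutive disconnects.
final class ReconnectGuard<C extends AutoCloseable> {
    private final Supplier<C> factory; // creates a fresh client handle
    private final int maxDisconnects;  // disconnects tolerated per handle
    private C client;
    private int consecutiveDisconnects;

    ReconnectGuard(Supplier<C> factory, int maxDisconnects) {
        if (maxDisconnects < 1) {
            throw new IllegalArgumentException("maxDisconnects must be >= 1");
        }
        this.factory = factory;
        this.maxDisconnects = maxDisconnects;
        this.client = factory.get();
    }

    synchronized C client() {
        return client;
    }

    // Call when the current handle reports a successful (re)connect.
    synchronized void onConnected() {
        consecutiveDisconnects = 0;
    }

    // Call when the current handle reports a disconnect.
    // Returns true if the handle was replaced by a fresh one.
    synchronized boolean onDisconnected() {
        consecutiveDisconnects++;
        if (consecutiveDisconnects < maxDisconnects) {
            return false; // let the handle keep retrying on its own
        }
        try {
            client.close(); // best effort: the handle is discarded anyway
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
        client = factory.get(); // start over with a new handle
        consecutiveDisconnects = 0;
        return true;
    }

    public static void main(String[] args) {
        ReconnectGuard<AutoCloseable> guard = new ReconnectGuard<AutoCloseable>(
                () -> () -> System.out.println("old handle closed"), 2);
        guard.onDisconnected();                     // first failure: keep retrying
        System.out.println(guard.onDisconnected()); // second failure: replaced
    }
}
```

Anything tied to the old session (ephemeral nodes, watches) would have to be
re-created on the new handle, and it may be safer to do the replacement off
the client's event thread.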
Thanks,
Yuriy
Re: Zookeeper-based discovery provider: infinite re-connect loop
after server restart
Posted by Yuriy Lopotun <yu...@gmail.com>.
Looks like there is already an open bug for the described issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-832
There was some discussion in the comments, but it looks like the best
solution hasn't been found yet.
Yuriy