Posted to dev@zookeeper.apache.org by James Strachan <ja...@gmail.com> on 2008/07/22 21:09:01 UTC

things lock up when the client reconnects?

I wonder if anyone else has seen this recently; I've been trying to
make the WriteLock implementation survive server restarts (i.e.
reconnecting to another ZK server) with some success. See the latest
patch here...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

but I've found I can reliably get things to lock up. See the
WriteLockTest.java and change the workAroundClosingLastZNodeFails to
false and you should be able to run the test yourself and see things
lock up.

It seems like things lock up when waiting on a Packet being sent to
the transport. Sometimes I get a session timed out exception, so if I
see that I try to recreate the cnxn object, which is maybe causing the
issue. I tried patching the ClientCnxn.SendThread.close() method to do
a cleanup() to wake up any blocked threads before closing (it's in the
patch for ZOOKEEPER-78, which also depends on the patch for
ZOOKEEPER-84, BTW). I'm wondering if anyone has a better idea for
dealing with a session timeout?
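
To illustrate the "wake up any blocked threads before closing" idea, here is a
minimal sketch. It is a simplified model, not the ClientCnxn code or the
ZOOKEEPER-78 patch: SketchPacket, SketchConnection, WakeOnCloseDemo and the
"CONNECTION_LOSS" string are all hypothetical names made up for the example.
The point is only that close() fails every queued packet, so no caller is left
waiting forever.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical packet: a caller blocks on it until the request completes or fails.
class SketchPacket {
    private boolean finished;
    private String error; // null means the request succeeded

    synchronized String awaitResult() throws InterruptedException {
        while (!finished) {
            wait(); // released by finish()
        }
        return error;
    }

    synchronized void finish(String errorOrNull) {
        error = errorOrNull;
        finished = true;
        notifyAll();
    }
}

// Hypothetical connection holding packets that have not been sent yet.
class SketchConnection {
    private final Queue<SketchPacket> outgoing = new ArrayDeque<>();
    private boolean closed;

    // Queues a packet; fails it immediately if the connection is already closed.
    synchronized SketchPacket submit() {
        SketchPacket packet = new SketchPacket();
        if (closed) {
            packet.finish("CONNECTION_LOSS");
        } else {
            outgoing.add(packet);
        }
        return packet;
    }

    // Marks the connection closed and fails every queued packet,
    // which wakes any thread blocked in awaitResult().
    synchronized void close() {
        closed = true;
        SketchPacket packet;
        while ((packet = outgoing.poll()) != null) {
            packet.finish("CONNECTION_LOSS");
        }
    }
}

class WakeOnCloseDemo {
    // Returns true if a thread blocked on a queued packet is released by close().
    static boolean waiterIsReleasedByClose() {
        SketchConnection connection = new SketchConnection();
        SketchPacket packet = connection.submit();
        String[] seen = new String[1];
        Thread waiter = new Thread(() -> {
            try {
                seen[0] = packet.awaitResult();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.setDaemon(true);
        waiter.start();
        connection.close();
        try {
            waiter.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive() && "CONNECTION_LOSS".equals(seen[0]);
    }

    public static void main(String[] args) {
        System.out.println("waiter released: " + waiterIsReleasedByClose());
    }
}
```

Checking the closed flag in submit() is what covers the race where a request
arrives after close() has already drained the queue.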

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

Re: things lock up when the client reconnects?

Posted by Benjamin Reed <br...@yahoo-inc.com>.
Just to clarify: reconnecting to another server will either maintain the
same session or fail with a session expired error.
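
The two outcomes above can be sketched as a small decision function. This is
only an illustration under assumptions: ReconnectOutcome, SketchHandle and
SessionRecovery are hypothetical names, not ZooKeeper client classes. If the
session was resumed, the existing handle is kept; if it expired, a new handle
has to be created, and any ephemeral nodes held by the old session (such as a
lock node) would need to be created again, since ephemeral nodes go away with
their session.

```java
import java.util.function.Supplier;

// Hypothetical model of the two reconnect outcomes; not the ZooKeeper API.
enum ReconnectOutcome { SESSION_RESUMED, SESSION_EXPIRED }

// Hypothetical stand-in for a client handle bound to one session.
final class SketchHandle {
    final long sessionId;

    SketchHandle(long sessionId) {
        this.sessionId = sessionId;
    }
}

final class SessionRecovery {
    // Returns the handle the caller should use after a reconnect attempt.
    static SketchHandle recover(SketchHandle current,
                                ReconnectOutcome outcome,
                                Supplier<SketchHandle> newHandle) {
        switch (outcome) {
            case SESSION_RESUMED:
                return current;          // same session, nothing to rebuild
            case SESSION_EXPIRED:
                return newHandle.get();  // old session is gone, start a new one
            default:
                throw new IllegalStateException("unknown outcome: " + outcome);
        }
    }
}
```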

ben

Flavio Junqueira wrote:
> James, I'd like to clarify what exactly is the issue you're looking at. If
> you provide a list of ZooKeeper servers, then a client will try to reconnect
> to another ZooKeeper server upon a disconnection. Reconnecting to another
> server does not guarantee maintaining the same session, though. So, are you
> trying to guarantee that the session is still the same upon a reconnection?
> If so, I don't think you can do it by just changing the client, since the
> servers might have expired the old session.
>
> Cheers,
> -Flavio 
>


Re: things lock up when the client reconnects?

Posted by James Strachan <ja...@gmail.com>.
BTW one other observation: when I use 3 clients in the same JVM (i.e.
3 separate instances of ZooKeeper to try to simulate a set of different
processes), I find that each client receives an initial WatchEvent on
startup; from that point on, only the first 2 clients receive
further watch events for the connection starting/stopping, despite me
closing the server down, waiting a while, restarting the server, then
stopping it again, etc.

I'm wondering if this is related to why the 3rd client seems to kinda
lock up: that it's losing connection watch events. There's nothing
hard coded somewhere that only allows 2 ZooKeeper clients per JVM or
anything, is there? :)

I'm gonna have a look around and see if there's any nasty static
variables around or something... We could maybe do with some more
tests for multiple clients with failover etc.

Anyone else seen something like this?
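
One way a multi-client test could make missing watch events visible is to
tally events per client and report any client that falls behind. A minimal
sketch follows; EventTally and the client names are hypothetical, and in a
real test record() would be called from each client's watcher callback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test helper: counts connection events seen by each client.
final class EventTally {
    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    // Registers a client so it shows up even if it never receives an event.
    void register(String clientName) {
        counts.putIfAbsent(clientName, new AtomicInteger());
    }

    // Called once for each event delivered to the named client.
    void record(String clientName) {
        counts.computeIfAbsent(clientName, k -> new AtomicInteger()).incrementAndGet();
    }

    // Returns, in sorted order, the clients that saw fewer than 'expected' events.
    List<String> clientsBelow(int expected) {
        List<String> lagging = new ArrayList<>();
        for (Map.Entry<String, AtomicInteger> entry : counts.entrySet()) {
            if (entry.getValue().get() < expected) {
                lagging.add(entry.getKey());
            }
        }
        Collections.sort(lagging);
        return lagging;
    }
}
```

After each server stop/start cycle the test could assert that
clientsBelow(n) is empty, which would point directly at the client that
stopped receiving events.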

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

Re: things lock up when the client reconnects?

Posted by James Strachan <ja...@gmail.com>.
2008/7/22 Flavio Junqueira <fp...@yahoo-inc.com>:
> James, I'd like to clarify what exactly is the issue you're looking at. If
> you provide a list of ZooKeeper servers, then a client will try to reconnect
> to another ZooKeeper server upon a disconnection. Reconnecting to another
> server does not guarantee maintaining the same session, though. So, are you
> trying to guarantee that the session is still the same upon a reconnection?
> If so, I don't think you can do it by just changing the client, since the
> servers might have expired the old session.

I'm trying to test the WriteLock implementation in the case where the
server dies and the client reconnects to another server.
In the test case I'm just running one server, killing it, restarting
it and trying to get the client to reconnect.

The test case is WriteLockTest in this patch...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

(unfortunately it's not been committed yet so I can't easily point you
at the code). It's very easy to run the test with different numbers of
clients and see lockups at various places.

The bizarre thing I've seen is that things do reconnect mostly fine
(apart from the SessionExpiredException issue in one of the clients)
https://issues.apache.org/jira/browse/ZOOKEEPER-84

but a lockup often happens when trying to close down the ZooKeeper instance.

When running the test case with 3 independent clients and one server,
I tend to see the last client having a session expired, and it's often
the one that locks up; but when running the test with more clients I
see more lockups elsewhere.

I just wondered if folks had seen similar lockups when you try
restarting ZK servers?

(I'm testing on OS X; this lockup could be timing related maybe).
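
For a lockup on close like the one described above, a test can run the close
call under a timeout so that it fails instead of hanging. The following is a
minimal sketch using only java.util.concurrent; CloseWatchdog is a
hypothetical helper, and the Runnable stands in for whatever close call the
test makes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical test helper: detects a close() that never returns.
final class CloseWatchdog {
    // Runs closeAction on a daemon thread and returns true if it
    // finished within the timeout, false if it was still running.
    static boolean closesWithin(Runnable closeAction, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "close-watchdog");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(closeAction);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck thread
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("close failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

When the watchdog returns false, a thread dump taken at that moment would
show where the closing thread is blocked.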

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

RE: things lock up when the client reconnects?

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
James, I'd like to clarify what exactly is the issue you're looking at. If
you provide a list of ZooKeeper servers, then a client will try to reconnect
to another ZooKeeper server upon a disconnection. Reconnecting to another
server does not guarantee maintaining the same session, though. So, are you
trying to guarantee that the session is still the same upon a reconnection?
If so, I don't think you can do it by just changing the client, since the
servers might have expired the old session.

Cheers,
-Flavio 
