Posted to dev@zookeeper.apache.org by James Strachan <ja...@gmail.com> on 2008/07/23 15:28:12 UTC

when should a SessionExpiredException occur?

I'm just wondering if I've hit this due to some other bug. I thought ZK
sent keep-alive pings to ensure each client is alive and its session
does not expire? Or does the client have to explicitly keep calling
some method on the ZooKeeper interface to ensure a steady flow of
packets to the ZK server to keep the session alive?

The test case WriteLockTest in the patch for ZOOKEEPER-78 (the
WriteLock) can always reproduce a SessionExpiredException when using 3
clients (it's always the 3rd session that expires).

Now when a SessionExpiredException occurs, any recipe/protocol has to
be able to deal with it; so the ZOOKEEPER-84 issue is still valid
IMHO. But I'm wondering if in my test case it shouldn't be happening;
as I've got 3 clients and a server all in the same JVM and the JVM
isn't locked or pegged nor do the TCP sockets fail AFAIK.

So I just thought I'd ask; are the keep alive packets used by default?
If they are then maybe they are not sent very frequently or something?

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

Re: when should a SessionExpiredException occur?

Posted by Patrick Hunt <ph...@apache.org>.
I was going to refer you guys to
https://issues.apache.org/jira/browse/ZOOKEEPER-63
but I noticed in the comments James beat me to it! :-)

Ben, you had an idea for how to address 63, please add a comment (I 
think it was to set the state to closed before sending the disconnect 
request to the server, but please update)

James, go ahead and create a new Jira issue for this
SessionExpiredException being thrown. As you can reproduce it, feel free
to assign it to yourself and work with the rest of the team to resolve it.

Thanks!

Patrick

James Strachan wrote:
> 2008/7/23 James Strachan <ja...@gmail.com>:
>> 2008/7/23 Benjamin Reed <br...@yahoo-inc.com>:
>>> SessionExpiredExceptions should be extremely rare. Basically they should only
>>> happen if a machine goes down (of course that would mean no exception would
>>> actually get generated since the client is dead :) or a network partition
>>> occurs.
>>>
>>> Having said that we seem to have a bug that causes SessionExpiredExceptions
>>> when nothing bad has happened. The bug must be in the heartbeat code (we do
>>> them automatically, so the client shouldn't have to worry about it). If you
>>> can reproduce it well, it would greatly help to track down the bug! Can you
>>> send me the code to reproduce the problem?
>> It's the test case WriteLockTest in the patch for ZOOKEEPER-78 which is
>> currently dependent on the ZOOKEEPER-84 patch as well (though given
>> your recent comment I'm gonna refactor the code to not require a
>> ZooKeeper change :)
>>
>> I'll ping the list when I've refactored the test case to not require
>> the ZOOKEEPER-84 change.
> 
> I've just updated the patch on ZOOKEEPER-78 to avoid the dependency on
> ZOOKEEPER-84. It now uses a ZooKeeperFacade class which wraps up the
> creation of the ZooKeeper - and recreation of it if a
> SessionExpiredException is received.
> 
> The test case currently hangs there...
> 
>     [junit] "main" prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
>     [junit]     at java.lang.Object.wait(Native Method)
>     [junit]     - waiting on <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
>     [junit]     at java.lang.Object.wait(Object.java:474)
>     [junit]     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
>     [junit]     - locked <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
>     [junit]     at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
>     [junit]     - locked <0x0bd54108> (a org.apache.zookeeper.ZooKeeper)
>     [junit]     at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
>     [junit]     at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
>     [junit]     at junit.framework.TestCase.runBare(TestCase.java:140)
>     [junit]     at junit.framework.TestResult$1.protect(TestResult.java:110)
>     [junit]     at junit.framework.TestResult.runProtected(TestResult.java:128)
>     [junit]     at junit.framework.TestResult.run(TestResult.java:113)
>     [junit]     at junit.framework.TestCase.run(TestCase.java:124)
>     [junit]     at junit.framework.TestSuite.runTest(TestSuite.java:232)
>     [junit]     at junit.framework.TestSuite.run(TestSuite.java:227)
>     [junit]     at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
>     [junit]     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
> 
> 
> Basically, the 3rd ZooKeeper client cannot close down; it just hangs in
> the close() method.
> 
> (BTW it might be nice to avoid the close() method waiting forever - it
> might as well wait, say, 10 seconds then just close anyway).
> 
> Though now I've refactored the code to avoid the patch on ZooKeeper to
> deal with reconnecting when a SessionExpiredException occurs, I don't
> seem to get any session expired exceptions :). I'm starting to wonder
> if it's maybe related to old persistent data on disk causing the
> exception?
> 
> I still get the strange lack of Watch Events on the 3rd client though
> and the hang on closing (if
> WriteLockTest.workAroundClosingLastZNodeFails is set to false - I've
> hacked the test to pass by default).
> 

Re: when should a SessionExpiredException occur?

Posted by James Strachan <ja...@gmail.com>.
2008/7/23 James Strachan <ja...@gmail.com>:
> 2008/7/23 Benjamin Reed <br...@yahoo-inc.com>:
>> SessionExpiredExceptions should be extremely rare. Basically they should only
>> happen if a machine goes down (of course that would mean no exception would
>> actually get generated since the client is dead :) or a network partition
>> occurs.
>>
>> Having said that we seem to have a bug that causes SessionExpiredExceptions
>> when nothing bad has happened. The bug must be in the heartbeat code (we do
>> them automatically, so the client shouldn't have to worry about it). If you
>> can reproduce it well, it would greatly help to track down the bug! Can you
>> send me the code to reproduce the problem?
>
> It's the test case WriteLockTest in the patch for ZOOKEEPER-78 which is
> currently dependent on the ZOOKEEPER-84 patch as well (though given
> your recent comment I'm gonna refactor the code to not require a
> ZooKeeper change :)
>
> I'll ping the list when I've refactored the test case to not require
> the ZOOKEEPER-84 change.

I've just updated the patch on ZOOKEEPER-78 to avoid the dependency on
ZOOKEEPER-84. It now uses a ZooKeeperFacade class which wraps up the
creation of the ZooKeeper - and recreation of it if a
SessionExpiredException is received.
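
For readers who have not looked at the patch, the facade idea can be sketched roughly as follows. This is not the ZooKeeperFacade code from ZOOKEEPER-78; the Client interface, the SessionExpired exception and the Facade class are hypothetical stand-ins so that the example compiles without the ZooKeeper jar. It only illustrates "create lazily, drop the handle on session expiry, create a new one and retry once".

```java
import java.util.function.Supplier;

/** Illustrative only: every type in here is a hypothetical stand-in. */
final class SessionFacadeSketch {

    /** Stand-in for the session-expired exception raised by the client. */
    static final class SessionExpired extends RuntimeException { }

    /** Stand-in for the operations a recipe needs from the client. */
    interface Client {
        String read(String path);
    }

    /** Wraps creation of the client and recreates it after session expiry. */
    static final class Facade {
        private final Supplier<Client> factory;
        private Client current;
        private int sessionsCreated;

        Facade(Supplier<Client> factory) {
            this.factory = factory;
        }

        // Creates the client on first use, or again after it was dropped.
        private Client client() {
            if (current == null) {
                current = factory.get();
                sessionsCreated++;
            }
            return current;
        }

        synchronized String read(String path) {
            try {
                return client().read(path);
            } catch (SessionExpired e) {
                current = null;             // the expired session cannot be reused
                return client().read(path); // retry once on a fresh session
            }
        }

        synchronized int sessionsCreated() {
            return sessionsCreated;
        }
    }

    public static void main(String[] args) {
        int[] made = {0};
        Facade facade = new Facade(() -> {
            int n = ++made[0];
            // The first client always reports an expired session.
            return path -> {
                if (n == 1) {
                    throw new SessionExpired();
                }
                return "data@" + path;
            };
        });
        System.out.println(facade.read("/lock"));     // data@/lock
        System.out.println(facade.sessionsCreated()); // 2
    }
}
```

With a real ZooKeeper session, the ephemeral nodes and watches tied to the expired session are gone as well, so a recipe built on such a facade still has to re-create its own state after the retry.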

The test case currently hangs there...

    [junit] "main" prio=5 tid=0x01001710 nid=0xb0801000 in Object.wait() [0xb07ff000..0xb0800148]
    [junit]     at java.lang.Object.wait(Native Method)
    [junit]     - waiting on <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
    [junit]     at java.lang.Object.wait(Object.java:474)
    [junit]     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:822)
    [junit]     - locked <0x096105e0> (a org.apache.zookeeper.ClientCnxn$Packet)
    [junit]     at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:329)
    [junit]     - locked <0x0bd54108> (a org.apache.zookeeper.ZooKeeper)
    [junit]     at org.apache.zookeeper.protocols.ZooKeeperFacade.close(ZooKeeperFacade.java:99)
    [junit]     at org.apache.zookeeper.protocols.WriteLockTest.tearDown(WriteLockTest.java:146)
    [junit]     at junit.framework.TestCase.runBare(TestCase.java:140)
    [junit]     at junit.framework.TestResult$1.protect(TestResult.java:110)
    [junit]     at junit.framework.TestResult.runProtected(TestResult.java:128)
    [junit]     at junit.framework.TestResult.run(TestResult.java:113)
    [junit]     at junit.framework.TestCase.run(TestCase.java:124)
    [junit]     at junit.framework.TestSuite.runTest(TestSuite.java:232)
    [junit]     at junit.framework.TestSuite.run(TestSuite.java:227)
    [junit]     at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
    [junit]     at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:36)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
    [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)


Basically, the 3rd ZooKeeper client cannot close down; it just hangs in
the close() method.

(BTW it might be nice to avoid the close() method waiting forever - it
might as well wait, say, 10 seconds then just close anyway).
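
Until something like that exists in the client, a caller could bound the wait itself. The following is a minimal sketch under stated assumptions: BoundedClose and closeWithTimeout are hypothetical names, not part of the ZooKeeper API, and the close action is passed in as a plain Runnable so the example runs without the ZooKeeper jar. It runs the close on a daemon thread and gives up after the timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of the ZooKeeper API. */
final class BoundedClose {

    /**
     * Runs closeAction on a daemon thread and waits at most timeoutMillis.
     * Returns true if the action finished in time, false if the caller gave up.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<?> pending = executor.submit(closeAction);
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the worker and move on
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            pending.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("close failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a close() that never returns on its own.
        Runnable stuck = () -> {
            try {
                new CountDownLatch(1).await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println(closeWithTimeout(() -> { }, 1000)); // true
        System.out.println(closeWithTimeout(stuck, 200));      // false
    }
}
```

In a test this might be called as closeWithTimeout(facade::close, 10000), given some object with a no-argument close() that throws no checked exceptions. Giving up this way only unblocks the caller; it does not release whatever the stuck close() was holding, so it is a workaround for tearDown rather than a fix for the hang.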

Though now I've refactored the code to avoid the patch on ZooKeeper to
deal with reconnecting when a SessionExpiredException occurs, I don't
seem to get any session expired exceptions :). I'm starting to wonder
if it's maybe related to old persistent data on disk causing the
exception?

I still get the strange lack of Watch Events on the 3rd client though
and the hang on closing (if
WriteLockTest.workAroundClosingLastZNodeFails is set to false - I've
hacked the test to pass by default).

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

Re: when should a SessionExpiredException occur?

Posted by James Strachan <ja...@gmail.com>.
2008/7/23 Benjamin Reed <br...@yahoo-inc.com>:
> SessionExpiredExceptions should be extremely rare. Basically they should only
> happen if a machine goes down (of course that would mean no exception would
> actually get generated since the client is dead :) or a network partition
> occurs.
>
> Having said that we seem to have a bug that causes SessionExpiredExceptions
> when nothing bad has happened. The bug must be in the heartbeat code (we do
> them automatically, so the client shouldn't have to worry about it). If you
> can reproduce it well, it would greatly help to track down the bug! Can you
> send me the code to reproduce the problem?

It's the test case WriteLockTest in the patch for ZOOKEEPER-78 which is
currently dependent on the ZOOKEEPER-84 patch as well (though given
your recent comment I'm gonna refactor the code to not require a
ZooKeeper change :)

I'll ping the list when I've refactored the test case to not require
the ZOOKEEPER-84 change.

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

Re: when should a SessionExpiredException occur?

Posted by Benjamin Reed <br...@yahoo-inc.com>.
SessionExpiredExceptions should be extremely rare. Basically they should only 
happen if a machine goes down (of course that would mean no exception would 
actually get generated since the client is dead :) or a network partition 
occurs.

Having said that we seem to have a bug that causes SessionExpiredExceptions
when nothing bad has happened. The bug must be in the heartbeat code (we do
them automatically, so the client shouldn't have to worry about it). If you 
can reproduce it well, it would greatly help to track down the bug! Can you 
send me the code to reproduce the problem?
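
To make the intended behaviour concrete, here is a toy model of timeout-based expiry. It is only a sketch and not the actual ZooKeeper session tracking code: SessionExpirySketch, touch and isExpired are hypothetical names, and the 3000 ms timeout and 1000 ms ping interval are assumptions picked for the example. Time is passed in explicitly so the result is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of timeout-based session expiry; all names are hypothetical. */
final class SessionExpirySketch {
    private final long timeoutMillis;
    private final Map<Long, Long> lastHeard = new HashMap<>();

    SessionExpirySketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Any packet from the client, including a ping, refreshes its session. */
    void touch(long sessionId, long nowMillis) {
        lastHeard.put(sessionId, nowMillis);
    }

    /** A session counts as expired once it has been silent longer than the timeout. */
    boolean isExpired(long sessionId, long nowMillis) {
        Long last = lastHeard.get(sessionId);
        return last == null || nowMillis - last > timeoutMillis;
    }

    public static void main(String[] args) {
        SessionExpirySketch tracker = new SessionExpirySketch(3000);
        tracker.touch(1L, 0);
        tracker.touch(2L, 0);
        // Session 1 pings every 1000 ms; session 2 goes silent (dead or partitioned).
        for (long now = 1000; now <= 5000; now += 1000) {
            tracker.touch(1L, now);
        }
        System.out.println(tracker.isExpired(1L, 5000)); // false
        System.out.println(tracker.isExpired(2L, 5000)); // true
    }
}
```

In this model a client whose pings keep arriving never expires, which is why an expiry with all clients and the server healthy in one JVM points at a bug rather than at expected behaviour.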

thanx
ben

On Wednesday 23 July 2008 06:28:12 James Strachan wrote:
> Am just wondering if I've hit this due to some other bug. I thought ZK
> did keep-alive pings to ensure each client is alive and its session
> does not expire? Or does the client have to explicitly keep calling
> some method on the ZooKeeper interface to ensure a steady flow of
> packets to the ZK server to keep it alive?
>
> The test case WriteLockTest in the patch for ZOOKEEPER-78 (the
> WriteLock) can always reproduce a SessionExpiredException when using 3
> clients (it's always the 3rd session that expires).
>
> Now when a SessionExpiredException occurs, any recipe/protocol has to
> be able to deal with it; so the ZOOKEEPER-84 issue is still valid
> IMHO. But I'm wondering if in my test case it shouldn't be happening;
> as I've got 3 clients and a server all in the same JVM and the JVM
> isn't locked or pegged nor do the TCP sockets fail AFAIK.
>
> So I just thought I'd ask; are the keep alive packets used by default?
> If they are then maybe they are not sent very frequently or something?