Posted to dev@zookeeper.apache.org by "Mihai Claudiu Toader (Updated) (JIRA)" <ji...@apache.org> on 2012/03/17 01:58:38 UTC
[jira] [Updated] (ZOOKEEPER-1424) ZooKeeper will not allow a client
to delete a tree when it should allow it
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mihai Claudiu Toader updated ZOOKEEPER-1424:
--------------------------------------------
Attachment: zookeeper.log
ZooKeeper server log, with DEBUG enabled, captured when the issue appears.
> ZooKeeper will not allow a client to delete a tree when it should allow it
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1424
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1424
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.2
> Environment: Linux ubuntu 11.10, Zookeeper 3.4.2, One server, Two Java clients
> Reporter: Mihai Claudiu Toader
> Attachments: zookeeper.log
>
>
> Hi all,
> While using ZooKeeper at Midokura we hit an interesting bug. We hit it sporadically
> while developing some functional tests, so I had to build a test case for it.
> I finally created the test case and I think I have narrowed down the conditions under which it happens.
> I wanted to let you know my findings since they are somewhat troubling.
> We need:
> - one running ZooKeeper server (I have not tested this with a cluster)
> let's name this: the server
> - one running ZooKeeper client that will create an ephemeral node under the tree created by the next client
> let's name this: the ephemeral client
> - one running ZooKeeper client that will create a persistent tree and try to delete that tree
> let's name this: the persistent client
> What needs to happen is this:
> step 1. - the server starts
> step 2. - the persistent client connects and creates a tree
> step 3. - the ephemeral client connects and adds an ephemeral node under the tree created by the persistent client
> step 4. - the persistent client tries to delete the tree recursively (without including the ephemeral node in the multi op); this attempt fails
> step 5. - the ephemeral client crashes hard (the equivalent of kill -9)
> step 6. - the persistent client tries to delete the tree recursively again (and fails with NotEmpty even though listing the node shows no children)
> - the ZooKeeper server needs to be restarted before the deletion succeeds.
> Step 4 is critical: if we skip it (so there is no earlier failed attempt to remove the tree), the next steps behave as we would expect (i.e. they pass).
> Also, no amount of fiddling with the ZooKeeper connection timeouts (between the server and the ephemeral client) helps.
>
> If the ephemeral client is shut down cleanly, everything seems to behave properly (even with step 4).
> The test code is available here:
> https://github.com/mtoadermido/play
> It needs ZooKeeper 3.4.2 installed on the system (it uses the jars installed by the deb package to spawn the ZooKeeper server).
> The entry point is https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java
> There is a lot of boilerplate since I didn't want it to depend on anything from midonet, but the interesting part is the BlockingBug.main() method.
> It launches a ZooKeeper server process and an external ephemeral client process, and then acts as the second (persistent) client itself.
> Available tweaks:
> - the zookeeper client timeout for the ephemeral client here:
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L56
> - the step 4 here (set to true / false):
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L69
> - the shutdown of the ephemeral client (soft aka clean shutdown, hard aka kill -9):
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L88
> The message printed at the end depends on whether the final recursive deletion succeeded or not:
>
> We hit it !. The clear tree failed.
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L103
> "No error :("
> https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L99
> My conclusion is that the bug seems to be inside the ZooKeeper codebase, and that it is triggered by this
> particular usage pattern combined with the misfortune of having to kill the ephemeral process hard.
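
The reproduction steps in the description can be modeled without a server. What follows is only a minimal sketch of the behavior the reporter expects, not ZooKeeper code and not the reporter's BlockingBug test: the class ZnodeModel, its methods, and the paths /tree, /tree/a and /tree/a/eph are all hypothetical names made up for illustration. It assumes a multi-delete is atomic and that a node with children cannot be deleted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/**
 * Toy in-memory model of the EXPECTED delete semantics.
 * Hypothetical names throughout; this is not the ZooKeeper API.
 */
class ZnodeModel {
    // Absolute paths of all znodes that currently exist.
    private final TreeSet<String> nodes = new TreeSet<>();

    void create(String path) {
        nodes.add(path);
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }

    /**
     * Atomic multi-delete: each delete is checked against the state left by
     * the previous ones. If any delete targets a missing node or a node that
     * still has children, nothing is applied and false is returned.
     */
    boolean multiDelete(List<String> paths) {
        TreeSet<String> scratch = new TreeSet<>(nodes);
        for (String p : paths) {
            String prefix = p + "/";
            boolean notEmpty = scratch.stream().anyMatch(n -> n.startsWith(prefix));
            if (!scratch.contains(p) || notEmpty) {
                return false; // roll back: the real node set was never touched
            }
            scratch.remove(p);
        }
        nodes.clear();
        nodes.addAll(scratch);
        return true;
    }

    /** Models the server dropping the ephemeral node of a dead session. */
    void expireEphemeral(String path) {
        nodes.remove(path);
    }

    public static void main(String[] args) {
        ZnodeModel zk = new ZnodeModel();
        // step 2: the persistent client creates a tree
        zk.create("/tree");
        zk.create("/tree/a");
        // step 3: the ephemeral client adds a node under it
        zk.create("/tree/a/eph");
        // step 4: children-first delete that omits the ephemeral node
        List<String> ops = Arrays.asList("/tree/a", "/tree");
        System.out.println("step 4 succeeded: " + zk.multiDelete(ops));
        // step 5: the ephemeral client dies and its node goes away
        zk.expireEphemeral("/tree/a/eph");
        // step 6: the same delete is retried
        System.out.println("step 6 succeeded: " + zk.multiDelete(ops));
    }
}
```

In this model step 4 fails and step 6 succeeds. The report describes the real 3.4.2 server still failing at step 6 until it is restarted, which is the discrepancy being reported.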