Posted to dev@zookeeper.apache.org by "Mihai Claudiu Toader (Updated) (JIRA)" <ji...@apache.org> on 2012/03/17 01:58:38 UTC

[jira] [Updated] (ZOOKEEPER-1424) ZooKeeper will not allow a client to delete a tree when it should allow it

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mihai Claudiu Toader updated ZOOKEEPER-1424:
--------------------------------------------

    Attachment: zookeeper.log

ZooKeeper server log, captured with DEBUG logging enabled while the issue occurs.

                
> ZooKeeper will not allow a client to delete a tree when it should allow it
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1424
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1424
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.2
>         Environment: Linux ubuntu 11.10, Zookeeper 3.4.2, One server, Two Java clients
>            Reporter: Mihai Claudiu Toader
>         Attachments: zookeeper.log
>
>
> Hi all,
> While using ZooKeeper at Midokura we hit an interesting bug. We hit it sporadically
> while developing some functional tests, so I built a test case for it.
> With that test case I think I have narrowed down the conditions under which it happens,
> and I want to share my findings since they are somewhat troubling.
> We need:
>   - one running ZooKeeper server (not tested with a cluster);
>       call this the server
>   - one running ZooKeeper client that creates an ephemeral node under the tree created by the next client;
>       call this the ephemeral client
>   - one running ZooKeeper client that creates a persistent tree and then tries to delete that tree;
>       call this the persistent client
> What needs to happen is this:
>  step 1. - the server starts
>  step 2. - the persistent client connects and creates a tree
>  step 3. - the ephemeral client connects and adds an ephemeral node under the tree created by the persistent client
>  step 4. - the persistent client tries to delete the tree recursively, with a multi op that does not include the ephemeral node (this attempt fails)
>  step 5. - the ephemeral client crashes hard (the equivalent of kill -9)
>  step 6. - the persistent client tries to delete the tree recursively again, and it fails with NotEmptyException even though listing the node shows no children
>     - the ZooKeeper server has to be restarted before the deletion succeeds.
> Step 4 is critical: without it (that is, with no earlier failed attempt to remove the tree), the next steps behave as expected and the final deletion succeeds.
> Also, no amount of tuning the ZooKeeper connection timeout between the server and the ephemeral client helps.
>  
> If the ephemeral client is shut down cleanly, everything seems to behave correctly (even with step 4).
> The test code is available here:
>    https://github.com/mtoadermido/play
> It needs ZooKeeper 3.4.2 installed on the system (it uses the jars installed by the .deb package to spawn the ZooKeeper server).
> The entry point is https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java
> There is a lot of boilerplate, since I didn't want it to depend on MidoNet code, but the interesting part is the BlockingBug.main() method.
> It launches a ZooKeeper server process and an external ephemeral client process, and then acts as the persistent client itself.
> Available tweaks:
> - the ZooKeeper client timeout of the ephemeral client:
>   https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L56
> - whether step 4 runs (set to true / false):
>   https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L69
> - how the ephemeral client is shut down (soft, i.e. a clean shutdown, or hard, i.e. kill -9):
>   https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L88
> The program displays whether the final recursive deletion succeeded:
> - if it failed: "We hit it !. The clear tree failed."
>    https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L103
> - if it succeeded: "No error :("
>    https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L99
> The conclusion is that the bug seems to be inside the ZooKeeper codebase, and that it can be triggered by this
> particular usage pattern combined with the misfortune of having to kill the ephemeral client process hard.
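
Steps 4 and 6 delete the tree recursively with a single multi op. To do this, the client first lists the tree and then deletes every child before its parent. The sketch below shows a minimal version of that ordering. It assumes the tree has already been read (for example with getChildren()) into a map from each path to the names of its direct children. RecursiveDeleteOrder and deletionOrder are hypothetical names, not taken from BlockingBug.java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class RecursiveDeleteOrder {

    /**
     * Returns root and everything below it in post-order, so each child
     * precedes its parent. `children` maps a full path to the names of its
     * direct children (an assumed snapshot of the tree, not live data).
     */
    static List<String> deletionOrder(String root, Map<String, List<String>> children) {
        List<String> out = new ArrayList<>();
        collect(root, children, out);
        return out;
    }

    private static void collect(String path, Map<String, List<String>> children, List<String> out) {
        for (String name : children.getOrDefault(path, Collections.emptyList())) {
            String childPath = path.equals("/") ? "/" + name : path + "/" + name;
            collect(childPath, children, out);
        }
        out.add(path); // a parent is added only after all of its descendants
    }

    public static void main(String[] args) {
        Map<String, List<String>> snapshot = Map.of(
                "/tree", List.of("a", "b"),
                "/tree/a", List.of("x"));
        // prints [/tree/a/x, /tree/a, /tree/b, /tree]
        System.out.println(deletionOrder("/tree", snapshot));
    }
}
```

With the real Java client, each returned path would typically become `Op.delete(path, -1)`, and the list would be passed to `ZooKeeper.multi(...)`. Because the list comes from a snapshot, a node missing from the snapshot (like the ephemeral node in step 4) is not deleted. The delete of its parent is then expected to fail with NotEmptyException, and the whole multi is rolled back.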

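The outcome the report expects can be made concrete with a toy in-memory model. ToyTree is hypothetical: it is not ZooKeeper's implementation, and it does not reproduce the bug. It encodes two assumptions the report relies on:
- a multi op is all-or-nothing;
- ending a session, by a clean close or by expiry after kill -9, removes the ephemeral nodes that session owns.

Under those assumptions, step 4 fails without changing anything and step 6 succeeds. That is the result the reporter expected, rather than the failure described in step 6.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the expected semantics; NOT ZooKeeper's implementation. */
final class ToyTree {
    // path -> id of the session owning it as an ephemeral node (0 = persistent).
    // Assumption for this toy: real session ids are non-zero.
    private final Map<String, Long> nodes = new HashMap<>();

    void create(String path, long ephemeralOwner) {
        nodes.put(path, ephemeralOwner);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    // True if any node in `view` lies below `path`.
    private static boolean hasChildren(String path, Map<String, Long> view) {
        String prefix = path + "/";
        for (String p : view.keySet()) {
            if (p.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** All-or-nothing delete: returns false and changes nothing if any op fails. */
    boolean multiDelete(List<String> paths) {
        Map<String, Long> scratch = new HashMap<>(nodes); // apply ops to a copy
        for (String p : paths) {
            if (!scratch.containsKey(p) || hasChildren(p, scratch)) {
                return false; // e.g. NONODE or NOTEMPTY: nothing is committed
            }
            scratch.remove(p);
        }
        nodes.clear();
        nodes.putAll(scratch); // commit only when every op succeeded
        return true;
    }

    /** Closing or expiring a session removes the ephemeral nodes it owns. */
    void endSession(long sessionId) {
        nodes.values().removeIf(owner -> owner == sessionId);
    }

    public static void main(String[] args) {
        ToyTree t = new ToyTree();
        t.create("/tree", 0);          // step 2: persistent client builds a tree
        t.create("/tree/a", 0);
        t.create("/tree/a/eph", 42);   // step 3: ephemeral node, hypothetical session 42
        List<String> ops = List.of("/tree/a", "/tree"); // step 4: ops omit /tree/a/eph
        System.out.println(t.multiDelete(ops)); // false: /tree/a is not empty
        t.endSession(42);              // step 5: after kill -9 the session eventually expires
        System.out.println(t.multiDelete(ops)); // true in this model
    }
}
```
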
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira