Posted to dev@zookeeper.apache.org by "Patrick Hunt (Commented) (JIRA)" <ji...@apache.org> on 2012/03/16 22:31:35 UTC

[jira] [Commented] (ZOOKEEPER-1424) ZooKeeper will not allow a client to delete a tree when it should allow it

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231650#comment-13231650 ] 

Patrick Hunt commented on ZOOKEEPER-1424:
-----------------------------------------

Mihai, thanks for the report. Would it be possible for you to run this with DEBUG logging turned on for the server and attach the server logs from after the test run?
                
> ZooKeeper will not allow a client to delete a tree when it should allow it
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1424
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1424
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.2
>         Environment: Linux ubuntu 11.10, Zookeeper 3.4.2, One server, Two Java clients
>            Reporter: Mihai Claudiu Toader
>
> Hi all, 
> While using ZooKeeper at Midokura we hit an interesting bug in ZooKeeper. We hit it sporadically 
> while developing some functional tests, so I had to build a test case for it. 
> I finally created the test case and I think I have narrowed down the conditions under which it happens. 
> I wanted to let you know my findings, since they are somewhat troublesome. 
> We need:
>   - one running zookeeper server (didn't test that with a cluster)
>       let's name this: server
>   - one running zookeeper client that will create an ephemeral node under the tree created by the next client
>       let's name this: the ephemeral client
>   - one running zookeeper client that will create a persistent tree and try to delete that tree
>       let's name this: the persistent client
> What needs to happen is this:
>  step 1. - the server starts
>  step 2. - the persistent client connects and creates a tree
>  step 3. - the ephemeral client connects and adds an ephemeral node under the tree created by the persistent client
>  step 4. - the persistent client tries to delete the tree recursively (without including the ephemeral node in the multi op), and this attempt fails
>  step 5. - the ephemeral client crashes hard (the equivalent of kill -9)
>  step 6. - the persistent client tries to delete the tree recursively again (and fails with a NotEmpty error, even though listing the node shows no children)
>     - the zookeeper server needs to be restarted in order for this to work. 
> Step 4 is critical: without it (that is, with no earlier failed attempt to remove the tree), the next steps behave as we would expect (that is, they pass). 
> Also no amount of fiddling with zookeeper connection timeouts (between zookeeper and ephemeral node) will help. 
>  
> If the ephemeral client is shut down properly, everything seems to behave correctly (even with step 4). 
> The test code is available here:
>    https://github.com/mtoadermido/play
> It needs ZooKeeper 3.4.2 installed on the system (it uses the installed jars from the deb to spawn the ZooKeeper server).
> The entry point is https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java
> There is a lot of boilerplate, since I didn't want it to depend on code from midonet, but the interesting part is the BlockingBug.main() method. 
> It will launch a zookeeper process, an external ephemeral client process, and after that act as the second client. 
> Available tweaks:
> - the zookeeper client timeout for the ephemeral client here: 
>   https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L56
> - the step 4 here (set to true / false):
>  https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L69
> - the shutdown of the ephemeral client (soft aka clean shutdown, hard aka kill -9):
>  https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L88
> The result displayed depends on whether the final recursive deletion succeeded:
>    
> We hit it !. The clear tree failed.
>    https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L103
> "No error :("  
>    https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L99
> The conclusion is that the bug seems to be inside the zookeeper codebase and it's prone to being triggered by this 
> particular usage of zookeeper combined with the misfortune of having to kill the ephemeral process hard. 
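
For readers following the quoted steps, here is a minimal sketch of the behaviour the reporter expected from steps 2 to 6. It is an in-memory model only and does not use the ZooKeeper client API or the linked test code; the class TreeDeleteSketch, its methods, the paths, and the session id 42 are all hypothetical names chosen for illustration. With the real client, each path in the computed order would presumably become an Op.delete(path, -1) passed to ZooKeeper.multi(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a znode tree. Hypothetical model, NOT the ZooKeeper API. */
public class TreeDeleteSketch {
    // path -> owning session id (0 means persistent), kept sorted by path
    private TreeMap<String, Long> nodes = new TreeMap<>();

    public void create(String path, long ephemeralOwner) {
        nodes.put(path, ephemeralOwner);
    }

    public int size() {
        return nodes.size();
    }

    /** Direct children of path in the given tree. */
    private static List<String> children(Map<String, Long> tree, String path) {
        List<String> out = new ArrayList<>();
        String prefix = path + "/";
        for (String p : tree.keySet()) {
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                out.add(p);
            }
        }
        return out;
    }

    /** Persistent paths under root, children before parents; ephemeral nodes are left out. */
    public List<String> deleteOrder(String root) {
        List<String> out = new ArrayList<>();
        for (String child : children(nodes, root)) {
            if (nodes.get(child) == 0L) {
                out.addAll(deleteOrder(child));
            }
        }
        out.add(root);
        return out;
    }

    /** All-or-nothing delete: fails, changing nothing, if any listed node is missing or still has a child. */
    public boolean multiDelete(List<String> paths) {
        TreeMap<String, Long> copy = new TreeMap<>(nodes);
        for (String p : paths) {
            if (!copy.containsKey(p) || !children(copy, p).isEmpty()) {
                return false; // models a NotEmpty (or NoNode) failure of the whole batch
            }
            copy.remove(p);
        }
        nodes = copy;
        return true;
    }

    /** Models session expiry after a hard kill: every node owned by the session disappears. */
    public void expireSession(long sessionId) {
        nodes.values().removeIf(owner -> owner == sessionId);
    }

    public static void main(String[] args) {
        TreeDeleteSketch zk = new TreeDeleteSketch();
        zk.create("/app", 0L);                      // step 2: persistent tree
        zk.create("/app/a", 0L);
        zk.create("/app/a/eph", 42L);               // step 3: ephemeral node, session 42
        List<String> ops = zk.deleteOrder("/app");
        System.out.println(ops);                    // [/app/a, /app]
        System.out.println(zk.multiDelete(ops));    // step 4: false, ephemeral child still present
        zk.expireSession(42L);                      // step 5: ephemeral client killed
        System.out.println(zk.multiDelete(zk.deleteOrder("/app"))); // step 6: true in this model
    }
}
```

In this model the second delete succeeds once the session is gone. The report says the real 3.4.2 server instead keeps returning the NotEmpty error at step 6 until it is restarted.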
