Posted to dev@curator.apache.org by "Josh Pattiz (JIRA)" <ji...@apache.org> on 2018/07/14 06:33:00 UTC

[jira] [Commented] (CURATOR-318) Threads may return different boolean values when entering same double barrier

    [ https://issues.apache.org/jira/browse/CURATOR-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544084#comment-16544084 ] 

Josh Pattiz commented on CURATOR-318:
-------------------------------------

I added a test for the problem. I can open a PR with a simple fix that just deletes a client's entry in the barrier if it times out while trying to enter. That is not a perfect fix, and I think a perfect one would probably require a decent amount of rework of how the barrier functions, but the simple fix is probably reasonable in most cases. Right now the double barrier basically enters a broken state if any client times out trying to enter.
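To illustrate the simple fix, here is a minimal in-memory sketch. It is not Curator's code: the class `CleanupBarrierModel`, its methods `register` and `timeOut`, and the member count of 3 are hypothetical names and values chosen for illustration, and a plain set stands in for the barrier's child nodes. It replays the scenario from the issue description, with the timed-out clients deleting their own entries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified in-memory stand-in for the barrier's znodes; not Curator's implementation.
class CleanupBarrierModel {
    private final Set<String> entryNodes = new HashSet<>(); // one entry per client inside enter()
    private final int memberQty;

    CleanupBarrierModel(int memberQty) {
        this.memberQty = memberQty;
    }

    // Create our entry node; true means enough members are present to proceed.
    boolean register(String clientId) {
        entryNodes.add(clientId);
        return entryNodes.size() >= memberQty;
    }

    // The proposed simple fix: on timeout, delete our own entry before returning false.
    boolean timeOut(String clientId) {
        entryNodes.remove(clientId);
        return false;
    }

    int entryCount() {
        return entryNodes.size();
    }

    // Replays the reported scenario; returns the results for thread2, thread3, thread1.
    static boolean[] replayReport() {
        CleanupBarrierModel barrier = new CleanupBarrierModel(3); // assumed member count
        barrier.register("thread2");               // t=0s, 1 of 3 present
        barrier.register("thread3");               // t=0s, 2 of 3 present
        boolean r2 = barrier.timeOut("thread2");   // t=5s, entry removed
        boolean r3 = barrier.timeOut("thread3");   // t=5s, entry removed
        // t=20s: thread1 sees only its own entry, so it waits and then times out too.
        boolean r1 = barrier.register("thread1") || barrier.timeOut("thread1");
        return new boolean[] {r2, r3, r1};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(replayReport())); // prints [false, false, false]
    }
}
```

In this model all three callers get the same answer, which is the consistency the issue asks for. The sketch says nothing about the races a real ZooKeeper-backed barrier has to handle, which is presumably where the larger rework would come in.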

> Threads may return different boolean values when entering same double barrier
> -----------------------------------------------------------------------------
>
>                 Key: CURATOR-318
>                 URL: https://issues.apache.org/jira/browse/CURATOR-318
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.10.0
>            Reporter: Shiliang Cao
>            Priority: Major
>         Attachments: DoubleBarrierTimeoutTest.java, TestBarrierBug.java
>
>
> To my understanding, when all threads try to enter a barrier, they should all succeed or all fail, which means their return values should be the same.
> But they may actually get different return values in the following situation (steps to reproduce):
> 0. Do some preparation, such as running a ZK server and writing the basic Curator connection code;
> 1. Prepare 3 threads: thread1, thread2, and thread3;
> 2. Thread1 sleeps 20 seconds and then enters the barrier; thread2 and thread3 try to enter the barrier immediately, with the timeout set to 5 seconds;
> 3. Result: thread2 and thread3 return false due to the timeout, as expected, but thread1 (the sleeping one) returns true, which I think should be false too.
> Possible root cause, as I observed via zkCli:
> When the enter methods of thread2 and thread3 returned, their path nodes remained, so when thread1 arrived, it assumed the other threads were still waiting, created the ready node, and returned true.
> If this is not by design, it is a design defect.
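The root cause described above can be replayed in a small in-memory model. This is only a sketch under assumptions, not Curator's implementation: `LeakyBarrierModel`, `register`, and `timeOut` are hypothetical names, a set stands in for the barrier's child nodes, and the barrier is assumed to have been created for 3 members.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified in-memory stand-in for the barrier's znodes; not Curator's implementation.
class LeakyBarrierModel {
    private final Set<String> entryNodes = new HashSet<>(); // one entry per client that called enter()
    private final int memberQty;
    private boolean readyNodeExists = false;

    LeakyBarrierModel(int memberQty) {
        this.memberQty = memberQty;
    }

    // Create our entry node, then check whether enough entries exist.
    // If so, create the ready node and report success.
    boolean register(String clientId) {
        entryNodes.add(clientId);
        if (entryNodes.size() >= memberQty) {
            readyNodeExists = true;
            return true;
        }
        return false; // the caller would now wait for the ready node
    }

    // The wait expired: enter() returns false, but the client's entry is left behind.
    boolean timeOut(String clientId) {
        return false;
    }

    boolean isReady() {
        return readyNodeExists;
    }

    // Replays the reported scenario; returns what thread1's enter() yields.
    static boolean replayReport() {
        LeakyBarrierModel barrier = new LeakyBarrierModel(3); // assumed member count
        barrier.register("thread2");        // t=0s, 1 of 3 present
        barrier.register("thread3");        // t=0s, 2 of 3 present
        barrier.timeOut("thread2");         // t=5s, returns false, entry remains
        barrier.timeOut("thread3");         // t=5s, returns false, entry remains
        return barrier.register("thread1"); // t=20s, counts 3 entries
    }

    public static void main(String[] args) {
        System.out.println("thread1 entered: " + replayReport()); // prints thread1 entered: true
    }
}
```

The late arrival counts the two stale entries as live participants, which matches the mixed false/false/true result in the report.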


