Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2012/11/16 01:59:12 UTC

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498498#comment-13498498 ] 

stack commented on HBASE-7172:
------------------------------

Looks fine to me.  If it passes hadoopqa, go ahead and commit.
                
> TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7172
>                 URL: https://issues.apache.org/jira/browse/HBASE-7172
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: hbase-7172_v1.patch
>
>
> TestSplitLogManager.testVanishingTaskZNode fails when run individually (run just that test case from Eclipse). I've also noticed that it is flaky on Windows.
> The reason is a rare race condition, which somehow does not happen that much when the whole class is run.
> The sequence of events is something like this:
>  - we create 1 log file to split
>  - we call splitLogDistributed() in its own thread. 
>  - splitLogDistributed() waits in waitForSplittingCompletion(); since there are no splitlogworkers, it keeps waiting.
>  - we delete the task znode from zk
>  - SplitLogManager receives the zk callback from GetDataAsyncCallback, which will call setDone() and mark the task as success. 
>  - Meanwhile, however, the waitForSplittingCompletion() loop sees that remainingInZK == 0 and returns concurrently with the above.
>  - on return from waitForSplittingCompletion(), splitLogDistributed() fails because the znode delete callback has not completed yet. 
> This race only happens when the last task is deleted from zk, and normally only the SplitLogManager deletes the task znodes after processing them, so I don't think this is a production issue.
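
The ordering problem described in the quoted steps can be replayed with a small single-threaded model. This is only a sketch under stated assumptions, not HBase's actual code: the class VanishingTaskRaceSketch, the Batch type, and the methods waiterMayReturn(), splitSucceeded() and replay() are hypothetical names invented for illustration. It assumes the waiter may leave its loop as soon as no task znodes remain, while the caller then requires every installed task to be marked done.

```java
// Simplified, hypothetical model of the race; this is NOT HBase's actual code.
class VanishingTaskRaceSketch {

    // Hypothetical stand-in for the per-batch counters the waiter inspects.
    static final class Batch {
        final int installed; // number of tasks submitted
        int done;            // number of tasks marked done by the callback
        Batch(int installed) { this.installed = installed; }
    }

    // Assumed exit condition of the wait loop: every task is accounted for,
    // or no task znodes remain in ZooKeeper.
    static boolean waiterMayReturn(Batch batch, int remainingInZk) {
        return batch.done == batch.installed || remainingInZk == 0;
    }

    // Assumed check made once the wait returns: all tasks must be done.
    static boolean splitSucceeded(Batch batch) {
        return batch.done == batch.installed;
    }

    // Replays the steps in a fixed order. callbackFirst chooses which side
    // wins the race: the znode callback or the waiting loop.
    static boolean replay(boolean callbackFirst) {
        Batch batch = new Batch(1); // one log file to split
        int remainingInZk = 1;      // its task znode exists
        remainingInZk = 0;          // the test deletes the task znode

        if (callbackFirst) {
            batch.done++;           // callback runs first and marks the task done
        }
        if (!waiterMayReturn(batch, remainingInZk)) {
            throw new IllegalStateException("waiter should be able to return");
        }
        boolean ok = splitSucceeded(batch); // check made right after the wait returns
        if (!callbackFirst) {
            batch.done++;           // callback arrives too late to affect the check
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("callback first: " + replay(true));  // prints true
        System.out.println("waiter first:   " + replay(false)); // prints false
    }
}
```

In this model the failure appears only when the waiter observes the empty znode count before the callback has updated the batch, which matches the description that the race needs the last task to vanish from zk underneath the manager.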

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira