Posted to issues@hbase.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2013/01/05 01:44:58 UTC

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544461#comment-13544461 ] 

Hudson commented on HBASE-7172:
-------------------------------

Integrated in HBase-0.94-security-on-Hadoop-23 #10 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/10/])
    HBASE-7172 TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky (Revision 1414975)

     Result = FAILURE
enis : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java

                
> TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7172
>                 URL: https://issues.apache.org/jira/browse/HBASE-7172
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: hbase-7172_v1.patch, hbase-7172_v2-0.94.patch, hbase-7172_v2.patch
>
>
> TestSplitLogManager.testVanishingTaskZNode fails when run individually (for example, when running just that test case from Eclipse). I've also noticed that it is flaky on Windows.
> The cause is a rare race condition, which for some reason occurs much less often when the whole class is run.
> The sequence of events is something like this:
>  - we create 1 log file to split
>  - we call splitLogDistributed() in its own thread. 
>  - splitLogDistributed() waits in waitForSplittingCompletion(); since there are no splitlogworkers, it keeps waiting.
>  - we delete the task znode from zk
>  - SplitLogManager receives the zk callback from GetDataAsyncCallback, which will call setDone() and mark the task as success. 
>  - However, meanwhile the waitForSplittingCompletion() loop sees that remainingInZK == 0 and returns, concurrently with the above.
>  - on return from waitForSplittingCompletion(), splitLogDistributed() fails because the znode delete callback has not completed yet. 
> This race only happens when the last task is deleted from zk, and normally only the SplitLogManager deletes the task znodes, after it has processed them, so I don't think this is a production issue.
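
The interleaving described in the quoted steps can be replayed with a small, self-contained Java model. This is only a sketch under stated assumptions: the class, the Batch counters, and the latches below are hypothetical stand-ins, not HBase or ZooKeeper APIs, and the "wait for the callback" variant is just one possible way to avoid the early return, not necessarily what the attached patches do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of the race; none of these names are HBase APIs. */
public class VanishingTaskRaceSketch {

    /** Hypothetical stand-in for the batch counters the waiting thread inspects. */
    static final class Batch {
        final int installed = 1;                        // one log file to split
        final AtomicInteger done = new AtomicInteger(); // bumped by the callback
    }

    /**
     * Replays the interleaving from the issue description.
     *
     * @param waitForCallback if true, the waiter blocks until the callback has run
     * @return true if done == installed at the moment the waiter returns
     */
    public static boolean batchLooksComplete(boolean waitForCallback) {
        Batch batch = new Batch();
        AtomicInteger remainingInZK = new AtomicInteger(1);
        CountDownLatch callbackMayRun = new CountDownLatch(1);
        CountDownLatch callbackFinished = new CountDownLatch(1);

        // Stand-in for the ZooKeeper event thread delivering the async callback.
        Thread zkEventThread = new Thread(() -> {
            try {
                callbackMayRun.await();
                batch.done.incrementAndGet(); // models setDone() marking success
                callbackFinished.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        zkEventThread.start();

        // The test deletes the task znode: the znode count drops immediately,
        // but the callback that updates the batch has not run yet.
        remainingInZK.set(0);

        try {
            boolean complete;
            if (waitForCallback) {
                // Variant: do not return until the callback has updated the batch.
                callbackMayRun.countDown();
                callbackFinished.await();
                complete = batch.done.get() == batch.installed;
            } else {
                // Racy path: the waiter returns as soon as remainingInZK == 0,
                // and the caller then checks the batch before the callback runs.
                boolean waiterReturns = remainingInZK.get() == 0;
                complete = waiterReturns && batch.done.get() == batch.installed;
                callbackMayRun.countDown(); // the callback only arrives afterwards
            }
            zkEventThread.join();
            return complete;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while replaying the race", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("return on remainingInZK == 0: complete=" + batchLooksComplete(false));
        System.out.println("wait for callback first:      complete=" + batchLooksComplete(true));
    }
}
```

The latches force the ordering that only happens occasionally in the real test, so the first call always reports an incomplete batch (the analogue of splitLogDistributed() failing) and the second always reports a complete one.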

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira