
Posted to issues@hbase.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2012/11/16 01:39:11 UTC

[jira] [Created] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Enis Soztutar created HBASE-7172:
------------------------------------

             Summary: TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
                 Key: HBASE-7172
                 URL: https://issues.apache.org/jira/browse/HBASE-7172
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.96.0, 0.94.4
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar


TestSplitLogManager.testVanishingTaskZNode fails when run individually (i.e. running just that test case from Eclipse). I've also noticed that it is flaky on Windows.

The cause is a rare race condition, which for some reason seldom shows up when the whole class is run.

The sequence of events is roughly:
 - we create 1 log file to split
 - we call splitLogDistributed() in its own thread
 - splitLogDistributed() blocks in waitForSplittingCompletion(); since there are no SplitLogWorkers, it keeps waiting
 - we delete the task znode from ZK
 - SplitLogManager receives the ZK callback in GetDataAsyncCallback, which will call setDone() and mark the task as successful
 - meanwhile, however, the waitForSplittingCompletion() loop sees that remainingInZK == 0 and returns, concurrently with the callback above
 - on return from waitForSplittingCompletion(), splitLogDistributed() fails because the znode delete callback has not completed yet

This race only happens when the last task is deleted from ZK, and normally only the SplitLogManager itself deletes task znodes (after processing them), so I don't think this is a production issue.
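The interleaving above can be replayed deterministically with a toy model. This is a hypothetical sketch, not the SplitLogManager source: `TaskBatch`, `SplitRaceModel` and its methods are made-up names, and the pre-fix exit check is simplified to `remainingInZK == 0`. Deleting the znode and running the pending callback are separate steps here, so the window in which the wait loop can bail out is explicit.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the batch of split tasks created by one call. */
class TaskBatch {
    int installed; // tasks submitted
    int done;      // tasks marked successful via setDone()
    int error;     // tasks marked failed
}

/** Deterministic replay of the HBASE-7172 interleaving (simplified model). */
class SplitRaceModel {
    final TaskBatch batch = new TaskBatch();
    final Set<String> znodes = new HashSet<>(); // task znodes still present "in ZK"

    void installTask(String path) {
        znodes.add(path);
        batch.installed++;
    }

    /** Test step: the znode vanishes, but its callback has NOT run yet. */
    void deleteZnode(String path) {
        znodes.remove(path);
    }

    /** What the async callback eventually does: mark the task as done. */
    void runPendingCallback() {
        batch.done++;
    }

    /** One iteration of the (simplified) pre-fix wait loop: true means "stop waiting". */
    boolean oldLoopWouldReturn() {
        int remainingInZK = znodes.size();
        return remainingInZK == 0; // only looks at ZK, ignores the in-memory batch
    }

    /** Check the caller makes once the wait loop has returned. */
    boolean splitSucceeded() {
        return batch.done + batch.error == batch.installed && batch.error == 0;
    }
}
```

Installing one task, deleting its znode and then asking `oldLoopWouldReturn()` gives `true` while `splitSucceeded()` is still `false`; only after `runPendingCallback()` does the batch look complete, by which point the caller has already failed.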

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-7172:
---------------------------------

    Attachment: hbase-7172_v1.patch

Attaching a simple patch.
We should only return from the wait loop if there are no remaining tasks and no znodes. If remainingInZK == 0 and actual > 0, then that task will eventually be resubmitted. I think this can only happen if we somehow fail to set up the ZK watchers.
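A minimal sketch of the exit condition described above, assuming a simplified model rather than the real SplitLogManager: `SplitBatch`, `activeTasks` (standing in for `actual`) and `znodesInZK` (standing in for `remainingInZK`) are hypothetical names. The waiter leaves the loop only when both counts are zero, so a vanished znode whose callback is still pending keeps it waiting. The deadline is only a stand-in for the resubmission path mentioned above, which this model does not implement.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical batch shared by the waiting thread and the ZK callback thread. */
class SplitBatch {
    int installed;   // tasks submitted for this batch
    int done;        // tasks marked successful
    int error;       // tasks marked failed
    int activeTasks; // tasks still tracked in memory (the "actual" count)
    int znodesInZK;  // task znodes still present (the "remainingInZK" count)

    SplitBatch(int tasks) {
        installed = tasks;
        activeTasks = tasks;
        znodesInZK = tasks;
    }

    /** Mirrors the effect of setDone(); runs on the callback thread. */
    synchronized void setDone(boolean success) {
        if (success) {
            done++;
        } else {
            error++;
        }
        activeTasks--;
        notifyAll(); // wake the waiter so it re-checks its exit condition
    }

    /** A task znode disappears from ZK (e.g. the test deletes it). */
    synchronized void znodeDeleted() {
        znodesInZK--;
        notifyAll();
    }

    synchronized boolean succeeded() {
        return done + error == installed && error == 0;
    }

    /**
     * Fixed exit condition: an empty ZK alone is not enough, the in-memory
     * task must also be settled. Returns false on timeout or interrupt.
     */
    synchronized boolean waitForCompletion(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (znodesInZK > 0 || activeTasks > 0) {
            long leftMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (leftMs <= 0) {
                return false;
            }
            try {
                wait(Math.min(leftMs, 100)); // bounded wait; loop re-checks after each wakeup
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Because `setDone()` and `znodeDeleted()` both call `notifyAll()` under the same monitor, the waiter cannot observe "znode gone" without also re-checking `activeTasks`, which is the window the pre-fix check ignored.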
                

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498575#comment-13498575 ] 

Enis Soztutar commented on HBASE-7172:
--------------------------------------

Oops, I've found some more flaky tests:
{code}
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:92)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertTrue(Assert.java:54)
	at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:148)
	at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:128)
	at org.apache.hadoop.hbase.master.TestSplitLogManager.testOrphanTaskAcquisition(TestSplitLogManager.java:214)
{code}

{code}
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:92)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertTrue(Assert.java:54)
	at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:148)
	at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:128)
	at org.apache.hadoop.hbase.master.TestSplitLogManager.testMultipleResubmits(TestSplitLogManager.java:278)
{code}

I think we can just increase the timeouts a la HBASE-7165. I'll do a v2 patch. 
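Raising a timeout is cheap when the wait is a poll that returns as soon as the condition holds. The sketch below is a hypothetical stand-in, not the actual TestSplitLogManager.waitForCounter (its signature is not visible in the stack traces): `CounterWait` and its parameters are made-up names. It polls an `AtomicLong` every 10 ms until it reaches an expected value or the deadline passes, so a larger `timeoutMs` adds no delay on a fast machine and only gives a slow or loaded host more headroom.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical polling helper, loosely modeled on the test's waitForCounter(). */
final class CounterWait {
    private CounterWait() {}

    /** Polls ctr every 10 ms until it equals expected; false if timeoutMs elapses first. */
    static boolean waitForCounter(AtomicLong ctr, long expected, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (ctr.get() == expected) {
                return true; // return as soon as the value shows up
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return ctr.get() == expected; // final check right at the deadline
    }
}
```

A longer deadline only widens the window a slow host gets; it does not fix a genuine ordering bug such as the one in testVanishingTaskZNode, which needed the wait-loop change instead.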
                

[jira] [Updated] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-7172:
---------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506089#comment-13506089 ] 

Hudson commented on HBASE-7172:
-------------------------------

Integrated in HBase-0.94 #604 (See [https://builds.apache.org/job/HBase-0.94/604/])
    HBASE-7172 TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky (Revision 1414975)

     Result = FAILURE
enis : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java

                
> TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7172
>                 URL: https://issues.apache.org/jira/browse/HBASE-7172
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: hbase-7172_v1.patch, hbase-7172_v2-0.94.patch, hbase-7172_v2.patch

[jira] [Updated] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-7172:
---------------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506081#comment-13506081 ] 

Hudson commented on HBASE-7172:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #278 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/278/])
    HBASE-7172 TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky (Revision 1414973)

     Result = FAILURE
enis : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java

                

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498498#comment-13498498 ] 

stack commented on HBASE-7172:
------------------------------

Looks fine to me. If it passes Hadoop QA, go ahead and commit.
                

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498553#comment-13498553 ] 

Lars Hofhansl commented on HBASE-7172:
--------------------------------------

+1
                

[jira] [Updated] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-7172:
---------------------------------

    Attachment: hbase-7172_v2.patch
                hbase-7172_v2-0.94.patch

Attaching v2 patches, which hopefully fix the remaining flaky tests.
                

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506086#comment-13506086 ] 

Hudson commented on HBASE-7172:
-------------------------------

Integrated in HBase-TRUNK #3576 (See [https://builds.apache.org/job/HBase-TRUNK/3576/])
    HBASE-7172 TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky (Revision 1414973)

     Result = FAILURE
enis : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java

                

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498531#comment-13498531 ] 

Hadoop QA commented on HBASE-7172:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12553720/hbase-7172_v1.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 93 warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 16 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3349//console

This message is automatically generated.
                
> TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7172
>                 URL: https://issues.apache.org/jira/browse/HBASE-7172
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: hbase-7172_v1.patch
>
>
> TestSplitLogManager.testVanishingTaskZNode fails when run individually (run just that test case from Eclipse). I've also noticed that it is flaky on Windows.
> The reason is a rare race condition, which for some reason seldom manifests when the whole class is run.
> The sequence of events is something like this:
>  - we create 1 log file to split
>  - we call splitLogDistributed() in its own thread.
>  - splitLogDistributed() blocks in waitForSplittingCompletion(); since there are no SplitLogWorkers, it keeps waiting.
>  - we delete the task znode from ZK
>  - SplitLogManager receives the ZK callback from GetDataAsyncCallback, which will call setDone() and mark the task as successful.
>  - Meanwhile, the waitForSplittingCompletion() loop sees that remainingInZK == 0 and returns, concurrently with the callback above.
>  - On return from waitForSplittingCompletion(), splitLogDistributed() fails because the znode delete callback has not completed yet.
> This race only happens when the last task is deleted from ZK. Normally only the SplitLogManager deletes task znodes, after processing them, so I don't think this is a production issue.
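A minimal Java sketch of the race described above, under a simplified model: the class and method names (`Batch`, `znodeGone()`, `markSuccess()`, `waitRacy()`, `waitForAllOutcomes()`) are hypothetical and are not SplitLogManager's real code or the committed patch. The callback is split into two halves (the znode disappears, then the outcome is recorded, like setDone()), with a latch holding the window open between them. A waiter that stops once the ZK count reaches zero returns before the outcome exists; a waiter that waits for every task's recorded outcome does not.

```java
import java.util.concurrent.CountDownLatch;

/** Illustrative model of the HBASE-7172 race; all names are hypothetical. */
public class VanishingTaskRaceSketch {

    /** Hypothetical per-batch bookkeeping, guarded by the Batch monitor. */
    static final class Batch {
        final int installed;   // tasks created for this batch
        int remainingInZK;     // task znodes still present in ZK
        int done;              // tasks whose success has been recorded
        int error;             // tasks whose failure has been recorded

        Batch(int installed) {
            this.installed = installed;
            this.remainingInZK = installed;
        }

        /** First half of the callback: the task znode is gone. */
        synchronized void znodeGone() {
            remainingInZK--;
            notifyAll();
        }

        /** Second half of the callback: the outcome is recorded (like setDone()). */
        synchronized void markSuccess() {
            done++;
            notifyAll();
        }

        /** Racy: stops once ZK looks empty, even if no outcome is recorded yet. */
        synchronized boolean waitRacy(long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (remainingInZK > 0) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) break;
                wait(left);
            }
            return done + error == installed;
        }

        /** Fixed: stops only once every task has a recorded outcome. */
        synchronized boolean waitForAllOutcomes(long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (done + error != installed) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) break;
                wait(left);
            }
            return done + error == installed;
        }
    }

    /** One task; the callback pauses between its halves until 'resume' is released. */
    public static boolean run(boolean racy) throws InterruptedException {
        Batch batch = new Batch(1);
        CountDownLatch resume = new CountDownLatch(1);
        Thread callback = new Thread(() -> {
            batch.znodeGone();
            try {
                resume.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            batch.markSuccess();
        });
        callback.start();

        boolean sawAllOutcomes;
        if (racy) {
            // Returns as soon as remainingInZK hits 0; markSuccess() cannot have run yet.
            sawAllOutcomes = batch.waitRacy(5_000);
            resume.countDown();
        } else {
            // Release the callback after ~100 ms; the fixed wait returns true either way.
            Thread releaser = new Thread(() -> {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                resume.countDown();
            });
            releaser.start();
            sawAllOutcomes = batch.waitForAllOutcomes(5_000);
            releaser.join();
        }
        callback.join();
        return sawAllOutcomes;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("racy wait saw all outcomes: " + run(true));   // false
        System.out.println("fixed wait saw all outcomes: " + run(false)); // true
    }
}
```

The design point is that the loop condition should test the state the caller actually relies on (recorded outcomes), not a proxy (the ZK task count) that is updated earlier in the same callback.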

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-7172:
---------------------------------

    Status: Open  (was: Patch Available)
    

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505931#comment-13505931 ] 

Lars Hofhansl commented on HBASE-7172:
--------------------------------------

+1 on patch
                
> TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7172
>                 URL: https://issues.apache.org/jira/browse/HBASE-7172
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: hbase-7172_v1.patch, hbase-7172_v2-0.94.patch, hbase-7172_v2.patch
>

[jira] [Updated] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-7172:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.94.4
                   0.96.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

I've committed this to trunk and 0.94. Thanks, Lars and Stack, for the reviews.
                
> TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-7172
>                 URL: https://issues.apache.org/jira/browse/HBASE-7172
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0, 0.94.4
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.96.0, 0.94.4
>
>         Attachments: hbase-7172_v1.patch, hbase-7172_v2-0.94.patch, hbase-7172_v2.patch
>

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498594#comment-13498594 ] 

Lars Hofhansl commented on HBASE-7172:
--------------------------------------

I probably should have done that in HBASE-7165 (on the other hand, I have not seen this test fail in recent 0.94 builds). +1 on increasing the timeouts there as well.
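The suggestion to increase test timeouts can be illustrated with a deadline-based polling helper. This is a sketch only: `WaitUtilSketch.waitFor` is a hypothetical name, not an HBase test utility. Because it returns as soon as the condition holds, a generous timeout adds no time to a passing run and only adds headroom on slow hosts.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper; not part of HBase. */
public final class WaitUtilSketch {
    private WaitUtilSketch() {}

    /**
     * Polls 'condition' every intervalMs until it is true or timeoutMs elapses.
     * Returns true if the condition became true before the deadline.
     */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) return true;
            long leftNanos = deadline - System.nanoTime();
            if (leftNanos <= 0) return false;
            // Never sleep past the deadline by more than a millisecond.
            Thread.sleep(Math.min(intervalMs, TimeUnit.NANOSECONDS.toMillis(leftNanos) + 1));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        Thread bumper = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            counter.set(1);
        });
        bumper.start();
        System.out.println(waitFor(5_000, 10, () -> counter.get() == 1)); // true, after ~200 ms
        System.out.println(waitFor(50, 10, () -> counter.get() == 2));    // false, after ~50 ms
        bumper.join();
    }
}
```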
                

[jira] [Commented] (HBASE-7172) TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505990#comment-13505990 ] 

stack commented on HBASE-7172:
------------------------------

+1
                