Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2016/02/01 18:25:39 UTC

[jira] [Commented] (HBASE-15192) TestRegionMergeTransactionOnCluster#testCleanMergeReference is flaky

    [ https://issues.apache.org/jira/browse/HBASE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126599#comment-15126599 ] 

Ted Yu commented on HBASE-15192:
--------------------------------

Since the test fails if merge references are not cleaned, we can call admin.runCatalogScan() more than once if needed.

runCatalogScan() is the only CatalogJanitor operation exposed through admin; otherwise we could poll CatalogJanitor for the value of mergeCleaned and pass the test once mergeCleaned reaches 1.
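
The retry idea can be sketched roughly as follows. This is only an illustration and not the attached patch: scanUntilCleaned, maxAttempts and sleepMillis are hypothetical names, and the IntSupplier stands in for a call such as admin.runCatalogScan(), assuming that call reports how many entries it cleaned.

```java
import java.util.function.IntSupplier;

public class CatalogScanRetry {
    /**
     * Runs the scan until it reports at least one cleaned entry or
     * maxAttempts scans have been made. Returns the count from the last scan.
     * (Hypothetical helper; the IntSupplier is a stand-in for the real scan call.)
     */
    public static int scanUntilCleaned(IntSupplier scan, int maxAttempts, long sleepMillis) {
        int cleaned = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            cleaned = scan.getAsInt();
            if (cleaned > 0) {
                break; // something was cleaned, no need to scan again
            }
            try {
                Thread.sleep(sleepMillis); // give the merged region time to drop its references
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Fake scan: cleans nothing on the first two calls, one entry on the third.
        int[] calls = {0};
        int cleaned = scanUntilCleaned(() -> ++calls[0] < 3 ? 0 : 1, 10, 1);
        System.out.println("cleaned=" + cleaned + " after " + calls[0] + " scans");
    }
}
```

A bounded number of attempts keeps the test from hanging if the references are never cleaned; in that case the final assertion still fails as before.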

Patch v2 passes 30 iterations of test runs. Previously the test failed within the first 5 iterations.

> TestRegionMergeTransactionOnCluster#testCleanMergeReference is flaky
> --------------------------------------------------------------------
>
>                 Key: HBASE-15192
>                 URL: https://issues.apache.org/jira/browse/HBASE-15192
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: HBASE-15192.v1.patch
>
>
> TestRegionMergeTransactionOnCluster#testCleanMergeReference fails intermittently due to a failed assertion on the cleaned merge region count:
> {code}
> testCleanMergeReference(org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster)  Time elapsed: 64.183 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster.testCleanMergeReference(TestRegionMergeTransactionOnCluster.java:284)
> {code}
> Before calling CatalogJanitor#scan(), the test does:
> {code}
>       int newcount1 = 0;
>       while (System.currentTimeMillis() < timeout) {
>         for(HColumnDescriptor colFamily : columnFamilies) {
>           newcount1 += hrfs.getStoreFiles(colFamily.getName()).size();
>         }
>         if(newcount1 <= 1) {
>           break;
>         }
>         Thread.sleep(50);
>       }
> {code}
> newcount1 is not reset at the beginning of each loop iteration, so it keeps accumulating.
> This means that if the check newcount1 <= 1 doesn't pass on the first iteration, it will not pass on subsequent iterations either.
> After the timeout is exhausted, admin.runCatalogScan() is called. However, there is a chance that CatalogJanitor#scan() has already been called by the Chore (during the wait period), leaving the cleaned count at 0 and failing the test.
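
The counter problem described above can be avoided by resetting the count on every pass. A minimal sketch, not the actual patch: countUntilCompacted is a hypothetical helper, and each IntSupplier stands in for hrfs.getStoreFiles(colFamily.getName()).size() for one column family.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.IntSupplier;

public class StoreFileWait {
    /**
     * Polls the per-family store file counts until their sum is at most 1
     * or the timeout elapses. Returns the last sum observed.
     * (Hypothetical helper; the suppliers are stand-ins for the real lookups.)
     */
    public static int countUntilCompacted(List<IntSupplier> families,
                                          long timeoutMillis, long sleepMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        int count = 0;
        while (System.currentTimeMillis() < deadline) {
            count = 0; // reset on every pass so earlier samples do not accumulate
            for (IntSupplier family : families) {
                count += family.getAsInt();
            }
            if (count <= 1) {
                break;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Fake family whose store file count shrinks 3, 2, 1 as compaction proceeds.
        int[] calls = {0};
        IntSupplier family = () -> Math.max(1, 3 - calls[0]++);
        int count = countUntilCompacted(Collections.singletonList(family), 5000, 1);
        System.out.println("store files: " + count);
    }
}
```

Without the reset, the same sequence would accumulate to 3, 5, 6 and the loop would only exit on timeout.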



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)