Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2015/04/04 21:40:33 UTC

[jira] [Comment Edited] (HBASE-13391) TestRegionObserverInterface frequently failing on branch-1

    [ https://issues.apache.org/jira/browse/HBASE-13391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395898#comment-14395898 ] 

Andrew Purtell edited comment on HBASE-13391 at 4/4/15 7:39 PM:
----------------------------------------------------------------

I was finally able to reproduce a failure once by introducing some external IO and CPU activity. Attached are logs of TestRegionObserverInterface#testLegacyRecovery for a passing case and a failing case. 

One thing I see is that when we say "All regions assigned" on line 683 of TestRegionObserverInterface, there is still SplitWorker activity ongoing in the failing case. Could it be that replay ops haven't finished yet when we check for WAL-related CP method invocations? I thought I'd see if disabling distributed replay would change the behavior of the test:

{code}
diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
index 5bd8b19..ba028dc 100644
--- a/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
+++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
@@ -39,6 +39,7 @@ import org.apache.hadoop.hbase.CellUtil;
 import org.apache.hadoop.hbase.Coprocessor;
 import org.apache.hadoop.hbase.HBaseTestingUtility;
 import org.apache.hadoop.hbase.HColumnDescriptor;
+import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.hbase.HRegionInfo;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.KeyValue;
@@ -101,6 +102,7 @@ public class TestRegionObserverInterface {
     conf.setStrings(CoprocessorHost.REGION_COPROCESSOR_CONF_KEY,
         "org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver",
         "org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy");
+    conf.setBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, false);
 
     util.startMiniCluster();
     cluster = util.getMiniHBaseCluster();
{code}

but that causes a different sort of failure:

{noformat}
java.lang.AssertionError: Result of org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy.getCtPreWALRestore is expected to be 1, while we get 3
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.verifyMethodResult(TestRegionObserverInterface.java:753)
	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.testLegacyRecovery(TestRegionObserverInterface.java:687)
{noformat}
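
If the original "expected to be 1, while we get 0" failure really is a case of asserting before replay has finished, one general way to test that theory is to poll the counter until it reaches the expected value or a timeout elapses, and only then assert. The following is a minimal plain-JDK sketch of that idea, not code from the HBase test: the class name WaitForCounter, the helper waitForValue, and the AtomicInteger standing in for the coprocessor's call counter are all hypothetical. It would not explain the "we get 3" result above, which looks like over-counting rather than checking too early.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class WaitForCounter {

  /**
   * Polls the counter until it equals the expected value or the timeout
   * elapses. Returns the last observed value so the caller can assert on it
   * and report what was actually seen.
   */
  public static int waitForValue(IntSupplier counter, int expected,
      long timeoutMs, long intervalMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    int observed = counter.getAsInt();
    while (observed != expected && System.nanoTime() - deadline < 0) {
      Thread.sleep(intervalMs);
      observed = counter.getAsInt();
    }
    return observed;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical stand-in for a coprocessor hook invocation counter.
    AtomicInteger preWALRestoreCalls = new AtomicInteger();

    // Simulates replay work that completes some time after the caller
    // believes the cluster is ready.
    Thread replay = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      preWALRestoreCalls.incrementAndGet();
    });
    replay.start();

    // Wait up to 5 seconds, checking every 50 ms, instead of reading once.
    int observed = waitForValue(preWALRestoreCalls::get, 1, 5000, 50);
    replay.join();
    System.out.println("observed = " + observed);
  }
}
```

Returning the last observed value rather than a boolean keeps the eventual assertion message informative when the wait times out.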

Any thoughts on what might be going on here [~busbey]? 



> TestRegionObserverInterface frequently failing on branch-1 
> -----------------------------------------------------------
>
>                 Key: HBASE-13391
>                 URL: https://issues.apache.org/jira/browse/HBASE-13391
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: test.log.fail.txt, test.log.pass.txt
>
>
> TestRegionObserverInterface is frequently failing on branch-1.
> Example:
> {noformat}
> java.lang.AssertionError: Result of org.apache.hadoop.hbase.coprocessor.SimpleRegionObserver$Legacy.getCtPreWALRestore is expected to be 1, while we get 0
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.verifyMethodResult(TestRegionObserverInterface.java:751)
> 	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface.testLegacyRecovery(TestRegionObserverInterface.java:685)
> {noformat}


