Posted to issues@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2010/11/10 03:06:23 UTC

[jira] Created: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing
---------------------------------------------------------------------

                 Key: HBASE-3214
                 URL: https://issues.apache.org/jira/browse/HBASE-3214
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 0.90.0
            Reporter: Jonathan Gray
             Fix For: 0.90.0


Failing on hudson and locally

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930441#action_12930441 ] 

Jonathan Gray commented on HBASE-3214:
--------------------------------------

This is what I see locally as the failure:

{noformat}
2010-11-09 18:07:17,291 FATAL [Master:0;172.24.154.154:56721] master.HMaster(888): Unhandled exception. Starting shutdown.
java.lang.RuntimeException: Failed exists test on hdfs://localhost:56643/user/jgray/.logs
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:162)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:374)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:272)
	at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:232)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:158)
	... 3 more
{noformat}

Somehow DFS is being closed.  No idea why this is happening now but wasn't before.

Shortly before this exception, I see this log line:
{noformat}
2010-11-09 18:07:12,413 INFO  [Shutdown of DFS[DFSClient[clientName=DFSClient_hb_rs_172.24.154.154,56663,1289354815892_1289354817252, ugi=jgray.hfs.1,supergroup]]] hbase.MiniHBaseCluster$SingleFileSystemShutdownThread(248): Hook closing fs=DFS[DFSClient[clientName=DFSClient_hb_rs_172.24.154.154,56663,1289354815892_1289354817252, ugi=jgray.hfs.1,supergroup]]
{noformat}

Looks like an RS exiting is now triggering a complete shutdown of DFS.

If I comment out the line below (MiniHBaseCluster, line 189), the test passes.
{noformat}
      this.shutdownThread = new SingleFileSystemShutdownThread(getFileSystem());
{noformat}

What has changed?  In unit tests, an RS being shut down should not take the entire FS with it, should it?
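
For what it's worth, the failure mode in the traces above can be reproduced in miniature without Hadoop. The sketch below is a toy model, not MiniHBaseCluster or DFSClient code: `ToyFs`, `SharedFsShutdownDemo`, and `masterCheckAfterRsShutdown` are hypothetical names, and the only things borrowed from the real run are the "Filesystem closed" message and the `.logs` path from the stack trace. It assumes the RS shutdown thread and the master end up holding the same handle, which is what the log lines suggest.

```java
import java.io.IOException;

// Toy stand-in for a DFS client handle; this is NOT the Hadoop API.
final class ToyFs {
  private volatile boolean open = true;

  // Same shape as the checkOpen() guard visible in the stack trace.
  boolean exists(String path) throws IOException {
    if (!open) {
      throw new IOException("Filesystem closed");
    }
    return path.startsWith("/");
  }

  void close() {
    open = false;
  }
}

final class SharedFsShutdownDemo {
  // Returns the message of the exception the "master" hits, or null if none.
  static String masterCheckAfterRsShutdown() {
    ToyFs shared = new ToyFs(); // one handle, two holders (assumption)

    // Hypothetical per-RS shutdown thread that closes "its" filesystem.
    Thread rsShutdown = new Thread(shared::close);
    rsShutdown.start();
    try {
      rsShutdown.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }

    try {
      // The master's later existence check on the log directory.
      shared.exists("/user/jgray/.logs");
      return null;
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(masterCheckAfterRsShutdown());
  }
}
```

If the two holders had separate handles, the close in the shutdown thread would not be visible to the master's check.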



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930445#action_12930445 ] 

Jonathan Gray commented on HBASE-3214:
--------------------------------------

Looks like this was broken by the commit of HBASE-3194 (hbase should run on secure + vanilla hadoop)



[jira] Updated: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-3214:
---------------------------------

    Attachment:     (was: HBASE-3214.patch)



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930796#action_12930796 ] 

Gary Helmling commented on HBASE-3214:
--------------------------------------

Looks like TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is still failing up on hudson with this patch.  But now at least it's an assertion failure instead of a timeout:

{noformat}
java.lang.AssertionError: 
	at org.junit.Assert.fail(Assert.java:91)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertFalse(Assert.java:68)
	at org.junit.Assert.assertFalse(Assert.java:79)
	at org.apache.hadoop.hbase.master.TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS(TestMasterFailover.java:855)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
{noformat}

This is down in the check that regions are offline:
{code}
    // Everything that should be offline should not be online
    for (HRegionInfo hri : regionsThatShouldBeOffline) {
      assertFalse(onlineRegions.contains(hri));
    }
{code}

The strange thing is that not only does this pass for me locally, it also passes in our internal hudson build of trunk.



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930455#action_12930455 ] 

Gary Helmling commented on HBASE-3214:
--------------------------------------

Looking into it.  Possibly something not accounted for in the changes to MiniHBaseCluster.



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930678#action_12930678 ] 

Jonathan Gray commented on HBASE-3214:
--------------------------------------

Thanks for the patch, Gary.  I just committed to trunk.  Let's make sure Hudson passes before closing this out.



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930482#action_12930482 ] 

Gary Helmling commented on HBASE-3214:
--------------------------------------

Okay, this is the result of an interaction between instance caching in FileSystem.get() based on Configuration contents, and the change to MiniHBaseCluster.MiniHBaseClusterRegionServer to not call HBaseTestingUtility.setDifferentUser() in the constructor.  As a result, the MiniHBaseCluster.conf instance is being modified with the new hadoop.job.ugi entry when each RS is started, and it is this configuration which is picked up when a new master is started at TestMasterFailover, line 801.  Since the contents of the configuration are the same, it picks up the FileSystem instance that was closed by the shutdown hook in the "dead" RS.
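
To make the interaction concrete, here is a minimal sketch under stated assumptions. It is a toy model, not the Hadoop or HBase code and not the contents of the patch: `ToyFsCache`, `ConfLeakDemo`, `startRs`, and `masterGetsOpenHandle` are hypothetical names, a plain `Map` stands in for `Configuration`, and the cache is keyed on the full configuration contents as a simplification. Giving each RS its own copy of the configuration is shown only as one possible way to avoid the collision.

```java
import java.util.HashMap;
import java.util.Map;

// Toy handle cache keyed by configuration contents (simplified assumption).
final class ToyFsCache {
  static final class Handle {
    boolean open = true;
  }

  private final Map<Map<String, String>, Handle> cache = new HashMap<>();

  // Equal configuration contents yield the same cached handle.
  synchronized Handle get(Map<String, String> conf) {
    // Key on a snapshot so later mutation of conf cannot corrupt the map.
    return cache.computeIfAbsent(new HashMap<>(conf), k -> new Handle());
  }
}

final class ConfLeakDemo {
  // A toy "RS" that writes its user into whatever conf it was handed.
  static ToyFsCache.Handle startRs(ToyFsCache cache,
                                   Map<String, String> conf, String user) {
    conf.put("hadoop.job.ugi", user);
    return cache.get(conf);
  }

  // Returns true if a master started after the RS died gets an open handle.
  static boolean masterGetsOpenHandle(boolean copyConfPerRs) {
    ToyFsCache cache = new ToyFsCache();
    Map<String, String> clusterConf = new HashMap<>();
    clusterConf.put("fs.default.name", "hdfs://localhost:56643");

    // Either hand the RS the shared conf or a private copy of it.
    Map<String, String> rsConf =
        copyConfPerRs ? new HashMap<>(clusterConf) : clusterConf;
    ToyFsCache.Handle rsFs = startRs(cache, rsConf, "jgray.hfs.1,supergroup");

    // The "dead" RS's shutdown hook closes its handle.
    rsFs.open = false;

    // A new master is started from the cluster conf.
    return cache.get(clusterConf).open;
  }

  public static void main(String[] args) {
    System.out.println("shared conf:  " + masterGetsOpenHandle(false));
    System.out.println("copied conf:  " + masterGetsOpenHandle(true));
  }
}
```

With the shared configuration, the RS's `hadoop.job.ugi` entry leaks into the cluster configuration, so the master's lookup lands on the handle the RS already closed. With a per-RS copy, the master's configuration still differs from the RS's and it gets its own handle.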

I'll post a fix as soon as I've run it through the full test suite locally.



[jira] Assigned: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray reassigned HBASE-3214:
------------------------------------

    Assignee: Gary Helmling



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930446#action_12930446 ] 

Jonathan Gray commented on HBASE-3214:
--------------------------------------

Also, TestRollingRestart is timing out on hudson but passes locally.  If it times out, it seems like we can't access the output logs on hudson?



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930682#action_12930682 ] 

Jonathan Gray commented on HBASE-3214:
--------------------------------------

btw, tests are passing locally w/ this patch.



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930450#action_12930450 ] 

Jonathan Gray commented on HBASE-3214:
--------------------------------------

Well, I kind of lied.  Looks like I'm still getting loads of "Filesystem closed" exceptions in TestRollingRestart even when it passes.  They show up during shutdown handling of RSs, and we don't abort anything; we just can't do log replay (which is not necessary in this test).

So there is definitely something fishy with the commit of HBASE-3194 and unit tests: an aborting RS brings down DFS underneath the running unit test cluster / master.



[jira] Commented: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930447#action_12930447 ] 

Jonathan Gray commented on HBASE-3214:
--------------------------------------

And I'm seeing loads of these "Shutdown of DFS" lines in my passing TestRollingRestart, so it doesn't seem like it always breaks tests, but for some reason it's now causing breakage in TestMasterFailover?



[jira] Updated: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-3214:
---------------------------------

    Attachment: HBASE-3214.patch

Here's a patch that fixes TestMasterFailover for me.  This may also explain the "Filesystem closed" errors showing up in TestRollingRestart, but that's hard to confirm if it's just failing up in hudson.  It does fix the "Filesystem closed" error that was aborting the master restart in TestMasterFailover.

With this fix, all tests pass locally for me.  But while testing, I did see intermittent failures of 
{noformat}
org.apache.hadoop.hbase.replication.TestReplication.queueFailover()
{noformat}

It's not clear to me that this is the same issue though, and the failures weren't consistent.  So I don't think we should hold up this fix for it.





[jira] Resolved: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-3214.
----------------------------------

    Resolution: Fixed

Passed on latest hudson build.  Will open new jiras if anything new crops up.



[jira] Updated: (HBASE-3214) TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS is failing

Posted by "Gary Helmling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-3214:
---------------------------------

    Attachment: HBASE-3214.patch

Same attachment as previously, but selecting the correct "Attachment license" option this time.  Sorry for the noise.

