Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2008/10/14 20:57:44 UTC

[jira] Created: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

We don't recover if HRS hosting -ROOT-/.META. goes down
-------------------------------------------------------

                 Key: HBASE-927
                 URL: https://issues.apache.org/jira/browse/HBASE-927
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack
            Priority: Blocker
             Fix For: 0.19.0


To replicate, set up a cluster with a master and a regionserver.  Start up the cluster.  Kill the regionserver.  The master just does this over and over:

{code}
...
2008-10-14 18:54:14,737 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
2008-10-14 18:54:15,739 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 0 time(s).
2008-10-14 18:54:16,742 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 1 time(s).
2008-10-14 18:54:17,744 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 2 time(s).
2008-10-14 18:54:18,747 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 3 time(s).
2008-10-14 18:54:19,749 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 4 time(s).
2008-10-14 18:54:20,752 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 5 time(s).
2008-10-14 18:54:21,755 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 6 time(s).
2008-10-14 18:54:22,757 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 7 time(s).
2008-10-14 18:54:23,759 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 8 time(s).
2008-10-14 18:54:24,762 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 9 time(s).
2008-10-14 18:54:24,763 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
java.io.IOException: Call failed on local exception
        at org.apache.hadoop.ipc.Client.call(Client.java:718)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:245)
        at $Proxy2.openScanner(Unknown Source)
        at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:159)
        at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:74)
        at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
        at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:62)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
        at org.apache.hadoop.ipc.Client.call(Client.java:704)
        ... 7 more
2008-10-14 18:54:24,766 INFO org.apache.hadoop.hbase.master.BaseScanner: all meta regions scanned
...

{code}

Made it a blocker.
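
The log above is consistent with a bounded connect loop (ten attempts, about a second apart by the timestamps) inside a periodic scan that logs the failure and moves on. A minimal sketch of that pattern follows. It is not HBase code: RetryLoopSketch, connectWithRetries and scanOnce are hypothetical names, and the sleep between attempts is left out.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

public class RetryLoopSketch {

    // Hypothetical stand-in for a client connect loop: try up to
    // maxAttempts times, then give up with an IOException.
    public static void connectWithRetries(BooleanSupplier connect, int maxAttempts)
            throws IOException {
        for (int tried = 0; tried < maxAttempts; tried++) {
            if (connect.getAsBoolean()) {
                return; // connected
            }
            // In the log, each failed attempt is reported as
            // "Already tried N time(s)"; this sketch does not sleep.
        }
        throw new IOException("Call failed after " + maxAttempts + " attempt(s)");
    }

    // Hypothetical single pass of a periodic scan. A failure is swallowed,
    // so the next pass would try the same (dead) address again.
    public static boolean scanOnce(BooleanSupplier connect, int maxAttempts) {
        try {
            connectWithRetries(connect, maxAttempts);
            return true;
        } catch (IOException e) {
            return false; // nothing here marks the region as needing reassignment
        }
    }
}
```

If nothing outside scanOnce reacts to a false result, a caller that repeats it against a dead server never makes progress, which matches the behaviour reported here.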

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-927.
---------------------------------

    Resolution: Fixed

Committed.



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652655#action_12652655 ] 

Andrew Purtell commented on HBASE-927:
--------------------------------------

Yes, this has happened to me using 0.18.1.



[jira] Assigned: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HBASE-927:
-----------------------------------

    Assignee: Jim Kellerman



[jira] Resolved: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-927.
---------------------------------

    Resolution: Fixed

Fixed, tested, committed.



[jira] Reopened: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reopened HBASE-927:
---------------------------------


Backport for 0.18.2



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640445#action_12640445 ] 

Jim Kellerman commented on HBASE-927:
-------------------------------------

This is really tricky. I have managed to prevent the root region from being assigned to multiple servers, but I still have to work out how to prevent meta regions from being assigned to multiple servers.
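
To illustrate the constraint described above (a region must never be held by two servers at once), here is a minimal sketch. It is an assumption-laden illustration and not the committed fix: SingleAssignmentSketch, tryAssign and serverDied are hypothetical names, and regions and servers are plain strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SingleAssignmentSketch {

    // region name -> server currently holding it (hypothetical bookkeeping)
    private final ConcurrentMap<String, String> assignments = new ConcurrentHashMap<>();

    // Assigns the region only if no server holds it yet.
    // Returns false when the region is already assigned.
    public boolean tryAssign(String region, String server) {
        return assignments.putIfAbsent(region, server) == null;
    }

    // Releases every region held by a dead server so it can be reassigned.
    // Returns the number of regions released.
    public int serverDied(String server) {
        int freed = 0;
        for (Map.Entry<String, String> e : assignments.entrySet()) {
            // remove(key, value) only succeeds if the mapping is unchanged
            if (e.getValue().equals(server) && assignments.remove(e.getKey(), server)) {
                freed++;
            }
        }
        return freed;
    }
}
```

The atomic putIfAbsent is what rules out a double assignment in this sketch; a second server can take the region only after serverDied has released it.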



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648722#action_12648722 ] 

stack commented on HBASE-927:
-----------------------------

Scratch my comment above.  This needs fixing for 0.19.0.  Just happened to jgray.

> We don't recover if HRS hosting -ROOT-/.META. goes down
> -------------------------------------------------------
>
>                 Key: HBASE-927
>                 URL: https://issues.apache.org/jira/browse/HBASE-927
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jim Kellerman
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> To replicate, set up a cluster with a master and a regionserver.  Start up the the cluster.  Kill the regionserver.  Master just does this over and over:
> {code}
> ...
> 2008-10-14 18:54:14,737 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
> 2008-10-14 18:54:15,739 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 0 time(s).
> 2008-10-14 18:54:16,742 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 1 time(s).
> 2008-10-14 18:54:17,744 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 2 time(s).
> 2008-10-14 18:54:18,747 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 3 time(s).
> 2008-10-14 18:54:19,749 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 4 time(s).
> 2008-10-14 18:54:20,752 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 5 time(s).
> 2008-10-14 18:54:21,755 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 6 time(s).
> 2008-10-14 18:54:22,757 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 7 time(s).
> 2008-10-14 18:54:23,759 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 8 time(s).
> 2008-10-14 18:54:24,762 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 9 time(s).
> 2008-10-14 18:54:24,763 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
> java.io.IOException: Call failed on local exception
>         at org.apache.hadoop.ipc.Client.call(Client.java:718)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:245)
>         at $Proxy2.openScanner(Unknown Source)
>         at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:159)
>         at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:74)
>         at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>         at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:62)
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>         at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
>         at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
>         at org.apache.hadoop.ipc.Client.call(Client.java:704)
>         ... 7 more
> 2008-10-14 18:54:24,766 INFO org.apache.hadoop.hbase.master.BaseScanner: all meta regions scanned
> ...
> {code}
> Made it a blocker.
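
The log above shows ten refused connect attempts (0 through 9), a failed scan, and then the same cycle again. As a rough illustration only, here is a minimal sketch of the behavior the issue title asks for: give up after a bounded number of attempts and remember that the server is down so its regions could be reassigned. This is not HBase code and not the patch discussed in this thread; `MetaScanSketch`, `connect`, `dead_servers`, and `always_refused` are hypothetical names, and the one-second pause between attempts seen in the log is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class MetaScanSketch:
    """Hypothetical stand-in for a master-side meta scan; not HBase code."""
    connect: Callable[[str], None]   # assumed to raise ConnectionRefusedError when the server is down
    max_attempts: int = 10           # the log shows attempts 0 through 9
    dead_servers: Set[str] = field(default_factory=set)

    def scan(self, server: str) -> bool:
        """Return True if the server was reached, False if it is considered dead."""
        if server in self.dead_servers:
            return False             # already known to be down, so do not retry
        for attempt in range(self.max_attempts):
            try:
                self.connect(server)
                return True
            except ConnectionRefusedError:
                print(f"Retrying connect to server: {server}. Already tried {attempt} time(s).")
        # Every attempt was refused: record the server instead of rescanning forever.
        self.dead_servers.add(server)
        return False


calls: List[str] = []


def always_refused(server: str) -> None:
    """Simulates the killed regionserver: every connect is refused."""
    calls.append(server)
    raise ConnectionRefusedError(server)


scanner = MetaScanSketch(connect=always_refused)
first = scanner.scan("XX.XX.XX.XX:60020")    # ten refused attempts, then marked dead
second = scanner.scan("XX.XX.XX.XX:60020")   # short-circuits without connecting again
```

In this sketch the second scan returns immediately because the server is already recorded as dead, which is the point at which a real master would have to reassign the regions that server was hosting.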



[jira] Reopened: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reopened HBASE-927:
---------------------------------


Still broken. Reopening.



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649564#action_12649564 ] 

stack commented on HBASE-927:
-----------------------------

Did your last commit double the traffic to the master by adding a get of the root region every time the regionserver sends its heartbeat?



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648672#action_12648672 ] 

stack commented on HBASE-927:
-----------------------------

Should we move this out of 0.19.0?  Would it be an easier and better fix once ZooKeeper (ZK) is in the mix?



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652864#action_12652864 ] 

stack commented on HBASE-927:
-----------------------------

Jim is trying to fix the breakage.



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652638#action_12652638 ] 

Jonathan Gray commented on HBASE-927:
-------------------------------------

Thanks Jim!  I think this issue might exist in 0.18; Andrew, can you confirm that?  If so, I think this alone is worth an 0.18.2 release (though there are some other things related to OOME, etc., that are always worth backporting).



[jira] Updated: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-927:
------------------------

    Attachment: hbase-927.patch

Here is what was applied (svn diff -r722690:722704 > hbase-927.patch).  I am going to back it out.



[jira] Resolved: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-927.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.18.2

Fixed trunk. Back-ported to 0.18 branch for 0.18.2



[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652856#action_12652856 ] 

stack commented on HBASE-927:
-----------------------------

This patch looks to have broken trunk.  Hudson fails.  If I revert my trunk to r722704, the version before hbase-1042, the build fails in tests like org.apache.hadoop.hbase.TestGlobalMemcacheLimit with the error below:

{code}
2008-12-03 08:57:01,369 DEBUG [HMaster] master.HMaster(421): Main processing loop: PendingOpenOperation from 127.0.0.1:39337
2008-12-03 08:57:01,371 INFO  [HMaster] master.ProcessRegionOpen$1(71): .META.,,1 open on 127.0.0.1:39337
2008-12-03 08:57:01,372 INFO  [HMaster] master.ProcessRegionOpen$1(82): updating row .META.,,1 in region -ROOT-,,0 with startcode 1228323417464 and server 127.0.0.1:39337
2008-12-03 08:57:03,185 DEBUG [main] client.HConnectionManager$TableServers(792): Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '10', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2008-12-03 08:57:03,225 ERROR [main] hbase.HBaseClusterTestCase(130): Exception in setup!
org.apache.hadoop.hbase.master.NotAllMetaRegionsOnlineException: org.apache.hadoop.hbase.master.NotAllMetaRegionsOnlineException
	at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:596)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
	at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:195)
	at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:153)
	at org.apache.hadoop.hbase.TestGlobalMemcacheLimit.postHBaseClusterSetup(TestGlobalMemcacheLimit.java:70)
	at org.apache.hadoop.hbase.HBaseClusterTestCase.setUp(HBaseClusterTestCase.java:128)
	at junit.framework.TestCase.runBare(TestCase.java:125)
	at junit.framework.TestResult$1.protect(TestResult.java:106)
	at junit.framework.TestResult.runProtected(TestResult.java:124)
	at junit.framework.TestResult.run(TestResult.java:109)
	at junit.framework.TestCase.run(TestCase.java:118)
	at junit.framework.TestSuite.runTest(TestSuite.java:208)
	at junit.framework.TestSuite.run(TestSuite.java:203)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
2008-12-03 08:57:03,226 DEBUG [main] hbase.LocalHBaseCluster(254): Shutting down HBase Cluster
{code}
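
For anyone trying to narrow this down: the trace shows createTable being rejected about two seconds after the master logged the .META. open. Whether that is just a timing window or a real regression from the patch is not something the log settles. A minimal sketch, assuming the exception is transient, of a bounded retry around the failing call is below. Everything in it is hypothetical: MetaNotOnlineException is only a local stand-in for NotAllMetaRegionsOnlineException so the snippet compiles without HBase, and retryWhileMetaOffline is not an existing HBase helper.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for
// org.apache.hadoop.hbase.master.NotAllMetaRegionsOnlineException,
// declared here only so the sketch compiles without HBase on the classpath.
class MetaNotOnlineException extends Exception {
    MetaNotOnlineException(String message) {
        super(message);
    }
}

final class MetaRetrySketch {
    private MetaRetrySketch() {}

    /**
     * Runs op, retrying while it reports that meta regions are offline.
     * Gives up after maxAttempts calls and rethrows the last such exception.
     * Any other exception propagates immediately.
     */
    static <T> T retryWhileMetaOffline(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        MetaNotOnlineException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (MetaNotOnlineException e) {
                last = e;
                // Pause before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

If a test wrapped this way passes after a retry or two, that points at a startup timing window. If it exhausts its attempts, the meta regions are never being reported online and the problem is on the master side.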

> We don't recover if HRS hosting -ROOT-/.META. goes down
> -------------------------------------------------------
>
>                 Key: HBASE-927
>                 URL: https://issues.apache.org/jira/browse/HBASE-927
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jim Kellerman
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> To replicate, set up a cluster with a master and a regionserver.  Start up the cluster.  Kill the regionserver.  Master just does this over and over:
> {code}
> ...
> 2008-10-14 18:54:14,737 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
> 2008-10-14 18:54:15,739 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 0 time(s).
> 2008-10-14 18:54:16,742 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 1 time(s).
> 2008-10-14 18:54:17,744 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 2 time(s).
> 2008-10-14 18:54:18,747 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 3 time(s).
> 2008-10-14 18:54:19,749 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 4 time(s).
> 2008-10-14 18:54:20,752 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 5 time(s).
> 2008-10-14 18:54:21,755 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 6 time(s).
> 2008-10-14 18:54:22,757 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 7 time(s).
> 2008-10-14 18:54:23,759 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 8 time(s).
> 2008-10-14 18:54:24,762 INFO org.apache.hadoop.ipc.Client: Retrying connect to server:XX.XX.XX.XX:60020. Already tried 9 time(s).
> 2008-10-14 18:54:24,763 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
> java.io.IOException: Call failed on local exception
>         at org.apache.hadoop.ipc.Client.call(Client.java:718)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:245)
>         at $Proxy2.openScanner(Unknown Source)
>         at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:159)
>         at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:74)
>         at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>         at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:62)
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>         at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
>         at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
>         at org.apache.hadoop.ipc.Client.call(Client.java:704)
>         ... 7 more
> 2008-10-14 18:54:24,766 INFO org.apache.hadoop.hbase.master.BaseScanner: all meta regions scanned
> ...
> {code}
> Made it a blocker.
