Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2008/10/14 20:57:44 UTC
[jira] Created: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
We don't recover if HRS hosting -ROOT-/.META. goes down
-------------------------------------------------------
Key: HBASE-927
URL: https://issues.apache.org/jira/browse/HBASE-927
Project: Hadoop HBase
Issue Type: Bug
Reporter: stack
Priority: Blocker
Fix For: 0.19.0
To replicate, set up a cluster with a master and a regionserver. Start up the cluster. Kill the regionserver. The master just does this over and over:
{code}
...
2008-10-14 18:54:14,737 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
2008-10-14 18:54:15,739 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 0 time(s).
2008-10-14 18:54:16,742 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 1 time(s).
2008-10-14 18:54:17,744 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 2 time(s).
2008-10-14 18:54:18,747 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 3 time(s).
2008-10-14 18:54:19,749 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 4 time(s).
2008-10-14 18:54:20,752 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 5 time(s).
2008-10-14 18:54:21,755 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 6 time(s).
2008-10-14 18:54:22,757 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 7 time(s).
2008-10-14 18:54:23,759 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 8 time(s).
2008-10-14 18:54:24,762 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 9 time(s).
2008-10-14 18:54:24,763 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
java.io.IOException: Call failed on local exception
at org.apache.hadoop.ipc.Client.call(Client.java:718)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:245)
at $Proxy2.openScanner(Unknown Source)
at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:159)
at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:74)
at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
at org.apache.hadoop.hbase.Chore.run(Chore.java:62)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
at org.apache.hadoop.ipc.Client.call(Client.java:704)
... 7 more
2008-10-14 18:54:24,766 INFO org.apache.hadoop.hbase.master.BaseScanner: all meta regions scanned
...
{code}
Made it a blocker.
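The symptom in the log can be sketched in a few lines of Java. This is only an illustration, not HBase code: the class and method names are hypothetical, the liveness check is a stand-in for a real connection attempt, and the limit of 10 tries is read off the log (attempts 0 through 9). The point is that each scan pass gives up after a bounded number of retries, but nothing replaces the cached .META. location, so the next pass fails the same way.

```java
import java.io.IOException;
import java.util.function.Predicate;

public class StaleMetaLocationSketch {

    /** Tries to "connect" up to maxRetries times; returns the number of attempts used. */
    static int connectWithRetries(String address, Predicate<String> isAlive, int maxRetries)
            throws IOException {
        for (int tried = 0; tried < maxRetries; tried++) {
            if (isAlive.test(address)) {
                return tried + 1;
            }
        }
        throw new IOException("Call failed on local exception: " + address);
    }

    /** Runs a number of scan passes against one cached location; returns how many failed. */
    static int runScanPasses(String cachedMetaLocation, Predicate<String> isAlive, int passes) {
        int failed = 0;
        for (int i = 0; i < passes; i++) {
            try {
                connectWithRetries(cachedMetaLocation, isAlive, 10);
            } catch (IOException e) {
                // Nothing here clears or reassigns cachedMetaLocation,
                // so the next pass retries the same dead address.
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Predicate<String> nothingAlive = address -> false; // the regionserver was killed
        System.out.println(runScanPasses("XX.XX.XX.XX:60020", nothingAlive, 3)); // prints 3
    }
}
```

Recovery would need the failure path to do more than count: it would have to notice the server is gone and get the region reassigned somewhere else.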
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman resolved HBASE-927.
---------------------------------
Resolution: Fixed
Committed.
[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652655#action_12652655 ]
Andrew Purtell commented on HBASE-927:
--------------------------------------
Yes, this has happened to me using 0.18.1.
[jira] Assigned: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman reassigned HBASE-927:
-----------------------------------
Assignee: Jim Kellerman
[jira] Resolved: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman resolved HBASE-927.
---------------------------------
Resolution: Fixed
Fixed, tested, committed.
[jira] Reopened: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman reopened HBASE-927:
---------------------------------
Backport for 0.18.2
[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640445#action_12640445 ]
Jim Kellerman commented on HBASE-927:
-------------------------------------
This is really tricky. I have managed to prevent the root region from being assigned to multiple servers, but I still have to work out how to prevent meta regions from being assigned to multiple servers.
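The invariant being described (a region is assigned to at most one server at a time, and becomes assignable again only once its server is known to be dead) can be sketched as below. This is a hypothetical illustration of the invariant only, not the committed fix and not HBase's RegionManager API; the class and method names are made up for the example.

```java
public class SingleAssignmentSketch {

    // Address of the server currently holding the region; null means unassigned.
    private String assignedServer;

    /** Assigns the region to the server only if no server holds it; returns true on success. */
    public synchronized boolean tryAssign(String server) {
        if (assignedServer != null) {
            return false; // already assigned, refuse a second assignment
        }
        assignedServer = server;
        return true;
    }

    /** Frees the region if the dead server was the one holding it; returns true if freed. */
    public synchronized boolean markServerDead(String server) {
        if (server.equals(assignedServer)) {
            assignedServer = null;
            return true;
        }
        return false;
    }

    /** Returns the current holder, or null if the region is unassigned. */
    public synchronized String assignedServer() {
        return assignedServer;
    }
}
```

With one such guard per region, a second server asking for the root region is refused until the first holder has been declared dead. Doing the same for meta regions would need one guard per meta region, which is presumably where the bookkeeping gets harder.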
[jira] Commented: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648722#action_12648722 ]
stack commented on HBASE-927:
-----------------------------
Scratch my comment above. This needs fixing for 0.19.0. Just happened to jgray.
[jira] Reopened: (HBASE-927) We don't recover if HRS hosting -ROOT-/.META. goes down
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman reopened HBASE-927:
---------------------------------
Still broken. Reopening.
[jira] Commented: (HBASE-927) We don't recover if HRS hosting
-ROOT-/.META. goes down
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649564#action_12649564 ]
stack commented on HBASE-927:
-----------------------------
Did your last commit double the traffic to the master by adding a get of the root region every time the regionserver does its heartbeat?
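To make the traffic concern concrete: if every heartbeat also asks the master where the root region is, each heartbeat costs two calls instead of one. A minimal sketch of the alternative, assuming the location can be cached and refreshed only after a failure, is below. It is not HBase code; `CachedRootLocation` and its methods are hypothetical, and the `Supplier` merely stands in for an RPC to the master.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not HBase code.
class CachedRootLocation {
    private final Supplier<String> masterLookup; // stand-in for an RPC to the master
    private String cached;                       // null means "unknown, ask again"
    private int lookups;                         // counts calls that reached the master

    CachedRootLocation(Supplier<String> masterLookup) {
        this.masterLookup = masterLookup;
    }

    /** Returns the root region location, asking the master only when nothing is cached. */
    synchronized String get() {
        if (cached == null) {
            cached = masterLookup.get();
            lookups++;
        }
        return cached;
    }

    /** Called when a request to the cached location fails, forcing a fresh lookup next time. */
    synchronized void invalidate() {
        cached = null;
    }

    synchronized int lookupCount() {
        return lookups;
    }
}
```

With this shape, repeated `get()` calls between failures reach the master once, so the heartbeat itself stays the only regular call.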
[jira] Commented: (HBASE-927) We don't recover if HRS hosting
-ROOT-/.META. goes down
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648672#action_12648672 ]
stack commented on HBASE-927:
-----------------------------
Should we move this out of 0.19.0? Will it be an easier and better fix when ZooKeeper (ZK) is in the mix?
[jira] Commented: (HBASE-927) We don't recover if HRS hosting
-ROOT-/.META. goes down
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652864#action_12652864 ]
stack commented on HBASE-927:
-----------------------------
Jim is trying to fix the breakage.
[jira] Commented: (HBASE-927) We don't recover if HRS hosting
-ROOT-/.META. goes down
Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652638#action_12652638 ]
Jonathan Gray commented on HBASE-927:
-------------------------------------
Thanks Jim! I think this issue might exist in 0.18. Andrew, can you confirm that? If so, I think this alone is worth an 0.18.2 release (though there are some other things, related to OOME etc., that are always worth backporting).
[jira] Updated: (HBASE-927) We don't recover if HRS hosting
-ROOT-/.META. goes down
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-927:
------------------------
Attachment: hbase-927.patch
Here is what was applied -- svn diff -r722690:722704 > hbase-927.patch. Going to back it out.
[jira] Resolved: (HBASE-927) We don't recover if HRS hosting
-ROOT-/.META. goes down
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman resolved HBASE-927.
---------------------------------
Resolution: Fixed
Fix Version/s: 0.18.2
Fixed in trunk. Back-ported to the 0.18 branch for 0.18.2.
[jira] Commented: (HBASE-927) We don't recover if HRS hosting
-ROOT-/.META. goes down
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652856#action_12652856 ]
stack commented on HBASE-927:
-----------------------------
This patch looks to have broken trunk. The Hudson build fails. If I revert my trunk to r722704, the version before hbase-1042, the build fails in tests like org.apache.hadoop.hbase.TestGlobalMemcacheLimit with the output below:
{code}
2008-12-03 08:57:01,369 DEBUG [HMaster] master.HMaster(421): Main processing loop: PendingOpenOperation from 127.0.0.1:39337
2008-12-03 08:57:01,371 INFO [HMaster] master.ProcessRegionOpen$1(71): .META.,,1 open on 127.0.0.1:39337
2008-12-03 08:57:01,372 INFO [HMaster] master.ProcessRegionOpen$1(82): updating row .META.,,1 in region -ROOT-,,0 with startcode 1228323417464 and server 127.0.0.1:39337
2008-12-03 08:57:03,185 DEBUG [main] client.HConnectionManager$TableServers(792): Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES =>
{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '10', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2008-12-03 08:57:03,225 ERROR [main] hbase.HBaseClusterTestCase(130): Exception in setup!
org.apache.hadoop.hbase.master.NotAllMetaRegionsOnlineException: org.apache.hadoop.hbase.master.NotAllMetaRegionsOnlineException
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:596)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:195)
at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:153)
at org.apache.hadoop.hbase.TestGlobalMemcacheLimit.postHBaseClusterSetup(TestGlobalMemcacheLimit.java:70)
at org.apache.hadoop.hbase.HBaseClusterTestCase.setUp(HBaseClusterTestCase.java:128)
at junit.framework.TestCase.runBare(TestCase.java:125)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:421)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:912)
at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:766)
2008-12-03 08:57:03,226 DEBUG [main] hbase.LocalHBaseCluster(254): Shutting down HBase Cluster
{code}
> We don't recover if HRS hosting -ROOT-/.META. goes down
> -------------------------------------------------------
>
> Key: HBASE-927
> URL: https://issues.apache.org/jira/browse/HBASE-927
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Jim Kellerman
> Priority: Blocker
> Fix For: 0.19.0
>
>
> To replicate, set up a cluster with a master and a regionserver. Start up the the cluster. Kill the regionserver. Master just does this over and over:
> {code}
> ...
> 2008-10-14 18:54:14,737 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
> 2008-10-14 18:54:15,739 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 0 time(s).
> 2008-10-14 18:54:16,742 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 1 time(s).
> 2008-10-14 18:54:17,744 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 2 time(s).
> 2008-10-14 18:54:18,747 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 3 time(s).
> 2008-10-14 18:54:19,749 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 4 time(s).
> 2008-10-14 18:54:20,752 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 5 time(s).
> 2008-10-14 18:54:21,755 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 6 time(s).
> 2008-10-14 18:54:22,757 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 7 time(s).
> 2008-10-14 18:54:23,759 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 8 time(s).
> 2008-10-14 18:54:24,762 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: XX.XX.XX.XX:60020. Already tried 9 time(s).
> 2008-10-14 18:54:24,763 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {regionname: .META.,,1, startKey: <>, server: XX.XX.XX.XX:60020}
> java.io.IOException: Call failed on local exception
> at org.apache.hadoop.ipc.Client.call(Client.java:718)
> at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:245)
> at $Proxy2.openScanner(Unknown Source)
> at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:159)
> at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:74)
> at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
> at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
> at org.apache.hadoop.hbase.Chore.run(Chore.java:62)
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
> at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
> at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
> at org.apache.hadoop.ipc.Client.call(Client.java:704)
> ... 7 more
> 2008-10-14 18:54:24,766 INFO org.apache.hadoop.hbase.master.BaseScanner: all meta regions scanned
> ...
> {code}
> Made it a blocker.