Posted to user@hbase.apache.org by Jason Chuong <ja...@cbsinteractive.com> on 2011/07/15 14:16:35 UTC

hbase crash after restart

Hi All,

I have a 5-node cluster set up, with 3 of the nodes forming the ZooKeeper
quorum. When I restart the HBase master, it tries to connect to an unknown
host and then crashes.
Has anyone seen this error message before, or does anyone know how to
resolve it? Thanks.

2011-07-15 05:10:49,158 INFO org.apache.hadoop.ipc.HbaseRPC: Problem connecting to server: 10.16.129.21/10.16.129.21:50712
2011-07-15 05:11:10,162 INFO org.apache.hadoop.ipc.HbaseRPC: Problem connecting to server: 10.16.129.21/10.16.129.21:50712
2011-07-15 05:11:31,166 INFO org.apache.hadoop.ipc.HbaseRPC: Problem connecting to server: 10.16.129.21/10.16.129.21:50712
2011-07-15 05:11:31,170 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=10.16.129.21/10.16.129.21:50712]
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:311)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:865)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:732)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy6.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.get
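
The log above names the same address on every retry. As a rough sketch only
(Python is used purely for illustration; `failed_targets` and `SAMPLE` are
made-up names and not part of HBase), the failing target can be pulled out of
a log excerpt like this, whether or not a mail client has wrapped the lines:

```python
import re
from collections import Counter

# Matches the "host/ip:port" form that appears after "Problem connecting to
# server:" in the excerpt. \s+ also spans newlines, so wrapped lines still match.
ADDRESS = re.compile(
    r"Problem\s+connecting\s+to\s+server:\s+(\S+?)/(\d+(?:\.\d+){3}):(\d+)"
)

# Excerpt copied from the post, including the mail-client line wrapping.
SAMPLE = """\
2011-07-15 05:10:49,158 INFO org.apache.hadoop.ipc.HbaseRPC: Problem
connecting to server: 10.16.129.21/10.16.129.21:50712
2011-07-15 05:11:10,162 INFO org.apache.hadoop.ipc.HbaseRPC: Problem
connecting to server: 10.16.129.21/10.16.129.21:50712
2011-07-15 05:11:31,166 INFO org.apache.hadoop.ipc.HbaseRPC: Problem
connecting to server: 10.16.129.21/10.16.129.21:50712
"""


def failed_targets(log_text):
    """Count connection failures per (ip, port) in a master log excerpt."""
    return Counter(
        (ip, int(port)) for _host, ip, port in ADDRESS.findall(log_text)
    )


if __name__ == "__main__":
    for (ip, port), count in failed_targets(SAMPLE).items():
        print(f"{ip}:{port} failed {count} time(s)")
```

Knowing the exact ip:port makes it easier to search for where the master
picked that address up, which is what the replies below go on to do.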

Re: hbase crash after restart

Posted by Jason Chuong <ja...@cbsinteractive.com>.
That fixed the problem, Bill. Thanks for the help.

I went and removed the IP, then restarted ZooKeeper, and everything came
up.

That is, this entry:
[zk: hadoop-wkr1:2181(CONNECTED) 4] get /hbase/root-region-server
10.16.129.21:50712
cZxid = 77309767158
ctime = Thu Jul 14 11:23:28 PDT 2011
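
For anyone who wants to check such an entry against the machines that are
actually in the cluster, here is a small sketch. It only parses text pasted
from the zk shell; `parse_get_output`, `is_stale` and the `live_hosts` set are
hypothetical names for illustration, and the host list has to come from your
own inventory. It assumes the znode data is a plain host:port string, as in
the output above.

```python
def parse_get_output(text):
    """Split pasted zk shell 'get' output into (data, stat).

    Lines of the form 'key = value' are treated as stat fields; anything
    else is treated as znode data. Data that itself contains ' = ' would
    be misread, which is acceptable for a host:port value.
    """
    data_lines, stat = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[zk:") or line.startswith("zk:"):
            continue  # skip blank lines and the echoed prompt
        key, sep, value = line.partition(" = ")
        if sep:
            stat[key] = value
        else:
            data_lines.append(line)
    return "\n".join(data_lines), stat


def is_stale(address, live_hosts):
    """Return True if the host part of 'host:port' is not a known cluster host."""
    host = address.rpartition(":")[0]
    return host not in live_hosts


if __name__ == "__main__":
    pasted = """\
[zk: hadoop-wkr1:2181(CONNECTED) 4] get /hbase/root-region-server
10.16.129.21:50712
cZxid = 77309767158
ctime = Thu Jul 14 11:23:28 PDT 2011
"""
    data, stat = parse_get_output(pasted)
    # live_hosts is a placeholder; fill in the real region server addresses.
    print(data, stat.get("ctime"), is_stale(data, live_hosts=set()))
```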


On Fri, Jul 15, 2011 at 11:21 AM, Bill Graham <bi...@gmail.com> wrote:

> What do you see when you do this from the ZK client:
>
> get /hbase/root-region-server
>
> I suspect a client somewhere registered itself in ZK. Maybe fixing the IP of
> the root region server in ZK will do the trick.

Re: hbase crash after restart

Posted by Bill Graham <bi...@gmail.com>.
What do you see when you do this from the ZK client:

get /hbase/root-region-server

I suspect a client somewhere registered itself in ZK. Maybe fixing the IP of
the root region server in ZK will do the trick.
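
Once the znode has been read, a quick way to see whether anything is really
listening at the address it holds is a plain TCP connect with a timeout. The
sketch below is only an illustration under that assumption: `parse_host_port`
and `is_listening` are made-up helper names, and a successful connect shows
only that some process accepts connections on that port, not that it is a
healthy region server.

```python
import socket


def parse_host_port(address):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = address.strip().rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def is_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Address as reported in the master log earlier in this thread.
    host, port = parse_host_port("10.16.129.21:50712")
    print(f"{host}:{port} listening: {is_listening(host, port)}")
```

If this prints False from the master's host while the znode still points at
that address, the entry is a likely candidate for the stale registration
suspected above.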


On Fri, Jul 15, 2011 at 10:58 AM, Jason Chuong <
jason.chuong@cbsinteractive.com> wrote:

> Hi Dave,
>
> Yes we are and on hbase version 0.90,  I've also verify that the zookeeper
> are responding via the zk shell and logs look normal.
> Just don't understand why it's trying to connect to that ip address.
>
>
> [zk: hadoop-wkr-r1:2181(CONNECTED) 1] ls /hbase
> [splitlog, unassigned, rs, root-region-server, table, shutdown]

Re: hbase crash after restart

Posted by Jason Chuong <ja...@cbsinteractive.com>.
Hi Dave,

Yes, we are, and we're on HBase version 0.90. I've also verified that the
ZooKeeper nodes are responding via the zk shell, and the logs look normal.
I just don't understand why it's trying to connect to that IP address.


[zk: hadoop-wkr-r1:2181(CONNECTED) 1] ls /hbase
[splitlog, unassigned, rs, root-region-server, table, shutdown]




On Fri, Jul 15, 2011 at 9:54 AM, Buttler, David <bu...@llnl.gov> wrote:

> You really don't need 3 zookeeper nodes for a 5 node cluster. 1 is
> sufficient.
> Are you managing zookeeper with hbase or independently?
>
> Dave

RE: hbase crash after restart

Posted by "Buttler, David" <bu...@llnl.gov>.
You really don't need 3 ZooKeeper nodes for a 5-node cluster. 1 is sufficient.
Are you managing ZooKeeper with HBase or independently?

Dave


-----Original Message-----
From: Jason Chuong [mailto:jason.chuong@cbsinteractive.com] 
Sent: Friday, July 15, 2011 5:17 AM
To: user@hbase.apache.org
Subject: hbase crash after restart