Posted to user@hbase.apache.org by Brent Miller <br...@gmail.com> on 2011/08/05 23:13:58 UTC

Client still attempting to connect to failed regionserver

I've been evaluating HBase for an upcoming project, and I must say I'm quite
impressed with the performance.

I've been using a test client to simulate the load that we're expecting.
This morning we had one of the regionservers die and we're finding that the
test application is still trying to reconnect to the failed regionserver,
even after restarting the application. (hadoop-3 is the failed server)

When the client starts up, we see the following exception:

11/08/05 13:21:34 ERROR [GENTEST7] test.Main$DataGen: Caught exception while inserting data
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 5578 actions: servers with issues: hadoop-3.ionamerica.priv:60020,
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1227)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1241)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:826)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:682)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
    at test.Main$DataGen.run(Main.java:196)
    at java.lang.Thread.run(Thread.java:679)

And the master's log is *filled* with:

2011-08-05 13:30:56,349 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received exception accessing META during server shutdown of hadoop-3.ionamerica.priv,60020,1312306642172, retrying META read

I was under the assumption that if a regionserver failed, the clients would
automatically switch over to a good regionserver. Also, if I pull up the
master's web UI, it no longer shows the failed regionserver in the "Region
Servers" section. Is this a bug, or does the client have to somehow check
whether a regionserver is valid?

We're using Cloudera's HBase 0.90.3-cdh3u1 on Ubuntu 10.04.

This seems similar to
http://mail-archives.apache.org/mod_mbox/hbase-user/201106.mbox/%3C7B3A9A088A1B88488CBD26C63C1581D40377C7C3@ex-01%3E
but there doesn't seem to be any resolution there.

Thanks,
Brent

Re: Client still attempting to connect to failed regionserver

Posted by Ted Yu <yu...@gmail.com>.
I think this is HBASE-4168.
I just put a patched 0.90.4 onto our staging cluster.

On Tue, Aug 9, 2011 at 10:03 AM, Brent Miller <br...@gmail.com> wrote:

Re: Client still attempting to connect to failed regionserver

Posted by Brent Miller <br...@gmail.com>.
Thanks for the reply.

I think the exception that you're asking about is this one:

2011-08-05 10:57:34,529 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received exception accessing META during server shutdown of hadoop-3.ionamerica.priv,60020,1312306642172, retrying META read
2011-08-05 10:57:37,538 WARN org.apache.hadoop.hbase.zookeeper.MetaNodeTracker: Tried to reset META server location after seeing the completion of a new META assignment but got an IOE
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
    at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
    at $Proxy6.getRegionInfo(Unknown Source)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:272)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:364)
    at org.apache.hadoop.hbase.zookeeper.MetaNodeTracker.nodeDeleted(MetaNodeTracker.java:64)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:276)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)

If it helps at all, I put a copy of the master log up at
https://s3-us-west-1.amazonaws.com/brent-be-public/hbase-hbase-master-hadoop-master.log.2011-08-05-partial
which contains the time frame from when the master first noticed the region
server was dead until it started spitting out "Received exception accessing
META during server shutdown..." over and over again.

Thanks,
Brent


On Mon, Aug 8, 2011 at 4:14 PM, Stack <st...@duboce.net> wrote:

Re: Client still attempting to connect to failed regionserver

Posted by Stack <st...@duboce.net>.
On Fri, Aug 5, 2011 at 2:13 PM, Brent Miller <br...@gmail.com> wrote:
> I was under the assumption that if a regionserver failed, the clients would
> automatically switch over to a good regionserver. Also, if I pull up the
> master's web UI, it no longer shows the failed regionserver in the "Region
> Servers" section. Is this a bug or does the client have to somehow check if
> a regionserver is valid?
>
> We're using Cloudera's HBase 0.90.3-cdh3u1 on Ubuntu 10.04
>

What usually happens is that when a regionserver dies, the master
notices its absence and redeploys the regions the dead server was
carrying onto other servers. The process that does this is named
ServerShutdownHandler. In your case above, it seems that this handler
is having an issue processing the dead server, so the regions did
not get reassigned. What is the exception that is being thrown when
we try to contact the .META. region?

St.Ack
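
For readers following along, the recovery flow Stack describes (the master
notices a dead regionserver and redeploys its regions elsewhere) can be
illustrated with a toy model. This is only a sketch and is not HBase code:
the ToyMaster class, its methods, the round-robin policy, and the server
names hadoop-1 and hadoop-2 are all hypothetical. The real
ServerShutdownHandler also has to read .META., which is the step that was
failing in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: NOT the HBase implementation. All names are hypothetical.
class ToyMaster {
    // Server name -> regions that server currently carries (sorted by name).
    private final Map<String, List<String>> assignments = new TreeMap<>();

    void addServer(String server, List<String> regions) {
        assignments.put(server, new ArrayList<>(regions));
    }

    // Called once a server is known to be dead: move its regions,
    // round-robin, onto the surviving servers.
    void handleServerShutdown(String deadServer) {
        List<String> orphaned = assignments.get(deadServer);
        if (orphaned == null) {
            return; // unknown or already-processed server
        }
        if (assignments.size() == 1 && !orphaned.isEmpty()) {
            // Nothing can be reassigned, so leave the state untouched.
            throw new IllegalStateException("no live servers left to take regions");
        }
        assignments.remove(deadServer);
        List<String> live = new ArrayList<>(assignments.keySet());
        int next = 0;
        for (String region : orphaned) {
            assignments.get(live.get(next % live.size())).add(region);
            next++;
        }
    }

    // Returns a copy of the regions on a server (empty if the server is gone).
    List<String> regionsOn(String server) {
        List<String> regions = assignments.get(server);
        return regions == null ? new ArrayList<String>() : new ArrayList<>(regions);
    }
}
```

In this model, if handleServerShutdown never completes (as with the
repeated "retrying META read" messages above), the dead server's regions
stay assigned to it, which would be consistent with a client continuing to
be pointed at hadoop-3.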