Posted to user@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2010/08/19 18:37:59 UTC

Re: bug report: opening hbase region takes too long, making the region not available for more than 10 minutes.

Jonathan:
We saw a similar issue using HBase 0.20.6 with HBASE-2473:

Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region HB_INC_POST_0818-ERROR_SAMPLES-1282193650093,,1282193650831
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:726)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
    at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:244)
    at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:206)
    at com.carrieriq.m2m.platform.mmp2.input.StripedHBaseTable.createIfNeeded(StripedHBaseTable.java:470)
    ... 11 more

I assume the region has to be open; otherwise the locateRegionInMeta() call
fails.

After restarting HBase, I see:

2010-08-19 05:08:00,565 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: HB_INC_POST_0818-ERROR_SAMPLES-1282193650093,,1282193650831

Cheers

On Mon, Jun 14, 2010 at 4:25 PM, Jonathan Gray <jg...@facebook.com> wrote:

> Can you post the log from the region server that never opened the
> region (from 12:57 to 13:14)?  And grab it starting a couple of minutes
> before 12:57.
>
> Most likely this is not a bug as much as a current limitation of handling
> open/close messages sequentially.  It's possible that a long-running close
> (flush) held up processing of the open.  The logs will say more.
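A minimal sketch of the limitation described above, assuming a region server with a single worker that processes open/close messages strictly in arrival order. The class, the record and the durations (a 1000-second close/flush, a 2-second open) are hypothetical and are not taken from HBase or from the logs in this thread; the point is only that a quick open queued behind a slow close cannot finish before the close does.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model: one worker handles region messages strictly in
 * arrival order. Times are in seconds; durations are illustrative only.
 */
public class SequentialWorker {

    /** A queued message: its kind, region, arrival time and processing time. */
    record Msg(String kind, String region, long arrivalSec, long durationSec) {}

    /** Returns the time at which each message finishes, in queue order. */
    static List<Long> finishTimes(List<Msg> queue) {
        List<Long> done = new ArrayList<>();
        long clock = 0;
        for (Msg m : queue) {
            // A message cannot start before it arrives or before the
            // previous message has finished (no parallel workers).
            long start = Math.max(clock, m.arrivalSec());
            clock = start + m.durationSec();
            done.add(clock);
        }
        return done;
    }

    public static void main(String[] args) {
        List<Msg> queue = List.of(
            new Msg("CLOSE", "someOtherRegion", 0, 1000), // slow close/flush (hypothetical)
            new Msg("OPEN", "MyOwnEventTable", 5, 2));    // quick open, stuck behind it
        System.out.println(finishTimes(queue)); // [1000, 1002]
    }
}
```

The open arrives at t=5 but only completes at t=1002; with an empty queue ahead of it, the same open would finish at t=7.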
>
> This should be much improved in the next major release of HBase.
>
> JG
>
> > -----Original Message-----
> > From: Jinsong Hu [mailto:jinsong_hu@hotmail.com]
> > Sent: Monday, June 14, 2010 11:24 AM
> > To: user@hbase.apache.org
> > Subject: bug report: opening hbase region takes too long, making the
> > region not available for more than 10 minutes.
> >
> >
> >
> > Hi there,
> >
> > I have found an HBase bug: opening a region takes too long. The client
> > reported a "no server address" error. For the region
> > MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773,
> > here is the sequence:
> >
> >
> >
> > Around 12:57, all 8 region servers closed this region.
> > At 12:57:45,812, machine2037 received a request to open the region.
> > Usually a worker thread honors such a request immediately and opens the
> > region within seconds, but in this case the region was not opened
> > until 13:14:43,341.
> > Around 13:16, all the other region servers received requests to open
> > the region, and their worker threads opened it immediately.
> >
> >
> > So during the gap from 12:57 to 13:14, the region was not available,
> > and the client logged errors while trying to insert records.
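The length of that gap can be checked directly from the machine2037 timestamps in the logs quoted in this message (open request received at 12:57:45,812, worker opening the region at 13:14:43,341). A small sketch using java.time, assuming the log timestamp layout yyyy-MM-dd HH:mm:ss,SSS; the class name RegionGap is hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class RegionGap {
    // Timestamp layout of the region server log lines quoted in this thread.
    static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Elapsed time between two log timestamps. */
    static Duration gap(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, LOG_TS),
                                LocalDateTime.parse(to, LOG_TS));
    }

    public static void main(String[] args) {
        // machine2037: open request received vs. worker actually opening the region
        Duration d = gap("2010-06-14 12:57:45,812", "2010-06-14 13:14:43,341");
        System.out.println(d); // PT16M57.529S
    }
}
```

That is roughly 1017.5 seconds, about ten times the 100-second retry window discussed next.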
> >
> >
> >
> > I have read the HBase code. HBase handles this situation by retrying
> > 10 times, waiting 10 seconds in between; essentially, it keeps trying
> > for 100 seconds.
> >
> > In this case, even those 100 seconds of retries won't help at 12:10,
> > because the region was opened well after the 100-second window.
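A sketch of the fixed-interval retry described above, using simulated time so it runs instantly. It assumes 10 attempts with a flat 10-second pause, as stated in the report; the real 0.20 client takes its settings from configuration (for example hbase.client.retries.number and hbase.client.pause) and may apply a backoff, so treat this as an illustration rather than HBase's actual code. FixedRetry and retry are hypothetical names.

```java
import java.util.function.LongPredicate;

public class FixedRetry {
    /**
     * Simulates a client that looks up the region and, on failure, waits a
     * fixed pause before trying again. Time is simulated in milliseconds
     * from the first attempt, so nothing actually sleeps.
     *
     * @return simulated time of the successful attempt, or -1 if all attempts failed
     */
    static long retry(int attempts, long pauseMillis, LongPredicate availableAt) {
        long now = 0;
        for (int i = 0; i < attempts; i++) {
            if (availableAt.test(now)) {
                return now;
            }
            now += pauseMillis; // fixed pause, no backoff (assumption)
        }
        return -1;
    }

    public static void main(String[] args) {
        long openedAfter = 1_017_529; // ~17 minutes, as on machine2037
        long result = retry(10, 10_000, t -> t >= openedAfter);
        System.out.println(result); // -1: the client gives up after about 100 s
    }
}
```

With these numbers the last attempt happens at 90 s and the client gives up after about 100 s, whereas machine2037 took roughly 17 minutes to open the region, so a fixed 100-second window cannot cover it.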
> >
> >
> >
> > This is clearly an HBase bug.
> >
> >
> > Jimmy
> >
> >
> >
> >
> > Here is the client-side log:
> >
> > 13:10:03,441 INFO  [ClientCnxn] Attempting connection to server zookeeper2.cloud.mydomain.net/10.110.8 52:2181: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > 13:10:03,451 INFO  [ClientCnxn] Server connection successful
> >
> > org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > Here are the region server logs related to this issue.
> >
> > machine2035:
> >
> > 2010-06-14 12:57:23,452 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:16:37,333 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > machine2036:
> >
> > 2010-06-14 12:57:29,312 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:16:05,107 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > machine2037:
> >
> > 2010-06-14 12:57:09,986 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 12:57:45,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:14:43,341 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > machine2038:
> >
> > 2010-06-14 12:57:25,562 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:15:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > machine2040:
> >
> > 2010-06-14 12:57:14,214 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:15:01,266 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > machine2041:
> >
> > 2010-06-14 12:57:44,877 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:15:48,955 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
> > machine2042:
> >
> > 2010-06-14 12:57:12,500 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> > 2010-06-14 13:14:58,719 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable,2010-06-13 10:33:31\x0922f3563bd43a3c3c044bd1db885f1523,1276457581773
> >
>
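To spot which server sat on its open request, one can measure, per server log, the delay between the MSG_REGION_OPEN line and the matching "Worker: MSG_REGION_OPEN" line. A minimal sketch, assuming each line starts with a yyyy-MM-dd HH:mm:ss,SSS timestamp as in the logs above; the class and method names are hypothetical and the region name is shortened in the sample.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class OpenLag {
    static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Parses the leading timestamp (first 23 characters) of a log line. */
    static LocalDateTime ts(String line) {
        return LocalDateTime.parse(line.substring(0, 23), LOG_TS);
    }

    /**
     * Delay between the first line where the server received MSG_REGION_OPEN
     * and the first line where a worker picked it up; null if either is missing.
     */
    static Duration openLag(List<String> lines) {
        LocalDateTime received = null, worker = null;
        for (String line : lines) {
            if (line.contains("Worker: MSG_REGION_OPEN")) {
                if (worker == null) worker = ts(line);
            } else if (line.contains("MSG_REGION_OPEN")) {
                if (received == null) received = ts(line);
            }
            // Other lines (e.g. "Closed ...") are ignored.
        }
        return (received == null || worker == null)
            ? null : Duration.between(received, worker);
    }

    public static void main(String[] args) {
        // machine2037, region name shortened
        List<String> m2037 = List.of(
            "2010-06-14 12:57:09,986 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed MyOwnEventTable",
            "2010-06-14 12:57:45,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: MyOwnEventTable",
            "2010-06-14 13:14:43,341 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: MyOwnEventTable");
        System.out.println(openLag(m2037)); // PT16M57.529S
    }
}
```

Run over the other servers' lines, the same method returns PT0S, since their "received" and "Worker" timestamps are identical; only machine2037 shows the worker lagging behind the request.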