Posted to user@hbase.apache.org by Rong-en Fan <gr...@gmail.com> on 2008/04/18 12:29:22 UTC

too busy host causes NotServingRegion exception?

I'm running hbase and hadoop-0.17 trunk code as of earlier today (without
HBASE-10) and loading 50m records into a table with ~800,000 rows and only
one column family. This is a 3-node DFS with 3 region servers, and I load
the data from one of these three boxes. Once in a while, I get a
NotServingRegion exception. The code looks like this:

BatchUpdate bu = new BatchUpdate(row);
bu.put(...);
table.commit(bu);

When I examine the region server's log, it shows something like this:

08/04/18 01:51:14 open the region in question
08/04/18 01:51:15 region available
08/04/18 01:51:15 starting compaction
08/04/18 01:51:22 region closed
08/04/18 01:51:41 NotServingRegion Exception
08/04/18 01:51:47 compaction done
08/04/18 01:51:51 NotServingRegion Exception
08/04/18 01:52:01 NotServingRegion Exception
08/04/18 01:52:11 NotServingRegion Exception
08/04/18 01:52:21 NotServingRegion Exception
08/04/18 01:52:47 open the region in question
08/04/18 01:52:47 region available

The master log somehow got truncated, but IIRC the master tried to assign
the region to this region server somewhere between 01:51:22 and 01:51:41.
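To make the timing concrete, here is a small self-contained Java sketch that measures the gaps between the log timestamps above. `OutageWindow` and `secondsBetween` are hypothetical names, and the sketch only parses the HH:mm:ss times exactly as they appear in the log.

```java
import java.time.Duration;
import java.time.LocalTime;

class OutageWindow {
    // Seconds between two HH:mm:ss timestamps taken from the region server log.
    static long secondsBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).getSeconds();
    }

    public static void main(String[] args) {
        // "region closed" -> "region available" again.
        long outage = secondsBetween("01:51:22", "01:52:47");
        System.out.println("region unavailable for " + outage + " s");

        // Spacing between consecutive NotServingRegion entries.
        String[] nsre = {"01:51:41", "01:51:51", "01:52:01", "01:52:11", "01:52:21"};
        for (int i = 1; i < nsre.length; i++) {
            System.out.println("gap " + i + ": " + secondsBetween(nsre[i - 1], nsre[i]) + " s");
        }
    }
}
```

It reports an outage of 85 s and a steady 10 s gap between NotServingRegion entries, so the region was offline for well over a minute while requests kept arriving at a regular interval.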

From my understanding, this region server is a little busy, so it does not
accept the assignment from the master. I'm wondering if this is caused by a
too-busy region server (each region server handles about 1,000 requests
per second), and if so, which configuration variables should I tune?
In addition, what are the best practices for writing a Java client that
deals with such exceptions (NotServingRegion should be common on a very
busy HBase instance, I think)?

BTW, I was getting lots of different strange failures when doing the same
thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase trunk,
I only get the error above. It seems there are no more mysterious exceptions :-D

Thanks,
Rong-En Fan

Re: too busy host causes NotServingRegion exception?

Posted by Rong-en Fan <gr...@gmail.com>.
On Sat, Apr 19, 2008 at 12:14 AM, Bryan Duxbury <br...@rapleaf.com> wrote:
>  When you say once in a while, how frequent are you talking about?

Well, the first one occurred after one hour of writing and the second one a
few minutes later. However, after I sent the mail, there were no problems at
all for the next couple of hours of writing.

Regards,
Rong-En Fan


Re: too busy host causes NotServingRegion exception?

Posted by Bryan Duxbury <br...@rapleaf.com>.
NotServingRegionExceptions are normal when they appear in the  
regionserver logs. They're not normal when they come out of your  
client code. You get an NSRE when a region gets split or reassigned  
and the client's cache of the region's location is out of date.  
Normally, the HTable client retries a bunch, and eventually it gets  
sorted out. However, if the reassignment/splitting/etc takes longer  
than all the retries, the client will get the NSRE. In general we'd  
like for those not to happen, but I'm not sure that there's actually  
something wrong.
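The retry behaviour described above can be sketched in plain Java. This is a simplified model, not HTable's actual code: `RegionNotServingException` is a hypothetical stand-in for HBase's NotServingRegionException, and `withRetries` is an illustrative helper. It makes up to `retries` attempts, sleeping `pauseMs` between them, and only lets the exception escape to the caller once every attempt has failed, which is the point at which an NSRE shows up in client code.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for HBase's NotServingRegionException (not the real class).
class RegionNotServingException extends Exception {
    RegionNotServingException(String msg) {
        super(msg);
    }
}

class RetryingCommit {
    // Runs op up to 'retries' times, sleeping 'pauseMs' between failed attempts.
    // Once every attempt has failed, the last exception is rethrown to the caller,
    // which is when a NotServingRegion error would surface in client code.
    static <T> T withRetries(Callable<T> op, int retries, long pauseMs) throws Exception {
        if (retries < 1) {
            throw new IllegalArgumentException("retries must be >= 1");
        }
        RegionNotServingException last = null;
        for (int attempt = 1; attempt <= retries; attempt++) {
            try {
                return op.call();
            } catch (RegionNotServingException e) {
                last = e;
                // A real client would also drop its cached location for the region
                // here, so the next attempt looks the region up again.
                if (attempt < retries) {
                    Thread.sleep(pauseMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated commit: the region is offline for the first three attempts.
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 4) {
                throw new RegionNotServingException("region offline");
            }
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The same wrapper shape is one possible answer to the best-practices question: an application that prefers waiting over failing could wrap its own commit call in an outer loop like this, at the cost of stalling the load while a region is offline.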

When you say once in a while, how frequent are you talking about?

If you want to tune this problem away, you can edit your hbase-site.xml
and change hbase.client.retries.number to be a bigger number and/or
hbase.client.pause to be longer. That might resolve your issue. If
something is actually broken in HBase, more retries won't help, and
that would be an interesting fact to know. If it is just a timing/load
issue, then more retries or a longer pause will probably fix it.
This would also be a really interesting fact to know :).
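A rough way to pick values for these two settings is to compare the total retry window with how long a region can stay offline. The sketch assumes a simplified model where the client makes N attempts and waits one pause between consecutive attempts (the real client's timing may differ), and it uses 10,000 ms purely as an assumed pause value. The 85 s outage comes from the region server log in the original message (region closed at 01:51:22, available again at 01:52:47). `RetryBudget` and its methods are hypothetical names.

```java
class RetryBudget {
    // Total time spent waiting when making 'retries' attempts with one pause
    // between each consecutive pair of attempts (simplified model).
    static long budgetMs(int retries, long pauseMs) {
        return (long) Math.max(0, retries - 1) * pauseMs;
    }

    // Smallest attempt count whose pauses span an outage of outageMs:
    // ceil(outage / pause) pauses, plus the first attempt.
    static int retriesToCover(long outageMs, long pauseMs) {
        return (int) ((outageMs + pauseMs - 1) / pauseMs) + 1;
    }

    public static void main(String[] args) {
        long outageMs = 85_000; // region closed 01:51:22, available again 01:52:47
        long pauseMs = 10_000;  // assumed hbase.client.pause value, for illustration only
        for (int retries : new int[] {5, 10}) {
            System.out.println(retries + " retries -> " + budgetMs(retries, pauseMs) / 1000 + " s budget");
        }
        System.out.println("retries needed: " + retriesToCover(outageMs, pauseMs));
    }
}
```

Under these assumptions, 5 attempts give a 40 s window while 10 attempts give 90 s, enough to ride out an 85 s outage. A 40 s window would also be consistent with, though it does not prove, the five NotServingRegion entries logged 10 s apart between 01:51:41 and 01:52:21.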

Glad to hear that trunk erases some of the mystery of 0.16!

-Bryan
