Posted to user@hbase.apache.org by Adam Phelps <am...@opendns.com> on 2010/10/19 03:02:45 UTC

NotServingRegionException question

On our test cluster I did a reset of the hdfs and hbase services (using 
stop/start-dfs and stop/start-hbase) earlier today, and from everything I 
know how to check, HDFS and HBase have come up correctly.

However some of the tables now appear to be inaccessible, with attempts 
to scan or load getting errors such as these:

org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: XXX
         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2192)
         at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1786)
         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)

Some tables still appear to work, but most are hitting this error.  As 
far as I can tell all HDFS data is accessible, and all nodes have active 
regionserver processes, so I'm not sure what to look for to correct 
these errors.  Any suggestions would be appreciated.

- Adam Phelps

Re: NotServingRegionException question

Posted by Adam Phelps <am...@opendns.com>.
On 10/19/10 9:22 PM, Stack wrote:
> You are aware that NSRE can happen as part of 'normal' operation when
> say a region is splitting?  It'll be offline for a short period during
> which time clients looking for the parent being split will be told go
> away and recalibrate; NSRE is the way the server kicks the client to
> go do new lookup.

I didn't know that, but I'm pretty sure that wasn't the case here.  The 
problem was occurring for several hours before I posted and it was 
several more hours after that before it came back online.

> Throw other questions up here on the list Adam so we make sure we've
> the issues covered when we cut the 0.90.0RC.

Will do.

Thanks
- Adam Phelps

Re: NotServingRegionException question

Posted by Stack <st...@duboce.net>.
On Tue, Oct 19, 2010 at 10:03 AM, Adam Phelps <am...@opendns.com> wrote:
> I'm running v0.89.20100621.  I ran major_compact against .META. and all the
> tables and after a couple hours they seem to work again.
>

Hmm.


> For your questions I was running with 20-odd servers and something like 2000
> regions.  The only other thing to note was that there was probably an HDFS
> node decommissioning process running when this occurred, could that have
> contributed to this?

No. HBase should ride over this fine.


> I'm trying to understand the causes behind problems
> such as this to better avoid them once we move into production.
>

Well, it'd have been nice to figure what was going on w/ the regions.
Here's a few things to try for next time:

To 'prove' a table is all online, I'll run a rowcounter mapreduce job,
or you could try running the hbck tool (./bin/hbase hbck).  The latter
is probably a little wobbly in your version of hbase but should tell
you basic health.
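
[Editor's sketch, not part of the original message.] The hbck check
mentioned above could be wrapped in a small script so it can be run
after every restart. This is only a sketch: the `./bin/hbase hbck`
command comes from the message, while the function names and the
`hbase_home` parameter are hypothetical, and nothing here assumes a
particular output format or exit code from hbck.

```python
import os
import subprocess


def hbck_command(hbase_home="."):
    """Build the argument list for `<hbase_home>/bin/hbase hbck`."""
    return [os.path.join(hbase_home, "bin", "hbase"), "hbck"]


def run_hbck(hbase_home="."):
    """Run hbck and return (exit_code, combined_output).

    The output is returned as-is for a human to read; this sketch does
    not try to parse it.
    """
    proc = subprocess.run(
        hbck_command(hbase_home),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout
```

Calling `run_hbck("/path/to/hbase")` on a node with HBase installed
would return whatever the tool prints, which you can then read or save
alongside the master log.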

For the unexpected NSRE exception, I'd try to figure which region it's
complaining about.  Usually the region name is included in the message
(along with the name of the server that is saying it doesn't have the
wanted region).  I'd take the region name and grep the master logs to
try and figure where it actually is being served (or not).  If the
master says it assigned the region, go to the server it thinks it
assigned it to and check that server's logs.
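
[Editor's sketch, not part of the original message.] The "grep the
logs for the region name" step can be done with plain grep; the Python
below is an equivalent minimal sketch in case you want to script it
across several log files. The function names are hypothetical, and the
log path and region name are whatever your cluster uses; no particular
log format is assumed.

```python
from typing import Iterable, List


def lines_mentioning_region(log_lines: Iterable[str], region_name: str) -> List[str]:
    """Return the log lines that contain region_name, in original order."""
    return [line.rstrip("\n") for line in log_lines if region_name in line]


def grep_region_in_file(path: str, region_name: str) -> List[str]:
    """Scan one log file (master or regionserver) for region_name."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(path, "r", errors="replace") as fh:
        return lines_mentioning_region(fh, region_name)
```

Run it first against the master log to see where the region was last
assigned, then against the log of the regionserver named there.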

You are aware that NSRE can happen as part of 'normal' operation when
say a region is splitting?  It'll be offline for a short period during
which time clients looking for the parent being split will be told go
away and recalibrate; NSRE is the way the server kicks the client to
go do new lookup.
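
[Editor's sketch, not part of the original message.] The behaviour
described above (the server refuses, the client drops its stale
location, looks the region up again and retries) can be modelled as a
toy retry loop. This is not the HBase client code; `NotServingRegion`,
`call_with_relookup`, `lookup` and `send` are all hypothetical names
used only for illustration.

```python
class NotServingRegion(Exception):
    """Stand-in for the server's 'I am not serving that region' reply."""


def call_with_relookup(key, cache, lookup, send, max_attempts=3):
    """Send a request for key, re-looking-up the location after each refusal.

    cache maps key -> location; lookup(key) returns a location;
    send(location, key) returns a result or raises NotServingRegion.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        if key not in cache:
            cache[key] = lookup(key)      # ask where the region lives now
        try:
            return send(cache[key], key)
        except NotServingRegion as err:
            last_error = err
            cache.pop(key, None)          # cached location was stale
    raise last_error                      # still refused after all attempts
```

During a split the second lookup finds the new location and the call
succeeds. If every lookup keeps returning a server that refuses, the
loop runs out of attempts and the exception reaches the caller, which
matches an NSRE that persists for hours rather than moments.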

Throw other questions up here on the list Adam so we make sure we've
the issues covered when we cut the 0.90.0RC.

St.Ack

Re: NotServingRegionException question

Posted by Adam Phelps <am...@opendns.com>.
I'm running v0.89.20100621.  I ran major_compact against .META. and all 
the tables and after a couple hours they seem to work again.

For your questions I was running with 20-odd servers and something like 
2000 regions.  The only other thing to note was that there was probably 
an HDFS node decommissioning process running when this occurred, could 
that have contributed to this?  I'm trying to understand the causes 
behind problems such as this to better avoid them once we move into 
production.

Thanks
- Adam Phelps

On 10/18/10 9:05 PM, Stack wrote:
> What version of hbase?  Try a restart.  Seems like something happened
> on startup and .META. has old locations for regions.  Check the master
> log during startup sequence.  Do you have many regions?  Many servers?
>
> Thanks,
> St.Ack


Re: NotServingRegionException question

Posted by Stack <st...@duboce.net>.
What version of hbase?  Try a restart.  Seems like something happened
on startup and .META. has old locations for regions.  Check the master
log during startup sequence.  Do you have many regions?  Many servers?

Thanks,
St.Ack
