Posted to user@hbase.apache.org by schubert zhang <zs...@gmail.com> on 2009/03/12 12:09:58 UTC

Metadata and region mismatch

Hi all,
Today I ran into a new problem: a batchUpdate commit is failing.

I am running a program that inserts rows into an HBase table, but after a
long period of batch updating, the following exception occurs:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server for region
TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901, row
'13575581009@2008-12-06 06:15:48.077', but failed after 10 attempts.
Exceptions:
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
        at org.apache.hadoop.hbase.client.HTable.close(HTable.java:1385)
        ......

Even after waiting a long time, I still could not insert new data.

I then checked the HBase status: the master and all regionservers were running.

But I found a mismatch for region
"TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901".
The metadata said this region was served by 10.24.1.12, but when I checked
10.24.1.12, the region was not there.
I then stopped the whole HBase cluster and started it again. The region
locations were rebuilt and everything seemed to be OK.
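
The check described above (compare the server recorded in the metadata with
the regions that server actually holds) can be sketched in a few lines. This
is only an illustration: the two dictionaries are hypothetical, hand-collected
inputs (for example from scanning '.META.' and from each regionserver's status
page), and find_mismatches is an invented helper name, not part of HBase.

```python
# Hypothetical sketch: compare the server that the metadata records for each
# region with the regions each regionserver reports it is actually serving.
# Nothing here talks to HBase; both inputs are assumed to be collected by hand.

def find_mismatches(meta_assignments, served_by_server):
    """Return (region, server) pairs where the metadata names a server
    that does not list the region among the regions it serves."""
    mismatches = []
    for region, server in sorted(meta_assignments.items()):
        if region not in served_by_server.get(server, set()):
            mismatches.append((region, server))
    return mismatches


# Region name and address are taken from the message above.
REGION = "TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901"
meta_assignments = {REGION: "10.24.1.12"}
served_by_server = {"10.24.1.12": set()}  # the server holds no such region

for region, server in find_mismatches(meta_assignments, served_by_server):
    print("metadata says %s serves %s, but it does not" % (server, region))
```

An empty result would mean the metadata and the servers agree for every
region that was listed.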

In the log file of 10.24.1.12, I found the following exceptions:

836118938_60020/hlog.dat.1236849158178, entries=100010. New log writer: /hbase/log_10.24.1.12_1236836118938_60020/hlog.dat.1236849168393
2009-03-12 17:12:49,298 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901 in 48sec
2009-03-12 17:12:49,298 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting split of region TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901
2009-03-12 17:12:50,648 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901
2009-03-12 17:12:50,809 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236849169299/1762744366 available
2009-03-12 17:12:50,809 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236849169299
2009-03-12 17:12:50,865 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TESTTABLE,13575590622@2008-12-16 15:49:40.143,1236849169299/1344805089 available
2009-03-12 17:12:50,865 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TESTTABLE,13575590622@2008-12-16 15:49:40.143,1236849169299
2009-03-12 17:29:15,495 WARN org.apache.hadoop.hbase.RegionHistorian: Unable to 'Region split from: WAPCDR,13575565132@2008-12-01 17:16:55.117,1236847258901'
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server for region , row 'TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236849169299', but failed after 11 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1546)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)

org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1546)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)

org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
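
To make the split in the log above easier to follow, here is a small sketch
that pulls the region names apart. It assumes the layout visible in the log:
table name, start key and a numeric region id separated by commas, with an
optional '/<number>' suffix. Reading the id as a creation time in milliseconds
since the epoch is an assumption, and parse_region_name is a hypothetical
helper, not an HBase API.

```python
from collections import namedtuple
from datetime import datetime, timezone

RegionName = namedtuple("RegionName", "table start_key region_id")


def parse_region_name(name):
    """Split 'table,startkey,id[/suffix]' into its three parts.

    The table is everything before the first comma and the id is whatever
    follows the last comma, so a start key containing commas stays whole.
    """
    table, rest = name.split(",", 1)
    start_key, last = rest.rsplit(",", 1)
    region_id = int(last.split("/", 1)[0])  # drop the optional '/<number>'
    return RegionName(table, start_key, region_id)


# Names copied from the regionserver log above.
parent = parse_region_name(
    "TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901")
daughters = [
    parse_region_name(
        "TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236849169299/1762744366"),
    parse_region_name(
        "TESTTABLE,13575590622@2008-12-16 15:49:40.143,1236849169299/1344805089"),
]

for region in [parent] + daughters:
    # Assumption: the id is a millisecond timestamp.
    created = datetime.fromtimestamp(region.region_id / 1000, tz=timezone.utc)
    print(region.table, repr(region.start_key), region.region_id,
          created.isoformat())
```

Parsed this way, both daughters carry the id 1236849169299, and the first
daughter keeps the parent's start key while the second begins at the split key.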

Re: Metadata and region mismatch

Posted by schubert zhang <zs...@gmail.com>.
I found that the "ulimit nofile" limit on one node of my cluster had not been
raised. My issue may be caused by that; I will retest.
Thank you very much, and thanks to J-D as well.

Refer to: item 6 of http://wiki.apache.org/hadoop/Hbase/FAQ
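
A quick way to see the open-file limit that a process inherits on a node is
sketched below. It uses Python's standard resource module (Unix only). The
threshold of 32768 is just an assumed example value, not a figure taken from
the FAQ, and nofile_limit_ok is a hypothetical helper name.

```python
import resource

ASSUMED_MINIMUM = 32768  # example threshold only; choose what your setup needs


def nofile_limit_ok(soft_limit, minimum=ASSUMED_MINIMUM):
    """True if the soft open-file limit is unlimited or at least `minimum`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


# getrlimit returns the (soft, hard) limits of the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft=%s hard=%s" % (soft, hard))
if not nofile_limit_ok(soft):
    print("soft limit is below %d; consider raising it" % ASSUMED_MINIMUM)
```

The limit is inherited from the session that launches a process, so the check
is most meaningful when run as the user that starts the HBase daemons, on
every node.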


On Fri, Mar 13, 2009 at 6:09 PM, schubert zhang <zs...@gmail.com> wrote:

> This time another region went missing, and I used close_region
> 'REGIONNAME' to close it. But then all the regions after this one
> disappeared from the web GUI, although I can still find them when I scan
> '.META.' :-(
> Note: in this case, there is no log output about the -ROOT- table.

Re: Metadata and region mismatch

Posted by schubert zhang <zs...@gmail.com>.
This time another region went missing, and I used close_region 'REGIONNAME'
to close it. But then all the regions after this one disappeared from the web
GUI, although I can still find them when I scan '.META.' :-(
Note: in this case, there is no log output about the -ROOT- table.


On Fri, Mar 13, 2009 at 1:10 AM, schubert zhang <zs...@gmail.com> wrote:

> Thank you, stack. It does look like HBASE-1121; I will continue to track
> it. Sorry, the log files have already been removed.

Re: Metadata and region mismatch

Posted by schubert zhang <zs...@gmail.com>.
Thank you, stack. It does look like HBASE-1121; I will continue to track it.
Sorry, the log files have already been removed.


On Fri, Mar 13, 2009 at 12:29 AM, stack <st...@duboce.net> wrote:

> Hey Schubert:
>
> Just FYI, after noticing the mismatch, rather than restart the whole
> cluster, you might try closing the single region.  That can jog the master
> into noticing it has a bad assignment.  To do this, in the shell type
> 'tools' and you'll see some admin facility.
>
> The root problem seems to be an issue fixed in the new hbase 0.19.1 release
> candidate: See HBASE-1121 'Cluster confused about where -ROOT- is'.
>
> Worrying is that even after a restart, you cannot get to the troublesome
> region.  Is it deployed on a regionserver?  If so, anything pertinent in the
> logs regards this region?
>
> St.Ack

Re: Metadata and region mismatch

Posted by stack <st...@duboce.net>.
Hey Schubert:

Just FYI, after noticing the mismatch, rather than restart the whole
cluster, you might try closing the single region.  That can jog the master
into noticing it has a bad assignment.  To do this, in the shell type
'tools' and you'll see some admin facility.

The root problem seems to be an issue fixed in the new hbase 0.19.1 release
candidate: See HBASE-1121 'Cluster confused about where -ROOT- is'.

Worrying is that even after a restart, you cannot get to the troublesome
region.  Is it deployed on a regionserver?  If so, anything pertinent in the
logs regards this region?

St.Ack

On Thu, Mar 12, 2009 at 4:31 AM, schubert zhang <zs...@gmail.com> wrote:

> Oh, it is not fine after all.
> Now I can find this entry:
> TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901
> <http://nd0-rack0-cloud:60010/regionhistorian.jsp?regionname=WAPCDR,13575565132@2008-12-01%2017:16:55.117,1236847258901>
> nd1-rack0-cloud:60020 <http://nd1-rack0-cloud:60030/> 916003194
> 13575565132@2008-12-01 17:16:55.117 13576301358@2008-12-08 13:57:43.163
>
> but when I try to get row 13575565132@2008-12-01 17:16:55.117, nothing is
> returned. It seems this region is gone.

Re: Metadata and region mismatch

Posted by schubert zhang <zs...@gmail.com>.
Oh, it is not fine after all.
Now I can find this region listed:
Name:          TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901
               <http://nd0-rack0-cloud:60010/regionhistorian.jsp?regionname=WAPCDR,13575565132@2008-12-01%2017:16:55.117,1236847258901>
Region server: nd1-rack0-cloud:60020 <http://nd1-rack0-cloud:60030/>
Encoded name:  916003194
Start key:     13575565132@2008-12-01 17:16:55.117
End key:       13576301358@2008-12-08 13:57:43.163

But when I try to get row 13575565132@2008-12-01 17:16:55.117, nothing is
returned. It seems this region is gone.
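
To rule out a simple key-range mistake, the check "does this row key belong to the region listed above" can be sketched in plain Python. This is only an illustration, not HBase client code: the helper names parse_region_name and region_contains are hypothetical, and the sketch assumes that a region name has the form table,startKey,regionId and that row keys compare as raw bytes, with the start key inclusive and the end key exclusive.

```python
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int


def parse_region_name(name: str) -> RegionName:
    """Split 'table,startKey,regionId' (hypothetical helper).

    The table name ends at the first comma and the region id starts
    after the last comma, so a start key containing commas survives.
    """
    table, rest = name.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return RegionName(table, start_key, int(region_id))


def region_contains(start_key: str, end_key: str, row: str) -> bool:
    """Return True if row falls in [start_key, end_key) by byte order.

    Assumption: an empty end key means the region is the last one in
    the table, so there is no upper bound.
    """
    start, end, key = (k.encode("utf-8") for k in (start_key, end_key, row))
    return start <= key and (end == b"" or key < end)


if __name__ == "__main__":
    region = parse_region_name(
        "TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901"
    )
    end_key = "13576301358@2008-12-08 13:57:43.163"
    # The row I tried to get, and the row from the failed batchUpdate.
    for row in (
        "13575565132@2008-12-01 17:16:55.117",
        "13575581009@2008-12-06 06:15:48.077",
    ):
        print(row, region_contains(region.start_key, end_key, row))
```

Both rows fall inside the listed key range, so under these assumptions the empty result is not caused by asking the wrong region; it is consistent with the region not actually being served.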


On Thu, Mar 12, 2009 at 7:09 PM, schubert zhang <zs...@gmail.com> wrote:

> Hi all,
> Today I encountered a new issue: a batchUpdate commit failed.
>
> I am running a program that inserts rows into an HBase table, but after a long
> period of batch updates, the following exception occurs:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server for region TESTTABLE,13575565132@2008-12-0117:16:55.117,1236847258901, row '13575581009@2008-12-0606:15:48.077', but failed after 10 attempts.
> Exceptions:
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:942)
>         at
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
>         at org.apache.hadoop.hbase.client.HTable.close(HTable.java:1385)
>         ......
>
> Even after waiting for a long time, I still cannot insert new data.
>
> Then I checked the HBase status: the master and all regionservers are running.
>
> But I found a mismatch for region "TESTTABLE,13575565132@2008-12-0117:16:55.117,1236847258901".
> The metadata says this region is served by 10.24.1.12, but
> when I checked 10.24.1.12, the region was not there.
> I then stopped the whole HBase cluster and started it again. The region locations were
> reassigned and everything seemed OK.
>
> In the log file of 10.24.1.12, I found the following exceptions:
>
> 836118938_60020/hlog.dat.1236849158178, entries=100010. New log writer:
> /hbase/log_10.24.1.12_1236836118938_60020/hlog.dat.1236849168393
> 2009-03-12 17:12:49,298 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region TESTTABLE,13575565132@2008-12-0117:16:55.117,1236847258901 in 48sec
> 2009-03-12 17:12:49,298 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting split of region TESTTABLE,13575565132@2008-12-0117:16:55.117,1236847258901
> 2009-03-12 17:12:50,648 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236847258901
> 2009-03-12 17:12:50,809 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region TESTTABLE,13575565132@2008-12-0117:16:55.117,1236849169299/1762744366 available
> 2009-03-12 17:12:50,809 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed TESTTABLE,13575565132@2008-12-01 17:16:55.117,1236849169299
> 2009-03-12 17:12:50,865 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> region TESTTABLE,13575590622@2008-12-1615:49:40.143,1236849169299/1344805089 available
> 2009-03-12 17:12:50,865 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed TESTTABLE,13575590622@2008-12-16 15:49:40.143,1236849169299
> 2009-03-12 17:29:15,495 WARN org.apache.hadoop.hbase.RegionHistorian:
> Unable to 'Region split from: WAPCDR,13575565132@2008-12-0117:16:55.117,1236847258901'
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server for region , row 'TESTTABLE,13575565132@2008-12-0117:16:55.117,1236849169299', but failed after 11 attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1546)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1546)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0
>