Posted to user@hbase.apache.org by Haijun Cao <ha...@ymail.com> on 2009/07/19 22:24:09 UTC
NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Hi
I am hitting NotServingRegionException (NSRE) while scanning TestTable (though not all NSREs are created equal, it seems). TestTable was previously populated with sequentialWrite, 100 x 1M records, using the PerformanceEvaluation MapReduce job.
I checked the region named in the exception and found that it is not being served because the region server is complaining about a duplicate assignment:
MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
I checked .META. for the region, and it indeed has two assignment records.
Is this a bug? How can I recover the region from this? (I searched the archive for "duplicate assignment" and got no results.)
I am on HBase trunk, hadoop-0.20.0 (plus 4681), zookeeper-3.2. The test environment has 3 machines (8 cores, 16 GB RAM, 4x750 GB SATA disks, RAID 0). DataNode xcievers=4096, handler=50, ulimit 32768 (I followed the hbase-0.20.0-alpha overview_description religiously).
Thanks in advance.
Haijun
1. Exception while scanning:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.10.30.106:60020 for region TestTable,0089182778,1247979707102, row '0089182778', but failed after 10 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0089182778,1247979707102
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2230)
at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1848)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
2. duplicate assignments for the region in .META.
Timestamp                  Event       Description
Sat, 18 Jul 2009 22:05:00  open        Region opened on server: snv-it-lin-012
Sat, 18 Jul 2009 22:04:57  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
Sat, 18 Jul 2009 22:04:54  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
Sat, 18 Jul 2009 22:04:49  split       Region split from: TestTable,0089182778,1247904130413
3. Region server log file:
[haijun@snv-it-lin-012 ~]$ grep TestTable,0089182778,1247979707102 /disk1/opt/kindsight/hbase/hbase/logs/hbase-haijun-regionserver-snv-it-lin-012.log.2009-07-18
2009-07-18 22:04:54,014 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
2009-07-18 22:04:54,015 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
2009-07-18 22:04:57,085 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
2009-07-18 22:05:00,077 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TestTable,0089182778,1247979707102/1884010304 available; sequence id is 57144455
2009-07-18 22:05:00,100 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
2009-07-18 22:05:03,243 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TestTable,0089182778,1247979707102
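For readers who want to check a whole region server log for this pattern rather than grepping for one region at a time, here is a minimal sketch in Python. It is not an HBase tool. It assumes the log line shapes shown above, assumes region names contain no whitespace or "/", and the names regions_left_closed and SAMPLE are made up for illustration. It reports regions whose most recent open/close event in the log is a "Duplicate assignment" close.

```python
import re

# Matches both the direct and the "Worker:" variants of the close message.
CLOSE_RE = re.compile(r"MSG_REGION_CLOSE_WITHOUT_REPORT: (\S+): Duplicate assignment")
# Matches e.g. "HRegion: region TestTable,...,1247979707102/1884010304 available; ..."
OPEN_RE = re.compile(r"HRegion: region ([^/\s]+)/\d+ available")


def regions_left_closed(lines):
    """Return regions whose most recent open/close event is a duplicate-assignment close."""
    last_event = {}
    for line in lines:
        closed = CLOSE_RE.search(line)
        if closed:
            last_event[closed.group(1)] = "closed"
            continue
        opened = OPEN_RE.search(line)
        if opened:
            last_event[opened.group(1)] = "open"
    return sorted(region for region, event in last_event.items() if event == "closed")


# Sample lines copied from the region server log above.
SAMPLE = [
    "2009-07-18 22:05:00,077 INFO org.apache.hadoop.hbase.regionserver.HRegion: "
    "region TestTable,0089182778,1247979707102/1884010304 available; sequence id is 57144455",
    "2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: "
    "MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment",
    "2009-07-18 22:05:03,243 INFO org.apache.hadoop.hbase.regionserver.HRegion: "
    "Closed TestTable,0089182778,1247979707102",
]

if __name__ == "__main__":
    print(regions_left_closed(SAMPLE))
```

To run it against a real log, pass an open file object instead of SAMPLE. A region that was later reopened on the same server drops out of the result, but a region reopened on a different server would still be listed, so the output is only a list of candidates to check.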
Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Posted by stack <st...@duboce.net>.
On Sun, Jul 19, 2009 at 2:18 PM, Haijun Cao <ha...@ymail.com> wrote:
>
> Now I just need to find out all the bad regions and fix it this way......
> really hoping for a hbase fsck command.
If there are too many, then until there is an hbase fsck command you might have to restart to fix them.
To find bad regions, try scanning in the shell for a column that you know
doesn't exist ("scan 'TABLENAME', {COLUMNS =>
'NON_EXISTENT_COLUMN'}"). Make sure DEBUG is enabled on the client before you
begin. With DEBUG, you'll see each region it's trying to load before it does
so, so you can identify the troublesome ones.
> Back to the original cause (region got closed due to duplicate assignment
> to the same region server), is it a bug? Shall I open a ticket for it?
>
There may already be one. If you send on the master log (you can send it to me
privately), I can figure out whether this is a new condition. (If it is not at
DEBUG, it is probably of little use. Please also name the doubly-assigned region.)
Thanks Haijun.
St.Ack
Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Posted by Haijun Cao <ha...@ymail.com>.
Ryan,
Thank you for your advice.
I tried both approaches: restarting the master did not work, but deleting info:server from .META. did.
One minor thing: I used the deleteall command first, and it turned out that this deleted the region's row completely (including info:regioninfo), so that region is lost. Luckily, I had another region with the same problem. For that one I used the delete command (not deleteall), and it worked as you described: the region was reassigned and opened successfully on a region server, and I can get rows within the region.
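The difference described above can be pictured with a toy in-memory model. This is only a sketch in Python, not the HBase client API or the shell. The dictionary layout, the function names and the server value are placeholders for illustration, and the reassignment rule (row still has info:regioninfo, no info:server) is an assumption taken from what this thread reports.

```python
# Toy stand-in for one .META. row; NOT the HBase API, values are placeholders.
REGION = "TestTable,0089182778,1247979707102"


def make_meta():
    """Build a fresh toy catalog holding a single region row."""
    return {
        REGION: {
            "info:regioninfo": "<serialized region info>",
            "info:server": "10.10.30.106:60020",
        }
    }


def delete_cell(meta, row, column):
    """Model of a single-column delete: the rest of the row survives."""
    meta.get(row, {}).pop(column, None)


def delete_row(meta, row):
    """Model of the outcome reported for deleteall in this thread: the whole row goes."""
    meta.pop(row, None)


def can_reassign(meta, row):
    """Assumed rule: the region is still described, but no server is recorded for it."""
    cells = meta.get(row)
    return cells is not None and "info:regioninfo" in cells and "info:server" not in cells


if __name__ == "__main__":
    meta = make_meta()
    delete_cell(meta, REGION, "info:server")
    print("after single-column delete:", can_reassign(meta, REGION))

    meta = make_meta()
    delete_row(meta, REGION)
    print("after whole-row delete:", can_reassign(meta, REGION))
```

In this model the single-column delete leaves a row that can be picked up for reassignment, while the whole-row delete leaves nothing that describes the region, which matches the lost region reported above.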
Now I just need to find all the bad regions and fix them this way... really hoping for an hbase fsck command.
Back to the original cause (the region got closed due to a duplicate assignment to the same region server): is it a bug? Shall I open a ticket for it?
Thanks.
Haijun
________________________________
From: Ryan Rawson <ry...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Sunday, July 19, 2009 1:29:37 PM
Subject: Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
A quick recover is to kill your master with 'kill' (not
hbase-daemon.sh). Then restart it.
If that doesn't work, you might have to manually delete the
regionserver assignment in meta:
deleteall '.META.', 'TestTable,0089182778,1247979707102', 'info:server'
The master will reassign the region within 60 seconds.
Let us know!
-ryan
Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Posted by Ryan Rawson <ry...@gmail.com>.
A quick recovery is to kill your master with 'kill' (not
hbase-daemon.sh), then restart it.
If that doesn't work, you might have to manually delete the
regionserver assignment in meta:
deleteall '.META.', 'TestTable,0089182778,1247979707102', 'info:server'
The master will reassign the region within 60 seconds.
Let us know!
-ryan
Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Posted by stack <st...@duboce.net>.
Thanks for the log, though it is not at DEBUG.
Is this a clean checkout?
What I see is a split, and then we assign out the lower half of the split
twice but we don't assign the top half. We had a bug like this after
hbase-1304 went in, but it was fixed a long time back.
The MSG_REGION_CLOSE_WITHOUT_REPORT is rare but we don't seem to be doing
the right thing when we get one.
St.Ack
On Sun, Jul 19, 2009 at 3:16 PM, stack <st...@duboce.net> wrote:
> Then you would have missed this fix where edits to .META. were frozen out
> making it double-assignment more likely:
>
> ------------------------------------------------------------------------
> r794867 | stack | 2009-07-16 14:29:05 -0700 (Thu, 16 Jul 2009) | 1 line
>
> HBASE-1664 Disable 1058 on catalog tables
>
> Thanks for your patience and for living on the edge/TRUNK.
>
> St.Ack
> P.S. Would be interested in your master log nonetheless
Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Posted by stack <st...@duboce.net>.
Then you would have missed this fix, for the case where edits to .META. were
frozen out, making double-assignment more likely:
------------------------------------------------------------------------
r794867 | stack | 2009-07-16 14:29:05 -0700 (Thu, 16 Jul 2009) | 1 line
HBASE-1664 Disable 1058 on catalog tables
Thanks for your patience and for living on the edge/TRUNK.
St.Ack
P.S. Would be interested in your master log nonetheless
On Sun, Jul 19, 2009 at 3:07 PM, Haijun Cao <ha...@ymail.com> wrote:
>
> Yes, as recent as: Jul 16 13:48
>
>
> Haijun
Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Posted by Haijun Cao <ha...@ymail.com>.
Yes, as recent as: Jul 16 13:48
Haijun
________________________________
From: stack <st...@duboce.net>
To: hbase-user@hadoop.apache.org
Sent: Sunday, July 19, 2009 2:35:48 PM
Subject: Re: NSRE due to duplicate assignment (MSG_REGION_CLOSE_WITHOUT_REPORT)
Are you on a recent TRUNK? A few fixes went in end of last week that help
with this.
On Sun, Jul 19, 2009 at 1:24 PM, Haijun Cao <ha...@ymail.com> wrote:
>
> I checked the .META. for the region, it indeed has two
> assignment records.
>
> I am wondering if this is a bug? How I can recover the region from this? (I
> searched archieve using duplicate assignment, got no result).
>
May I see the master log from around the double assignment (if you were
running DEBUG)?
Yeah, it's a bug.
Do as Ryan suggested or in shell do "close_region REGIONNAME". It'll be
reassigned and then reopened elsewhere.
St.Ack
>
> I am on hbase truck, hadoop-0.20.0 (plus 4681), zookeeper-3.2, test env has
> 3 machine (8core, 16G, 4x750G SATA disk, raid 0). DataNode xreciver=4096,
> handler=50, ulimit 32768 (followed hbase-0.20.0-alpha overview_description
> religiously)
>
>
> Thanks in advance.
>
> Haijun
>
>
>
> 1. Exception while scanning:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server 10.10.30.106:60020 for region
> TestTable,0089182778,1247979707102, row '0089182778', but failed after 10
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> TestTable,0089182778,1247979707102
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2230)
> at
>
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1848)
> at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
>
> 2. duplicate assignments for the region in .META.
>
> Timestamp                  Event       Description
> Sat, 18 Jul 2009 22:05:00  open        Region opened on server: snv-it-lin-012
> Sat, 18 Jul 2009 22:04:57  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
> Sat, 18 Jul 2009 22:04:54  assignment  Region assigned to server snv-it-lin-012,60020,1247965643087
> Sat, 18 Jul 2009 22:04:49  split       Region split from: TestTable,0089182778,1247904130413
>
> 3. Region server log file:
>
> [haijun@snv-it-lin-012 ~]$ grep TestTable,0089182778,1247979707102 /disk1/opt/kindsight/hbase/hbase/logs/hbase-haijun-regionserver-snv-it-lin-012.log.2009-07-18
> 2009-07-18 22:04:54,014 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:04:54,015 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:04:57,085 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:05:00,077 INFO org.apache.hadoop.hbase.regionserver.HRegion: region TestTable,0089182778,1247979707102/1884010304 available; sequence id is 57144455
> 2009-07-18 22:05:00,100 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: TestTable,0089182778,1247979707102
> 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
> 2009-07-18 22:05:03,242 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE_WITHOUT_REPORT: TestTable,0089182778,1247979707102: Duplicate assignment
> 2009-07-18 22:05:03,243 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TestTable,0089182778,1247979707102
>
>
>
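The region server log quoted above shows the pattern behind this NSRE: two MSG_REGION_OPEN messages for the same region about three seconds apart, followed by MSG_REGION_CLOSE_WITHOUT_REPORT. As a rough sketch (not part of HBase), a short Python function could scan a region server log for regions that received more than one open message. It assumes one log entry per line in the format quoted above; the name find_repeated_opens and the regular expression are illustrative assumptions, not anything HBase ships.

```python
import re
from collections import defaultdict

# Assumed entry format, taken from the log excerpt in this thread:
#   <date> <time>,<ms> INFO <class>HRegionServer: MSG_REGION_OPEN: <region>
# The "Worker: MSG_REGION_OPEN" lines are deliberately not matched, since
# they record the server processing a message it has already logged.
OPEN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"\S+HRegionServer: MSG_REGION_OPEN: (?P<region>\S+)\s*$"
)


def find_repeated_opens(lines):
    """Return {region name: [timestamps]} for regions with more than one
    MSG_REGION_OPEN entry among the given log lines."""
    opens = defaultdict(list)
    for line in lines:
        match = OPEN_RE.match(line)
        if match:
            opens[match.group("region")].append(match.group("ts"))
    # Keep only regions that were told to open more than once.
    return {region: ts for region, ts in opens.items() if len(ts) > 1}
```

Passing an open log file works because the function only iterates over lines, e.g. find_repeated_opens(open(path)). A repeated open is only a candidate: over a long log a region can legitimately be closed and reopened, so the timestamps still need to be checked against the master log.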