Posted to user@hbase.apache.org by Raghu Angadi <ra...@apache.org> on 2011/05/13 09:07:15 UTC

region goes missing on rs (may be during reassignment)

This happened twice in two days:

clients can't access one of the regions and consistently fail
with NotServingRegionException
"org.apache.hadoop.hbase.NotServingRegionException:
Region is not online:
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f." ...

restarting the region server fixes this.

Portions of the RS and master logs are below. Please let me know if more
logs are needed. We will continue to look into this.

*on RegionServer* :

    seems to happen right after compaction. grep of  '6136400' on region
server smf1-afz-19-sr1 :
[...]
2011-05-12 12:05:08,125 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
region: users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:08,126 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing
open of users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:08,127 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 114 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
2011-05-12 12:05:08,129 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Opening region: REGION => {NAME =>
'users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.', STARTKEY
=> '61364002', ENDKEY => '613930521', ENCODED =>
a0bf035ac417cdd0697464f1c48f387f, TABLE => {{NAME => 'users', MAX_FILESIZE
=> '130023424', FAMILIES => [{NAME => 'columns', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'LZO', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}, {NAME => 'extracted', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>
'0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
'protobuf', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS =>
'2', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'time', BLOOMFILTER =>
'NONE', REPLICATION_SCOPE => '0', VERSIONS => '2147483647', COMPRESSION =>
'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}}
2011-05-12 12:05:08,130 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Instantiated users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,977 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; next
sequenceid=208908217
2011-05-12 12:05:20,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:20,982 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
because Region has too many store files; priority=21, compaction queue
size=0
2011-05-12 12:05:20,982 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
in region .META.,,1 with server=smf1-afz-19-sr1.prod.twitter.com:60020,
startcode=1302734205220
2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; next
sequenceid=208908217
2011-05-12 12:05:20,985 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.:
disabling compactions & flushes
2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,986 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:20,987 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:23,392 INFO org.apache.hadoop.hbase.regionserver.HRegion:
completed compaction on region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. after 2sec
2011-05-12 12:40:44,829 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; Region is not online:
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:40:45,862 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer:
NotServingRegionException; Region is not online:
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.

[...]

*on Master* : from grep '61364002,1297594642368' :
[...]
2011-05-12 12:03:54,786 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
was=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
state=CLOSED, ts=1305201834784
2011-05-12 12:03:54,790 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 114 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
[...]
2011-05-12 12:04:31,813 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. to
smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220
2011-05-12 12:04:39,333 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
[...]
2011-05-12 12:05:08,122 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out:  users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
state=OPENING, ts=1305201871850
2011-05-12 12:05:08,122 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
for too long, reassigning
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
2011-05-12 12:05:08,122 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 128 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
state=RS_ZK_REGION_OPENING
2011-05-12 12:05:08,124 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
Retrieved 114 byte(s) of data from znode
/twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
2011-05-12 12:05:08,124 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Successfully transitioned
region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. into
OFFLINE and forcing a new assignment

Thanks for looking into this.

Raghu.

Re: region goes missing on rs (may be during reassignment)

Posted by Stack <st...@duboce.net>.
On Fri, May 13, 2011 at 3:01 PM, Raghu Angadi <an...@gmail.com> wrote:
> Thanks Stack. greatly appreciate the help.
>

No problem.

> hbase.regionserver.handler.count is set to 30.
> we have not set hbase.master.assignment.timeoutmonitor.timeout. will surely
> increase to 180 seconds as HBASE-3846 does.
>

You might want to go to 0.90.3 altogether.  Is that a pain for you?


> The load on the cluster is low to moderate and HBase holds up pretty well.
> Most of the load consists of hourly random writes to the table and
> sequential scans from MR jobs.
>

Thanks boss.


> I will send another email with locations to full master logs.
> There are many "Regions in transition timed out" messages for this region
> and many others spread over time.
>

Grand.

I can come over any time or you should drop by our place.  It's just a
few blocks away and we can munch on lunch while we dig in your logs.

St.Ack

> Raghu.

Re: region goes missing on rs (may be during reassignment)

Posted by Raghu Angadi <an...@gmail.com>.
Thanks Stack. greatly appreciate the help.

hbase.regionserver.handler.count is set to 30.
we have not set hbase.master.assignment.timeoutmonitor.timeout. will surely
increase to 180 seconds as HBASE-3846 does.

The load on the cluster is low to moderate and HBase holds up pretty well.
Most of the load consists of hourly random writes to the table and
sequential scans from MR jobs.

I will send another email with locations to full master logs.
There are many "Regions in transition timed out" messages for this region
and many others spread over time.

Raghu.
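
A rough way to compare the two timeout values against the master log is
sketched below. It is only an illustration: the helper names
(log_stamp_to_ms, exceeds_timeout) are made up, the check is a
simplification and not the master's actual timeout monitor, and the log
timestamps are assumed to be UTC (which is consistent with
ts=1305201871850 falling at 12:04:31, the time of the "Assigning region"
line). The 30 and 180 second values are the ones mentioned in this thread.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def log_stamp_to_ms(stamp: str) -> int:
    """Convert a 'YYYY-MM-DD HH:MM:SS,mmm' log stamp to epoch millis (UTC assumed)."""
    parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def exceeds_timeout(state_ts_ms: int, now_ms: int, timeout_ms: int) -> bool:
    """Hypothetical check: has the region been in its state longer than timeout_ms?"""
    return now_ms - state_ts_ms > timeout_ms


# ts= value from the "Regions in transition timed out" line in the master log.
opening_ts_ms = 1305201871850
# Time at which the master logged that timeout.
timeout_logged_ms = log_stamp_to_ms("2011-05-12 12:05:08,122")

elapsed_ms = timeout_logged_ms - opening_ts_ms
print(elapsed_ms)  # 36272

for timeout_ms in (30_000, 180_000):
    print(timeout_ms, exceeds_timeout(opening_ts_ms, timeout_logged_ms, timeout_ms))
```

Under these assumptions the region had been OPENING for about 36 seconds
when the master gave up, which is over a 30 second limit but well inside
a 180 second one.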

Re: region goes missing on rs (may be during reassignment)

Posted by Stack <st...@duboce.net>.
I see that we are timing out region assignment then assigning
elsewhere, but the region opened anyway on first server (What do you
have hbase.regionserver.handler.count set to?  The default is 10 which
could mean a bunch of requests hanging out in the rpc queue before
getting into the server to be processed).  One thing you could do is
up your region in transition timeout.  Default is 30 seconds which if
there is a bunch of churn may not be enough time for region assignment
to complete -- was there churn at this time? (We up the default
timeout in 0.90.3, see  'HBASE-3846  Set RIT timeout higher').

See below for more.

On Fri, May 13, 2011 at 8:19 AM, Raghu Angadi <ra...@apache.org> wrote:
...
>> > 2011-05-12 12:05:20,987 DEBUG
>> > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
>> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.

The region opened successfully.

But looking at the master log, 12 seconds earlier it says:

>>>> 2011-05-12 12:05:08,122 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. state=OPENING, ts=1305201871850
>>>> 2011-05-12 12:05:08,122 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning


.... and then forces it to be reassigned elsewhere (Your log from master
stops at this point.  I'd be interested in seeing more.  Send it to me
offline?).

Thanks Raghu,
St.Ack
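
For reference, the "12 seconds earlier" gap can be checked directly from
the two quoted log lines. This is just a small sketch; parse_log_time is
an illustrative helper name, and the two timestamps are copied from the
region server and master excerpts in this thread.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a log4j-style 'YYYY-MM-DD HH:MM:SS,mmm' timestamp."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# Master: "Regions in transition timed out" for the region.
master_timeout = parse_log_time("2011-05-12 12:05:08,122")
# Region server: OpenRegionHandler "Opened" for the same region.
rs_opened = parse_log_time("2011-05-12 12:05:20,987")

gap_seconds = (rs_opened - master_timeout).total_seconds()
print(f"{gap_seconds:.3f}")  # 12.865
```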

Re: region goes missing on rs (may be during reassignment)

Posted by Raghu Angadi <ra...@apache.org>.
We don't have appends enabled (CDH 2). hbase version is '0.90.1, r1069174'.
there does not seem to be any data loss.

thanks,
Raghu.

On Fri, May 13, 2011 at 12:34 AM, Stack <st...@duboce.net> wrote:

> Raghu, are you running on an hdfs which has append?
> St.Ack

Re: region goes missing on rs (may be during reassignment)

Posted by Stack <st...@duboce.net>.
Raghu, are you running on an hdfs which has append?
St.Ack

On Fri, May 13, 2011 at 12:07 AM, Raghu Angadi <ra...@apache.org> wrote:
> This happened twice in two days:
>
> clients can't access one of the regions and consistently fails
> with NotServingRegionException
> "org.apache.hadoop.hbase.NotServingRegionException:
> Region is not online:
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f." ...
>
> restarting the region server fixes this.
>
> portions from RS and master are below. Please let me know if more logs are
> needed. we will continue to look into this more.
>
> *on RegionServer* :
>
>    seems to happen right after compaction. grep of  '6136400' on region
> server smf1-afz-19-sr1 :
> [...]
> 2011-05-12 12:05:08,125 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
> region: users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:08,126 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing
> open of users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:08,127 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> Retrieved 114 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
> 2011-05-12 12:05:08,129 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Opening region: REGION => {NAME =>
> 'users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.', STARTKEY
> => '61364002', ENDKEY => '613930521', ENCODED =>
> a0bf035ac417cdd0697464f1c48f387f, TABLE => {{NAME => 'users', MAX_FILESIZE
> => '130023424', FAMILIES => [{NAME => 'columns', BLOOMFILTER => 'NONE',
> REPLICATION_SCOPE => '0', VERSIONS => '2', COMPRESSION => 'LZO', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
> 'true'}, {NAME => 'extracted', BLOOMFILTER => 'NONE', REPLICATION_SCOPE =>
> '0', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE
> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
> 'protobuf', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS =>
> '2', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'time', BLOOMFILTER =>
> 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '2147483647', COMPRESSION =>
> 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> BLOCKCACHE => 'true'}]}}
> 2011-05-12 12:05:08,130 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Instantiated users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:20,977 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; next
> sequenceid=208908217
> 2011-05-12 12:05:20,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> Retrieved 128 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> state=RS_ZK_REGION_OPENING
> 2011-05-12 12:05:20,982 DEBUG
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> requested for users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> because Region has too many store files; priority=21, compaction queue
> size=0
> 2011-05-12 12:05:20,982 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on region
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Updated row users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> in region .META.,,1 with server=smf1-afz-19-sr1.prod.twitter.com:60020,
> startcode=1302734205220
> 2011-05-12 12:05:20,984 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.; next
> sequenceid=208908217
> 2011-05-12 12:05:20,985 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> Retrieved 128 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> state=RS_ZK_REGION_OPENING
> 2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62-0x42daa1daae7ca62
> Retrieved 128 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> state=RS_ZK_REGION_OPENING
> 2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Closing users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.:
> disabling compactions & flushes
> 2011-05-12 12:05:20,986 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Updates disabled for region
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:20,986 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:20,987 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:23,392 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> completed compaction on region
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. after 2sec
> 2011-05-12 12:40:44,829 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> NotServingRegionException; Region is not online:
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:40:45,862 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> NotServingRegionException; Region is not online:
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>
> [...]
>
> *on Master* : from grep '61364002,1297594642368' :
> [...]
> 2011-05-12 12:03:54,786 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> was=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> state=CLOSED, ts=1305201834784
> 2011-05-12 12:03:54,790 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> Retrieved 114 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
> region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
> [...]
> 2011-05-12 12:04:31,813 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. to
> smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220
> 2011-05-12 12:04:39,333 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> Retrieved 128 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
> region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> state=RS_ZK_REGION_OPENING
> [...]
> 2011-05-12 12:05:08,122 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
> timed out:  users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> state=OPENING, ts=1305201871850
> 2011-05-12 12:05:08,122 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
> for too long, reassigning
> region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
> 2011-05-12 12:05:08,122 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> Retrieved 128 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f;
> data=region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-afz-19-sr1.prod.twitter.com,60020,1302734205220,
> state=RS_ZK_REGION_OPENING
> 2011-05-12 12:05:08,124 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6-0x12daaaef8382ac6
> Retrieved 114 byte(s) of data from znode
> /twitter/hbase/unassigned/a0bf035ac417cdd0697464f1c48f387f and set watcher;
> region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.,
> server=smf1-aen-35-sr1.prod.twitter.com:60000, state=M_ZK_REGION_OFFLINE
> 2011-05-12 12:05:08,124 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Successfully transitioned
> region=users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f. into
> OFFLINE and forcing a new assignment
>
> Thanks for looking into this.
>
> Raghu.
>
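For anyone retracing the sequence in the quoted logs, here is a minimal sketch (not part of HBase) that merges the mail-wrapped lines back into one string per log event and lists the events that mention a given region, along with any `state=` value they carry. The function names `join_wrapped` and `region_timeline` are hypothetical, and the script assumes the log4j timestamp layout seen above (`yyyy-MM-dd HH:mm:ss,SSS`) and that a blank line or `[...]` ends an event.

```python
import re

# Assumed timestamp layout, taken from the log lines quoted in this thread.
TS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
STATE = re.compile(r"state=(\w+)")


def join_wrapped(lines):
    """Strip mail quote markers and merge wrapped lines into one string per event."""
    events = []
    open_event = False
    for raw in lines:
        line = re.sub(r"^(>\s?)+", "", raw).strip()
        if not line or line == "[...]":
            # A blank line or an elision marker ends the current event.
            open_event = False
            continue
        if TS.match(line):
            events.append(line)
            open_event = True
        elif open_event:
            events[-1] += " " + line
        # Anything else (commentary between log excerpts) is ignored.
    return events


def region_timeline(events, encoded_name):
    """Return (timestamp, state or None, message) for events mentioning the region."""
    timeline = []
    for ev in events:
        if encoded_name not in ev:
            continue
        match = STATE.search(ev)
        state = match.group(1) if match else None
        timeline.append((ev[:23], state, ev[24:]))
    return timeline
```

Feeding the RegionServer and Master excerpts through `join_wrapped` separately and then calling `region_timeline(events, "a0bf035ac417cdd0697464f1c48f387f")` makes it easier to line up the two sides by timestamp, for example the Master forcing the znode back to OFFLINE at 12:05:08 against the RegionServer events at 12:05:20.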