Posted to issues@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2011/03/24 18:45:06 UTC

[jira] [Commented] (HBASE-3669) Region in PENDING_OPEN keeps being bounced between RS and master

    [ https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010796#comment-13010796 ] 

Jonathan Gray commented on HBASE-3669:
--------------------------------------

When I've seen this happen, another RS had cut in and transitioned the node to OPENING.

As someone in the other JIRA indicates, this kind of thing can happen when one of the RSs is unable to open the region, for example because it doesn't have the proper compression lib or because it hits some DFS error.

If the master successfully transitions the node to OFFLINE and the RS then sees it as OPENING, then almost certainly another RS has gotten in the way.

The RIT znode's contents actually include the serverName, so we should probably log additional debug information when the state transition fails, e.g. "Unable to go from OFFLINE to OPENING because already in OPENING by server #serverName#".
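
To make the suggestion concrete, here is a rough sketch of what the extra debug output could look like. This is not the actual ZKAssign code: RitData and RitTransitionSketch are hypothetical names, and an in-memory AtomicReference stands in for the znode (real code would go through ZooKeeper's versioned writes). Only the state names and server names are taken from the logs in the description.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the data kept in a region-in-transition znode. */
final class RitData {
    final String state;      // e.g. "M_ZK_REGION_OFFLINE" or "RS_ZK_REGION_OPENING"
    final String serverName; // server that last wrote the node

    RitData(String state, String serverName) {
        this.state = state;
        this.serverName = serverName;
    }
}

/** Sketch only: an AtomicReference plays the role of the znode. */
final class RitTransitionSketch {
    private final AtomicReference<RitData> znode;

    RitTransitionSketch(RitData initial) {
        this.znode = new AtomicReference<>(initial);
    }

    /**
     * Tries to move the node from expectedState to newState on behalf of serverName.
     * Returns null on success, or a message naming the state and server found instead.
     */
    String transition(String expectedState, String newState, String serverName) {
        RitData current = znode.get();
        if (!current.state.equals(expectedState)) {
            return failure(expectedState, newState, current);
        }
        RitData next = new RitData(newState, serverName);
        if (!znode.compareAndSet(current, next)) {
            // Someone else changed the node between our read and our write.
            return failure(expectedState, newState, znode.get());
        }
        return null;
    }

    private static String failure(String from, String to, RitData found) {
        return "Unable to go from " + from + " to " + to
            + " because already in " + found.state
            + " by server " + found.serverName;
    }
}
```

With something like this, the second region server in the scenario above would log which server currently owns the node rather than only the state it found, which makes a competing RS visible right away.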

> Region in PENDING_OPEN keeps being bounced between RS and master
> ----------------------------------------------------------------
>
>                 Key: HBASE-3669
>                 URL: https://issues.apache.org/jira/browse/HBASE-3669
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.90.2
>
>
> After going crazy killing region servers after HBASE-3668, most of the cluster recovered except for 3 regions that kept being refused by the region servers.
> On the master I would see:
> {code}
> 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. so generated a random one; hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) available servers
> 2011-03-17 22:23:14,828 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21. to sv2borg171,60020,1300399357135
> {code}
> Then on the region server:
> {code}
> 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempting to transition node f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21., server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
> {code}
> I'm not sure I fully understand what was going on... the master was supposed to OFFLINE the znode but then that's not what the region server was seeing? In any case, I was able to recover by doing a force unassign for each region and then an assign.
