You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/07/24 06:51:10 UTC

[jira] [Commented] (HBASE-3741) Make HRegionServer aware of the regions it's opening/closing

    [ https://issues.apache.org/jira/browse/HBASE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070101#comment-13070101 ] 

stack commented on HBASE-3741:
------------------------------

Question: Looking at this patch again, if we throw a RegionAlreadyInTransitionException, won't we just assign the region elsewhere though RegionAlreadyInTransitionException in at least one case here is saying that the region is already open on this regionserver?

> Make HRegionServer aware of the regions it's opening/closing
> ------------------------------------------------------------
>
>                 Key: HBASE-3741
>                 URL: https://issues.apache.org/jira/browse/HBASE-3741
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.90.3
>
>         Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix-v3.patch, HBASE-3741-rsfix.patch, HBASE-3741-trunk.patch
>
>
> This is a serious issue about a race between regions being opened and closed in region servers. We had this situation where the master tried to unassign a region for balancing, failed, force unassigned it, force assigned it somewhere else, failed to open it on another region server (took too long), and then reassigned it back to the original region server. A few seconds later, the region server processed the first closed and the region was left unassigned.
> This is from the master log:
> {quote}
> 11-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=sv4borg42,60020,1300920459477, load=(requests=187, regions=574, usedHeap=3918, maxHeap=6973) for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_CLOSE, ts=1302041477758
> 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> ...
> 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=CLOSED, ts=1302041685733
> 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x42ec2cece810b68 Creating (or updating) unassigned node for 1470298961 with OFFLINE state
> ...
> 2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961; plan=hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, src=sv4borg42,60020,1300920459477, dest=sv4borg40,60020,1302041218196
> 2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg40,60020,1302041218196
> 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_OPEN, ts=1302041700944
> 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_OPEN, ts=1302041700944
> ...
> 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 so generated a random one; hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, src=, dest=sv4borg42,60020,1300920459477; 19 (online=19, exclude=null) available servers
> 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg42,60020,1300920459477
> 2011-04-05 15:15:40,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x42ec2cece810b68 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/prodjobs/unassigned/1470298961
> 2011-04-05 15:15:40,952 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode /prodjobs/unassigned/1470298961 and set watcher; region=stumbles_by_userid2,'����穗���6,1266566087256, server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_OPENED
> 2011-04-05 15:15:40,952 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4borg42,60020,1300920459477, region=1470298961
> 2011-04-05 15:15:42,222 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 1470298961; deleting unassigned node
> ...
> 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode /prodjobs/unassigned/1470298961 and set watcher; region=stumbles_by_userid2,'����穗���6,1266566087256, server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING
> 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /prodjobs/unassigned/1470298961 (region=stumbles_by_userid2,'����穗���6,1266566087256, server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING)
> 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=sv4borg42,60020,1300920459477, region=1470298961
> 2011-04-05 15:15:55,812 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSING for region 1470298961 from server sv4borg42,60020,1300920459477 but region was in  the state null and not in expected PENDING_CLOSE or CLOSING states
> {quote}
> And from sv4borg42:
> {quote}
> 2011-04-05 15:09:58,755 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:11:17,757 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,675 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,700 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961: disabling compactions & flushes
> 2011-04-05 15:14:45,701 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,701 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,758 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,486 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => 'stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256', STARTKEY => '\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6', ENDKEY => '\x00'\x9AU\x7F\xFF\xFE\xEBQ\xB0\xC3\xEF\x00Jr\xF2', ENCODED => 1470298961, TABLE => ...
> 2011-04-05 15:15:39,487 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:40,399 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961; next sequenceid=37627407247
> 2011-04-05 15:15:40,488 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 in region .META.,,1 with server=sv4borg42:60020, startcode=1300920459477
> 2011-04-05 15:15:40,582 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,776 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961: disabling compactions & flushes
> 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,809 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,842 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,943 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,943 WARN org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Received CLOSE for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 but currently not serving
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira