You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2012/11/06 05:40:14 UTC

[jira] [Updated] (HBASE-7101) HBase stuck in Region SPLIT

     [ https://issues.apache.org/jira/browse/HBASE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-7101:
---------------------------------

    Fix Version/s: 0.94.4
                   0.96.0

Assigning to 0.94.4 for now, since this is not new (I believe)
                
> HBase stuck in Region SPLIT 
> ----------------------------
>
>                 Key: HBASE-7101
>                 URL: https://issues.apache.org/jira/browse/HBASE-7101
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: Bing Jiang
>             Fix For: 0.96.0, 0.94.4
>
>
> I found this issue from a zknode which has existed for a long time in the unassigned parent.And HMaster report warnning log increasingly.The loop log is at below. 
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 1a1c950ad45812d7b4b9b90ebf268468 not found on server sev0040,60020,1350378314041; failed processing
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 1a1c950ad45812d7b4b9b90ebf268468 from server sev0040,60020,1350378314041 but it doesn't exist anymore, probably already processed its split
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 1a1c950ad45812d7b4b9b90ebf268468 not found on server gs-dpo-sev0040,60020,1350378314041; failed processing
> WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 1a1c950ad45812d7b4b9b90ebf268468 from server sev0040,60020,1350378314041 but it doesn't exist anymore, probably already processed its split
> we use Hbase-0.92.1, and I trace back to the source code. HMaster AssignmentManager have already deleted the SPLIT_Region in its memory structure,but HRegionServer SplitTransaction has found the unassigned/parent-node existed in a transient state, precisely SplitTransaction executes tickleNodeSplit to update a new version a little later than  AssignmentManager deleting unassigned/parent-znode. After updating a version of the znode, it will intrigue the handleRegion operation again, however, AssignmentManager assert that the RegionState in Memory has been deleted, and transaction goes into a retry loop.
> In the SplitTransaction, transitionZKNode will retry tickleNodeSplit after sleeping 100ms. In my opinion, if the time is much longger than 100ms, all the operation from AssignmentManagement will finish off completely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira