Posted to issues@hbase.apache.org by "Claire Iacono (Jira)" <ji...@apache.org> on 2022/04/14 00:14:00 UTC

[jira] [Commented] (HBASE-22657) HBase : STUCK Region-In-Transition

    [ https://issues.apache.org/jira/browse/HBASE-22657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521984#comment-17521984 ] 

Claire Iacono commented on HBASE-22657:
---------------------------------------

This problem has also come up on our clusters. While it can be fixed manually, it would be much better if HBase could find a way to recover by itself when a region gets stuck in transition. We are also interested in the root cause here.

> HBase : STUCK Region-In-Transition 
> -----------------------------------
>
>                 Key: HBASE-22657
>                 URL: https://issues.apache.org/jira/browse/HBASE-22657
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: oktay tuncay
>            Priority: Critical
>
> When we check the number of regions in transition in Ambari, it shows that 1 region is waiting in transition (in another cluster the count is higher than 1).
> Also, when we check the table with the command "hbase hbck -details *table_name*", the status is INCONSISTENT:
> _There are 0 overlap groups with 0 overlapping regions
> ERROR: Found inconsistency in table *Table_Name*
> Summary:
> Table hbase:meta is okay.
> Number of regions: 1
> Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
> Table *Table_Name* is okay.
> Number of regions: 39
> Deployed on: hostname1:port, hostname2:port, hostname3:port, hostname4:port
> 2 inconsistencies detected.
> Status: *INCONSISTENT*
> When I checked the log files, I saw the following warning messages:
> 2019-06-09T07:14:15.179+02:00 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING, location=*hostname*,*port*,1558699727048, table=*table_name*, region=c67dd5d8bcd174cc2001695c31475ab1
> According to this message, region c67dd5d8bcd174cc2001695c31475ab1 is in transition (rit=CLOSING) on *host*, and this operation is stuck.
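To hand the stuck region to the shell assign command, you first need its encoded name, and that can be pulled out of the warning line. The following is a minimal Java sketch. It assumes the warning keeps the field order shown above: rit, then location as host,port,startcode, then table, then region. The class name StuckRitLogParser is hypothetical, and the exact message format may differ between HBase versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the fields of an AssignmentManager
// "STUCK Region-In-Transition" warning. Assumes the field order
// rit=..., location=host,port,startcode, table=..., region=... (may vary by version).
final class StuckRitLogParser {
    private static final Pattern STUCK = Pattern.compile(
        "STUCK Region-In-Transition rit=(\\w+), location=([^,]+,[^,]+,\\d+), "
            + "table=([^,]+), region=([0-9a-f]+)");

    private StuckRitLogParser() {}

    // Returns {state, location, table, encodedRegionName},
    // or null if the line is not a STUCK warning.
    static String[] parse(String logLine) {
        Matcher m = STUCK.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }
}
```

Applied to the warning above, parse() returns CLOSING as the state and c67dd5d8bcd174cc2001695c31475ab1 as the fourth element. For unrelated log lines it returns null.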
> We stopped the RegionServer (RS) process on *host* and force-assigned the region to another running RS:
> *hbase(main):001:0> assign 'c67dd5d8bcd174cc2001695c31475ab1'*
> After that operation, the INCONSISTENT status was gone, and we restarted the RS on the host.
> One reason a region can get stuck in transition is that, while it is being moved between RegionServers, it is unassigned from the source RegionServer but never assigned to another RegionServer.
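A move is two separate steps: a close on the source server, then an open on the target. The region has a location again only after the second step runs. The Java sketch below is a deliberately simplified, hypothetical model of that failure mode; RegionModel, MoveSketch and RitState are made-up names. It is not how HBase 2.x implements a move, because it leaves out the procedure framework, persistence and retries.

```java
// Hypothetical, simplified model of a region move; not HBase code.
enum RitState { OPEN, CLOSING, CLOSED, OPENING }

final class RegionModel {
    final String encodedName;
    RitState state = RitState.OPEN;
    String location; // server hosting the region, or null when unassigned

    RegionModel(String encodedName, String location) {
        this.encodedName = encodedName;
        this.location = location;
    }
}

final class MoveSketch {
    private MoveSketch() {}

    // Step 1: unassign (close) the region on its current server.
    static void unassign(RegionModel r) {
        r.state = RitState.CLOSING;
        // A real cluster would send a close request to r.location here.
        r.state = RitState.CLOSED;
        r.location = null;
    }

    // Step 2: assign (open) the region on the target server.
    static void assign(RegionModel r, String target) {
        r.state = RitState.OPENING;
        r.location = target;
        // A real cluster would send an open request to target here.
        r.state = RitState.OPEN;
    }

    // A move is step 1 followed by step 2. If it is interrupted in between,
    // the region stays CLOSED with no location: unassigned from the source
    // but never assigned anywhere else.
    static void move(RegionModel r, String target, boolean interruptedBetweenSteps) {
        unassign(r);
        if (interruptedBetweenSteps) {
            return;
        }
        assign(r, target);
    }
}
```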
> I think the code below is responsible for that process:
> private void handleRegionOverStuckWarningThreshold(final RegionInfo regionInfo) {
>     final RegionStateNode regionNode = regionStates.getRegionStateNode(regionInfo);
>     //if (regionNode.isStuck()) {
>     LOG.warn("STUCK Region-In-Transition {}", regionNode);
> }_
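As quoted, the handler only logs. The isStuck() check is commented out, and no recovery action follows the warning. The Java sketch below models a check of this kind in isolation. It assumes a map from encoded region name to the time its transition started, plus a fixed threshold in milliseconds. StuckRitChecker and its parameters are hypothetical names, not HBase API; in HBase the threshold is configurable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of a stuck-RIT check; not HBase code.
final class StuckRitChecker {
    private final long stuckThresholdMs;

    StuckRitChecker(long stuckThresholdMs) {
        this.stuckThresholdMs = stuckThresholdMs;
    }

    // ritStartMs maps an encoded region name to the time (ms) its transition began.
    // Returns the regions that have been in transition longer than the threshold, sorted by name.
    List<String> findStuck(Map<String, Long> ritStartMs, long nowMs) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : ritStartMs.entrySet()) {
            if (nowMs - e.getValue() > stuckThresholdMs) {
                stuck.add(e.getKey());
            }
        }
        Collections.sort(stuck);
        for (String region : stuck) {
            // Like the quoted handler, this only reports; nothing is reassigned.
            System.err.println("WARN STUCK Region-In-Transition region=" + region);
        }
        return stuck;
    }
}
```

Nothing in the check changes a region's state. A stuck region therefore stays stuck and is reported again on every run until someone intervenes.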
> It seems one potential way to unstick the region is to send a close request to the RegionServer. The operation may be blocked because another Procedure holds the exclusive lock and is not letting go.
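The lock hypothesis can be illustrated with plain java.util.concurrent. If one holder keeps an exclusive lock, a second party that waits only a bounded time never gets it. This is an analogy only: HBase's procedure framework manages its locks through its own scheduler, not a ReentrantLock. LockContentionSketch is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical analogy: one "procedure" holds an exclusive region lock while a
// second one (e.g. a close/unassign) tries to take it with a bounded wait.
final class LockContentionSketch {
    private LockContentionSketch() {}

    // Returns true if the second party obtained the lock within waitMs.
    static boolean secondProcedureAcquires(boolean holderReleases, long waitMs) {
        ReentrantLock regionLock = new ReentrantLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch stopHolding = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            regionLock.lock();
            try {
                lockHeld.countDown();
                if (!holderReleases) {
                    stopHolding.await(); // keep the lock: "not letting go"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                regionLock.unlock();
            }
        });
        holder.start();

        try {
            lockHeld.await();
            boolean acquired = regionLock.tryLock(waitMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                regionLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            stopHolding.countDown(); // let the holder release the lock and exit
            try {
                holder.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With holderReleases set to false, the second party times out after waitMs. Retrying the close cannot help until the holder lets go.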
> My question is: what is the root cause of this problem? I think HBase should be able to fix Region-In-Transition issues on its own.
> We can fix this problem manually, but some customers do not have this knowledge, so I think HBase needs to be able to recover by itself.
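One possible shape for automatic recovery is a decision rule that mirrors the two remedies mentioned above:

- If the hosting RegionServer is still alive, resend the close.
- If the host is gone or the region has no location, reassign the region to another live server, as the manual assign did.

Everything in this Java sketch is an assumption for illustration and is not taken from HBase. That includes RecoveryPolicy, RecoveryAction, the threshold and the set of live servers.

```java
import java.util.Set;

// Hypothetical recovery rule for a region reported as stuck; not HBase code.
enum RecoveryAction { NONE, RESEND_CLOSE, REASSIGN }

final class RecoveryPolicy {
    private RecoveryPolicy() {}

    static RecoveryAction decide(String ritState, String location, long ritAgeMs,
                                 long stuckThresholdMs, Set<String> liveServers) {
        if (ritAgeMs <= stuckThresholdMs) {
            return RecoveryAction.NONE; // still within the normal transition time
        }
        if (location == null || !liveServers.contains(location)) {
            // No usable host: open the region on another live server,
            // as the manual "assign" did.
            return RecoveryAction.REASSIGN;
        }
        if ("CLOSING".equals(ritState)) {
            // Host is alive but the close never completed: retry the close request.
            return RecoveryAction.RESEND_CLOSE;
        }
        // Any other stuck state on a live host: fall back to reassigning.
        return RecoveryAction.REASSIGN;
    }
}
```

A real implementation would also need to choose a target server. It would also need to make sure that a resent close and a reassignment cannot race each other, which is where the locking question above comes back in.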



--
This message was sent by Atlassian Jira
(v8.20.1#820001)