Posted to dev@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2019/01/22 19:02:05 UTC

[jira] [Created] (HBASE-21757) retrying to close a region incorrectly resets its RIT age metric

Sergey Shelukhin created HBASE-21757:
----------------------------------------

             Summary: retrying to close a region incorrectly resets its RIT age metric
                 Key: HBASE-21757
                 URL: https://issues.apache.org/jira/browse/HBASE-21757
             Project: HBase
          Issue Type: Bug
    Affects Versions: 3.0.0
            Reporter: Sergey Shelukhin


We have a region stuck in RIT (region in transition) indefinitely because of a separate bug that I will file later.
Every 10 minutes the procedure performs the typical split-brain retry, and each retry resets the region's RIT age. As a result, the "oldest RIT" metric never exceeds ~10 minutes even though the region has been stuck for days.

{noformat}
2019-01-22 10:40:52,993 INFO  [PEWorker-10] assignment.RegionStateStore: pid=1865 updating hbase:meta row=region, regionState=CLOSING, regionLocation=server,17020,1547824687684
2019-01-22 10:40:53,025 WARN  [PEWorker-10] assignment.RegionRemoteProcedureBase: Can not add remote operation pid=29297, ppid=1865, state=RUNNABLE, hasLock=true; org.apache.hadoop.hbase.master.assignment.CloseRegionProcedure for region {ENCODED => region, ...} to server server,17020,1547824687684, this usually because the server is alread dead, give up and mark the procedure as complete, the parent procedure will take care of this.
2019-01-22 10:40:53,040 INFO  [PEWorker-10] procedure2.ProcedureExecutor: Finished subprocedure(s) of pid=1865, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_CLOSED, hasLock=true; TransitRegionStateProcedure table=table, region=region, REOPEN/MOVE; resume parent processing.
2019-01-22 10:40:53,040 WARN  [PEWorker-7] assignment.TransitRegionStateProcedure: Failed transition, suspend 600secs pid=1865, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE, hasLock=true; TransitRegionStateProcedure table=table, region=region, REOPEN/MOVE; rit=CLOSING, location=server,17020,1547824687684; waiting on rectified condition fixed by other Procedure or operator intervention
2019-01-22 10:40:53,040 INFO  [PEWorker-7] procedure2.TimeoutExecutorThread: ADDED pid=1865, state=WAITING_TIMEOUT:REGION_STATE_TRANSITION_CLOSE, hasLock=true; TransitRegionStateProcedure table=table, region=region, REOPEN/MOVE; timeout=600000, timestamp=1548183053040
{noformat}

 !image-2019-01-22-11-00-39-030.png! 
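A minimal sketch of the effect, assuming the "oldest RIT" age is computed as the current time minus a per-region timestamp that every close retry rewrites. The class RitAgeSketch, its nested RitEntry, and all methods below are hypothetical and are not HBase's actual AssignmentManager code. The 10-minute retry interval mirrors the 600-second suspend in the log above. The sketch compares that reading with an age measured from when the region first entered transition, which is one possible way to keep the metric growing.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical model of RIT age tracking; not HBase's real classes. */
public class RitAgeSketch {

    // Mirrors the "suspend 600secs" retry interval seen in the log.
    static final long RETRY_INTERVAL_MS = TimeUnit.MINUTES.toMillis(10);

    /** Hypothetical per-region record for a region in transition. */
    static final class RitEntry {
        private final long ritStartMs; // when the region first entered transition
        private long lastUpdateMs;     // when the state was last (re)written

        RitEntry(long nowMs) {
            this.ritStartMs = nowMs;
            this.lastUpdateMs = nowMs;
        }

        // Each close retry rewrites the CLOSING state and bumps the update time.
        void retryClose(long nowMs) {
            lastUpdateMs = nowMs;
        }

        // Reported behavior: age measured from the most recent state write.
        long ageFromLastUpdate(long nowMs) {
            return nowMs - lastUpdateMs;
        }

        // Alternative: age measured from the start of the transition.
        long ageFromRitStart(long nowMs) {
            return nowMs - ritStartMs;
        }
    }

    /**
     * Simulates a stuck region that is retried every RETRY_INTERVAL_MS,
     * sampling the metric every sampleStepMs. Returns
     * {max age seen via last update, age via RIT start at the end}.
     */
    static long[] simulate(long horizonMs, long sampleStepMs) {
        RitEntry rit = new RitEntry(0L);
        long maxAgeFromLastUpdate = 0L;
        long nextRetryMs = RETRY_INTERVAL_MS;
        for (long now = sampleStepMs; now <= horizonMs; now += sampleStepMs) {
            // Sample the metric first, then apply any retry that is due.
            maxAgeFromLastUpdate = Math.max(maxAgeFromLastUpdate, rit.ageFromLastUpdate(now));
            if (now >= nextRetryMs) {
                rit.retryClose(now);
                nextRetryMs += RETRY_INTERVAL_MS;
            }
        }
        return new long[] {maxAgeFromLastUpdate, rit.ageFromRitStart(horizonMs)};
    }

    public static void main(String[] args) {
        long[] ages = simulate(TimeUnit.DAYS.toMillis(1), TimeUnit.MINUTES.toMillis(1));
        System.out.println("max age from last update (min): "
                + TimeUnit.MILLISECONDS.toMinutes(ages[0]));
        System.out.println("age from RIT start (min): "
                + TimeUnit.MILLISECONDS.toMinutes(ages[1]));
    }
}
```

Simulating one day with one-minute samples, the last-update reading never goes above 10 minutes, while the start-based reading reaches 1440 minutes. This matches the sawtooth behavior described in this report.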



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)