Posted to issues@hbase.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2024/04/07 18:51:00 UTC

[jira] (HBASE-28405) Region open procedure silently returns without notifying the parent proc

    [ https://issues.apache.org/jira/browse/HBASE-28405 ]


    Viraj Jasani deleted comment on HBASE-28405:
    --------------------------------------

was (Author: vjasani):
Btw, throughout this investigation we have known that there is a real RIT (region in transition), because the region assign performed as part of the "region merge rollback" could not complete, and this definitely needs to be fixed.

However, from the HBase client's perspective, reads and writes on the merging region should not be affected, right? The region state is still OPEN in meta; only the master's in-memory image has it as MERGING. This doesn't change the fact that the RIT needs to be fixed: it is definitely a bug, it triggers alerts, and it requires manual hbck intervention, which we need to minimize as much as possible. But I hope that at least clients are fine throughout this situation.

> Region open procedure silently returns without notifying the parent proc
> ------------------------------------------------------------------------
>
>                 Key: HBASE-28405
>                 URL: https://issues.apache.org/jira/browse/HBASE-28405
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2, Region Assignment
>    Affects Versions: 2.4.17, 2.5.8
>            Reporter: Aman Poonia
>            Assignee: Aman Poonia
>            Priority: Major
>              Labels: pull-request-available
>
> *We had a scenario in production where a merge operation failed as below:*
> _2024-02-11 10:53:57,715 ERROR [PEWorker-31] assignment.MergeTableRegionsProcedure - Error trying to merge [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_
> _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_
> _at org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_
> _at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_
> _at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_
> _at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_
> _at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_
> _at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_
> _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_
> _at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_
> *Now, when we roll back the failed merge operation, we see an issue where the region remains in the opened state until the RS holding it is stopped.*
> The rollback creates a TRSP (TransitRegionStateProcedure) as below:
> _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - Stored [pid=26674602, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=table1, region=a92008b76ccae47d55c590930b837036, ASSIGN]_
> *and the rollback finished successfully:*
> _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - Rolled back pid=26673594, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.HBaseIOException via master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, location=rs-229,60020,1707587658182, table=table1, region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; MergeTableRegionsProcedure table=table1, regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], force=false exec-time=1.4820 sec_
> *We create a procedure to open the region a92008b76ccae47d55c590930b837036. Interestingly, we never closed the region, because it was the creation of the close-region procedures that threw the exception, not their execution. When we run the TRSP, it sends an OpenRegionProcedure, which is handled by AssignRegionHandler. When this handler executes, it reports that the region is already online.*
> The sequence of events is as follows:
> _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, regionState=OPENING, regionLocation=rs-210,60020,1707596461539_
> _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - Initialized subprocedures=[{pid=26675798, ppid=26674602, state=RUNNABLE; OpenRegionProcedure a92008b76ccae47d55c590930b837036, server=rs-210,60020,1707596461539}]_
> _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] handler.AssignRegionHandler - Received OPEN for table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already online_
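The failure chain described in the issue can be reproduced in miniature. The Java sketch below is a hypothetical, heavily simplified model, not HBase code: the class and method names (`RegionNode`, `createUnassigns`, `RegionServer`, `assignCompletes`) and the string-based "reports" are assumptions made for illustration. They only mirror the roles that `AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge`, the rollback `TransitRegionStateProcedure` and `AssignRegionHandler` play in the logs. The `reportWhenAlreadyOnline` switch is one possible remedy assumed here, not necessarily the fix adopted for HBASE-28405.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the HBASE-28405 failure chain; not HBase code.
final class MergeRollbackSketch {

  /** Master-side view of one region; loosely stands in for RegionStateNode. */
  static final class RegionNode {
    final String name;
    String attachedProc; // non-null means the master considers the region in transition

    RegionNode(String name, String attachedProc) {
      this.name = name;
      this.attachedProc = attachedProc;
    }
  }

  /**
   * Attaches an UNASSIGN to every parent region, gives up if one is already in
   * transition, and undoes the attachments made so far. Nothing is executed here,
   * so when this throws, no region has actually been closed.
   */
  static List<String> createUnassigns(List<RegionNode> parents) {
    List<String> procs = new ArrayList<>();
    boolean rollback = true;
    int i = 0;
    try {
      for (; i < parents.size(); i++) {
        RegionNode node = parents.get(i);
        if (node.attachedProc != null) {
          throw new IllegalStateException(
              "The parent region " + node.name + " is currently in transition, give up");
        }
        node.attachedProc = "UNASSIGN " + node.name;
        procs.add(node.attachedProc);
      }
      rollback = false;
      return procs;
    } finally {
      if (rollback) {
        // Detach the UNASSIGNs that were created before the failure.
        for (int j = i - 1; j >= 0; j--) {
          parents.get(j).attachedProc = null;
        }
      }
    }
  }

  /** RS-side handling of an OPEN request; loosely stands in for AssignRegionHandler. */
  static final class RegionServer {
    final Set<String> online = new HashSet<>();
    final List<String> reportsToMaster = new ArrayList<>();
    final boolean reportWhenAlreadyOnline; // assumed remedy; false = reported behavior

    RegionServer(boolean reportWhenAlreadyOnline) {
      this.reportWhenAlreadyOnline = reportWhenAlreadyOnline;
    }

    void open(String region) {
      if (online.contains(region)) {
        // Reported behavior: region already online, return without telling the master.
        if (reportWhenAlreadyOnline) {
          reportsToMaster.add("OPENED " + region);
        }
        return;
      }
      online.add(region);
      reportsToMaster.add("OPENED " + region);
    }
  }

  /** True if the parent ASSIGN would be woken up by a report from the RS. */
  static boolean assignCompletes(RegionServer rs, String region) {
    rs.open(region);
    return rs.reportsToMaster.contains("OPENED " + region);
  }

  public static void main(String[] args) {
    // Region names come from the logs; the initial states are simplified assumptions.
    RegionNode a = new RegionNode("a92008b76ccae47d55c590930b837036", null);
    RegionNode f = new RegionNode("f56752ae9f30fad9de5a80a8ba578e4b", "some other procedure");
    try {
      createUnassigns(List.of(a, f));
    } catch (IllegalStateException e) {
      System.out.println("merge failed: " + e.getMessage());
    }
    System.out.println("a has a procedure attached? " + (a.attachedProc != null));

    // Region a was never closed, so it is still online when the rollback ASSIGN arrives.
    RegionServer current = new RegionServer(false);
    current.online.add(a.name);
    System.out.println("ASSIGN completes (current)? " + assignCompletes(current, a.name));

    RegionServer patched = new RegionServer(true);
    patched.online.add(a.name);
    System.out.println("ASSIGN completes (reporting back)? " + assignCompletes(patched, a.name));
  }
}
```

Running `main` prints the give-up message, then `false` for the attached-procedure check, `false` for the current handler and `true` once the handler reports back. The point the model captures is that the exception is raised while creating the unassign procedures, so region a is never closed; the rollback then asks an RS that still hosts the region to open it, and the early return leaves the parent procedure waiting.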



--
This message was sent by Atlassian Jira
(v8.20.10#820010)