Posted to issues@hbase.apache.org by "Guanghao Zhang (Jira)" <ji...@apache.org> on 2020/02/26 09:34:00 UTC

[jira] [Updated] (HBASE-23895) STUCK Region-In-Transition when failed to insert procedure to procedure store

     [ https://issues.apache.org/jira/browse/HBASE-23895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-23895:
-----------------------------------
    Summary: STUCK Region-In-Transition when failed to insert procedure to procedure store  (was: STUCK Region-In-Transition because failed to insert procedure to procedure store)

> STUCK Region-In-Transition when failed to insert procedure to procedure store
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-23895
>                 URL: https://issues.apache.org/jira/browse/HBASE-23895
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Priority: Major
>
> When a region is moved, the master first creates a TransitRegionStateProcedure (TRSP) and sets it on the region state node. If submitting the TRSP then fails, nothing unsets the procedure, so the region stays stuck in RIT (Region-In-Transition).
> hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
> {code:java}
>   public Future<byte[]> moveAsync(RegionPlan regionPlan) throws HBaseIOException {
>     TransitRegionStateProcedure proc =
>       createMoveRegionProcedure(regionPlan.getRegionInfo(), regionPlan.getDestination());
>     return ProcedureSyncWait.submitProcedure(master.getMasterProcedureExecutor(), proc);
>   }
>   public TransitRegionStateProcedure createMoveRegionProcedure(RegionInfo regionInfo,
>       ServerName targetServer) throws HBaseIOException {
>     RegionStateNode regionNode = this.regionStates.getRegionStateNode(regionInfo);
>     if (regionNode == null) {
>       throw new UnknownRegionException("No RegionStateNode found for " +
>           regionInfo.getEncodedName() + "(Closed/Deleted?)");
>     }    
>     TransitRegionStateProcedure proc;
>     regionNode.lock();
>     try {
>       preTransitCheck(regionNode, STATES_EXPECTED_ON_UNASSIGN_OR_MOVE);
>       regionNode.checkOnline();
>       proc = TransitRegionStateProcedure.move(getProcedureEnvironment(), regionInfo, targetServer);
>       regionNode.setProcedure(proc);
>     } finally {
>       regionNode.unlock();
>     }    
>     return proc;
>   }
> {code}
> hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNode.java
> {code:java}
>   public void setProcedure(TransitRegionStateProcedure proc) {
>     assert this.procedure == null;
>     this.procedure = proc;
>     ritMap.put(regionInfo, this);
>   }
>   public void unsetProcedure(TransitRegionStateProcedure proc) {
>     assert this.procedure == proc;
>     this.procedure = null;
>     ritMap.remove(regionInfo, this);
>   } 
> {code}
> {code:java}
> 2020-02-26,13:45:21,344 ERROR [RpcServer.default.RWQ.Fifo.read.handler=437,queue=5,port=21500] org.apache.hadoop.hbase.ipc.RpcServer: Unexpected throwable object
> java.io.UncheckedIOException: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Timed out waiting for lock for row: \x00\x00\x00\x00\x00\x0B\xAB\xD2 in region 9731aea823e7f83264b14713ae486fb7
>         at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.update(RegionProcedureStore.java:588)
>         at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.insert(RegionProcedureStore.java:545)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:1042)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:860)
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitProcedure(ProcedureSyncWait.java:123)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.moveAsync(AssignmentManager.java:657)
>         at org.apache.hadoop.hbase.master.HMaster.executeRegionPlansWithThrottling(HMaster.java:1793)
>         at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1761)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:654)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:374)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:135)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:352)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:332)
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Timed out waiting for lock for row: \x00\x00\x00\x00\x00\x0B\xAB\xD2 in region 9731aea823e7f83264b14713ae486fb7
>         at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:6158)
>         at org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3488)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4235)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4208)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4134)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4125)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4139)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4511)
>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3209)
>         at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.update(RegionProcedureStore.java:584)
>         ... 13 more
> {code}
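
One possible direction for a fix, sketched here as a self-contained model rather than as a patch against HBase: if the submit step throws, unset the procedure under the region node lock before rethrowing, so the region leaves the RIT map. All of the types below (Proc, Store, RegionNode, RitCleanupSketch) are hypothetical stand-ins for TransitRegionStateProcedure, the procedure store, and RegionStateNode; they are assumptions made for illustration, not the real HBase classes, and the actual fix for this issue may look different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class RitCleanupSketch {
  /** Hypothetical stand-in for TransitRegionStateProcedure. */
  static final class Proc {
    final String region;
    Proc(String region) { this.region = region; }
  }

  /** Hypothetical stand-in for the procedure store; insert may throw. */
  interface Store {
    void insert(Proc proc);
  }

  /** Simplified model of RegionStateNode: attaching a procedure marks the region as RIT. */
  static final class RegionNode {
    private final String region;
    private final Map<String, RegionNode> ritMap;
    private final ReentrantLock lock = new ReentrantLock();
    private Proc procedure;

    RegionNode(String region, Map<String, RegionNode> ritMap) {
      this.region = region;
      this.ritMap = ritMap;
    }

    void lock() { lock.lock(); }
    void unlock() { lock.unlock(); }

    void setProcedure(Proc proc) {
      if (procedure != null) throw new IllegalStateException("region already in transition");
      procedure = proc;
      ritMap.put(region, this);
    }

    void unsetProcedure(Proc proc) {
      if (procedure != proc) throw new IllegalStateException("procedure mismatch");
      procedure = null;
      ritMap.remove(region, this);
    }
  }

  /** Attach the procedure, then submit it; roll back the attachment if the submit fails. */
  static Proc move(RegionNode node, Store store) {
    Proc proc = new Proc(node.region);
    node.lock();
    try {
      node.setProcedure(proc);
    } finally {
      node.unlock();
    }
    try {
      store.insert(proc);
    } catch (RuntimeException e) {
      // Submit failed: detach the procedure so the region does not stay in RIT.
      node.lock();
      try {
        node.unsetProcedure(proc);
      } finally {
        node.unlock();
      }
      throw e;
    }
    return proc;
  }

  public static void main(String[] args) {
    Map<String, RegionNode> ritMap = new ConcurrentHashMap<>();
    RegionNode node = new RegionNode("9731aea823e7f83264b14713ae486fb7", ritMap);
    // Simulate the store timing out on insert, as in the stack trace above.
    Store failing = p -> {
      throw new UncheckedIOException(new IOException("Timed out waiting for row lock"));
    };
    try {
      move(node, failing);
    } catch (UncheckedIOException e) {
      System.out.println("submit failed: " + e.getCause().getMessage());
    }
    System.out.println("regions in transition after failure: " + ritMap.size());
    move(node, p -> { });
    System.out.println("regions in transition after success: " + ritMap.size());
  }
}
```

In this model the RIT map is empty after the failed submit and holds one entry after the successful one, so a later move of the same region is not blocked by a leftover procedure. Whether the cleanup belongs in moveAsync or inside the submit path is a design choice this sketch does not settle.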



--
This message was sent by Atlassian Jira
(v8.3.4#803005)