Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/02/28 21:55:00 UTC

[jira] [Resolved] (HBASE-23904) Procedure updating meta and Master shutdown are incompatible: CODE-BUG

     [ https://issues.apache.org/jira/browse/HBASE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-23904.
-----------------------------------
    Fix Version/s: 2.3.0
                   3.0.0
       Resolution: Fixed

Pushed to branch-2 and master.

> Procedure updating meta and Master shutdown are incompatible: CODE-BUG
> ----------------------------------------------------------------------
>
>                 Key: HBASE-23904
>                 URL: https://issues.apache.org/jira/browse/HBASE-23904
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>
> Chasing flakies, studying TestMasterAbortWhileMergingTable, I noticed a failure caused by:
> {code:java}
> 2020-02-27 00:57:51,702 ERROR [PEWorker-6] procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=test, regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], force=false
> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@28b956c7 rejected from java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
>         at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
>         at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
>         at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
>         at org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
>         at org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
>         at org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
>         at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
>         at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
>         at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
>         at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
>  {code}
> Just before this in the log (about 80 ms earlier), as part of the test, we'd stopped the Master:
> {code:java}
> 2020-02-27 00:57:51,620 INFO  [Time-limited test] regionserver.HRegionServer(2212): ***** STOPPING region server 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
> 2020-02-27 00:57:51,620 INFO  [Time-limited test] regionserver.HRegionServer(2226): STOPPED: Stopping master 0 {code}
> The rejected execution damages the merge procedure. It shows as an unhandled CODE-BUG.
> Why we let a runtime exception out when trying to update meta is mildly interesting. We use Throwables.propagateIfPossible(e, IOException.class) from guava, which at first blush would seem to throw the exception if it is an IOE and otherwise return. In the code, if it returns, we wrap whatever makes it through in an IOE. But propagateIfPossible is a little sneaky: if the passed exception is a RuntimeException, as the RejectedExecutionException is, it goes ahead and throws and does NOT return. Not sure if this was the authors' understanding ([~zhangduo]? HBASE-21789 for hbase-2.2.0). Looking at the old code, which called makeIOExceptionOfException from ProtobufUtil, if I read it right, this would wrap the exception in an IOE regardless of whether it was a RuntimeException or not.
> A little digging suggests that the likely root of the problem is that the Master is stopping. Its connection, which the merge procedure uses when updating meta, is being shut down too. The rejected execution is probably because the pool has been shut down. Hard to tell for sure as Master doesn't log the minutiae of services closed.
> The propagateIfPossible facility is used in a few places. It was added to MetaTableAccessor in one place only, by HBASE-21789. I could restore the old behavior easily enough (I was afraid we would have to deal with this issue around ALL meta table accesses via MTA).
>  
>  
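
The behavior described in the quoted report can be reproduced with the JDK alone. The sketch below is illustrative only and is not the HBase code or the committed fix. The class PropagateSketch and all of its methods are hypothetical names. propagateIfPossibleLike is a stand-in that mirrors the documented contract of Guava's Throwables.propagateIfPossible(Throwable, Class), and alwaysWrap only approximates the "wrap in an IOE regardless" behavior the report attributes to the old code.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class PropagateSketch {

  /** Action that may throw IOException; used only to classify outcomes below. */
  interface ThrowingAction {
    void run() throws IOException;
  }

  // Hypothetical stand-in for Guava's propagateIfPossible(t, declared):
  // rethrows t if it is a 'declared' instance, a RuntimeException, or an Error;
  // otherwise it simply returns.
  static <X extends Throwable> void propagateIfPossibleLike(Throwable t, Class<X> declared)
      throws X {
    if (declared.isInstance(t)) {
      throw declared.cast(t);
    }
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t;
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
  }

  // Pattern described in the report: propagate first, then wrap whatever is left.
  static void propagateThenWrap(Throwable t) throws IOException {
    propagateIfPossibleLike(t, IOException.class);
    throw new IOException(t);
  }

  // Assumed approximation of the older behavior: the caller always sees an IOException.
  static void alwaysWrap(Throwable t) throws IOException {
    if (t instanceof IOException) {
      throw (IOException) t;
    }
    throw new IOException(t);
  }

  // Submitting to a pool that has already been shut down is rejected by the
  // default AbortPolicy, as in the stack trace above.
  static RejectedExecutionException submitToShutDownPool() {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    pool.shutdown();
    Callable<String> task = () -> "meta update";
    try {
      pool.submit(task);
      return null; // not expected: the pool is shut down
    } catch (RejectedExecutionException e) {
      return e;
    }
  }

  // Reports which kind of exception escaped the action.
  static String outcome(ThrowingAction action) {
    try {
      action.run();
      return "no exception";
    } catch (IOException e) {
      return "IOException";
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    RejectedExecutionException rejected = submitToShutDownPool();
    // The runtime exception escapes unwrapped.
    System.out.println(outcome(() -> propagateThenWrap(rejected))); // RejectedExecutionException
    // The caller sees a checked IOException instead.
    System.out.println(outcome(() -> alwaysWrap(rejected)));        // IOException
  }
}
```

With the propagate-then-wrap pattern the RejectedExecutionException reaches the caller as an unchecked exception, which is how it would surface in the procedure executor as an uncaught runtime exception. With the always-wrap variant the caller gets an IOException it is already prepared to handle.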


