Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/02/27 17:31:00 UTC

[jira] [Comment Edited] (HBASE-23904) Procedure updating meta and Master shutdown are incompatible: CODE-BUG

    [ https://issues.apache.org/jira/browse/HBASE-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046822#comment-17046822 ] 

Michael Stack edited comment on HBASE-23904 at 2/27/20 5:30 PM:
----------------------------------------------------------------

{quote}So I do not think we should let the procedures to finish when master is going to quit...
{quote}
The change does not do this. The Master shutdown is not held up. Rather, we're just making it so the Procedure fails cleanly:

 

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java#L246]

 

If wanted, I could add a check that does "if Reject exception AND Connection closing, convert Reject RuntimeException to IOE", but this is snowflakey, especially given that all other (sync) Table calls just do a Callable, which under the covers does a blanket conversion to IOE.
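
For illustration only, the "if Reject exception AND Connection closing" check mentioned above could look roughly like the sketch below. It uses only java.util.concurrent; the class and helper names (RejectToIoeSketch, submitOrIoe) are hypothetical, and using ExecutorService.isShutdown() as a stand-in for "Connection closing" is an assumption, not how HBase tracks connection state.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

public class RejectToIoeSketch {

  // Hypothetical helper: submit a task, but turn a rejection caused by a
  // shut-down pool into a checked IOException instead of letting the
  // RuntimeException escape to the caller.
  public static <T> Future<T> submitOrIoe(ExecutorService pool, Callable<T> task)
      throws IOException {
    try {
      return pool.submit(task);
    } catch (RejectedExecutionException e) {
      // Assumption: a shut-down pool stands in for "Connection closing".
      if (pool.isShutdown()) {
        throw new IOException("Task rejected because the pool is shut down", e);
      }
      // Any other rejection is left untouched.
      throw e;
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    pool.shutdown(); // simulate the pool being closed during Master stop
    try {
      submitOrIoe(pool, () -> "meta updated");
    } catch (IOException e) {
      System.out.println("IOException caused by "
          + e.getCause().getClass().getSimpleName());
    }
  }
}
```

A caller that already handles IOException would then see a clean failure rather than an uncaught runtime exception.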

 

I'm happy to revisit this stuff, but I'd suggest we do such a review in a new issue since it will require substantial change (currently I'm trying to fix flakies and help get the minor 2.3.0 RC ready). Thanks.

 


was (Author: stack):
{quote}So I do not think we should let the procedures to finish when master is going to quit...
{quote}
The change does not do this. The Master shutdown is not held up. The Procedure fails cleanly:

 

[https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MergeTableRegionsProcedure.java#L246]

 

> Procedure updating meta and Master shutdown are incompatible: CODE-BUG
> ----------------------------------------------------------------------
>
>                 Key: HBASE-23904
>                 URL: https://issues.apache.org/jira/browse/HBASE-23904
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: Michael Stack
>            Priority: Major
>
> Chasing flakies, studying TestMasterAbortWhileMergingTable, I noticed a failure because
> {code:java}
> 2020-02-27 00:57:51,702 ERROR [PEWorker-6] procedure2.ProcedureExecutor(1688): CODE-BUG: Uncaught runtime exception: pid=14, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=test, regions=[48c9be922fa4356bfc7fc61b5b0785f3, ef196d5377c5c1d143e9a2a2ea056a9c], force=false
> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@28b956c7 rejected from java.util.concurrent.ThreadPoolExecutor@639f20e5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5]
>         at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
>         at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
>         at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:974)
>         at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:953)
>         at org.apache.hadoop.hbase.MetaTableAccessor.multiMutate(MetaTableAccessor.java:1771)
>         at org.apache.hadoop.hbase.MetaTableAccessor.mergeRegions(MetaTableAccessor.java:1637)
>         at org.apache.hadoop.hbase.master.assignment.RegionStateStore.mergeRegions(RegionStateStore.java:268)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsMerged(AssignmentManager.java:1854)
>         at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.updateMetaForMergedRegions(MergeTableRegionsProcedure.java:687)
>         at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:229)
>         at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:77)
>         at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1669)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1416)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1986)
>  {code}
> Just before this (about 80 ms earlier, going by the timestamps), as part of the test, we'd stopped Master
> {code:java}
> 2020-02-27 00:57:51,620 INFO  [Time-limited test] regionserver.HRegionServer(2212): ***** STOPPING region server 'rn-hbased-lapp01.rno.exampl.com,36587,1582765058324' *****
> 2020-02-27 00:57:51,620 INFO  [Time-limited test] regionserver.HRegionServer(2226): STOPPED: Stopping master 0 {code}
> The rejected execution damages the merge procedure. It shows as an unhandled CODE-BUG.
> Why we let a runtime exception out when trying to update meta is mildly interesting. We use Throwables.propagateIfPossible(e, IOException.class) from guava, which at first blush would seem to throw the exception if it is an IOE and otherwise return. In the code, if it returns, we wrap whatever makes it through in an IOE. But propagateIfPossible is a little sneaky in that if the passed Exception is a RuntimeException, as the Reject is, it will go ahead and throw it and NOT return. Not sure if this was the authors' understanding ([~zhangduo]? HBASE-21789 for hbase-2.2.0). Looking at the old code, which called makeIOExceptionOfException from ProtobufUtil, if I read it right, this would wrap the exception in an IOE regardless of whether it is a RuntimeException or not.
> A little digging exposes that the likely root of the problem is that the Master is stopping. Its connection, which is used by the merge procedure when updating meta, is being shut down too. The RejectedExecutionException is probably because the pool has been shut down. Hard to tell for sure as Master doesn't log the minutiae of services closed.
> The propagateIfPossible facility is used in a few places. In MetaTableAccessor it appears in one place only, added by HBASE-21789. I could restore the old behavior easily enough (I was afraid we had to deal with this issue around ALL meta table accesses via MTA).
>  
>  
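
The propagateIfPossible behavior described in the issue text above can be reproduced without Guava. The sketch below is a stand-in written against Guava's documented contract for Throwables.propagateIfPossible(Throwable, Class) (rethrow if the throwable is a RuntimeException, an Error, or the declared type; otherwise return). All names here (PropagateSketch, propagateIfPossibleSketch, rethrowAsIoe, classify) are hypothetical and not taken from HBase or Guava.

```java
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;

public class PropagateSketch {

  // Stand-in for Guava's propagateIfPossible: rethrows declaredType,
  // RuntimeException and Error as-is; returns normally for anything else.
  public static <X extends Throwable> void propagateIfPossibleSketch(
      Throwable t, Class<X> declaredType) throws X {
    if (declaredType.isInstance(t)) {
      throw declaredType.cast(t);
    }
    if (t instanceof RuntimeException) {
      throw (RuntimeException) t;
    }
    if (t instanceof Error) {
      throw (Error) t;
    }
  }

  // The pattern from the issue: propagate if possible, else wrap in an IOE.
  public static void rethrowAsIoe(Throwable t) throws IOException {
    propagateIfPossibleSketch(t, IOException.class);
    throw new IOException(t); // only reached for other checked exceptions
  }

  // Reports which kind of exception actually reaches the caller.
  public static String classify(Throwable t) {
    try {
      rethrowAsIoe(t);
      return "nothing thrown";
    } catch (IOException e) {
      return "IOException";
    } catch (RuntimeException e) {
      return "RuntimeException escaped: " + e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // A checked, non-IO exception gets wrapped in an IOException.
    System.out.println(classify(new InterruptedException("checked")));
    // An IOException is rethrown unchanged.
    System.out.println(classify(new IOException("io")));
    // A RuntimeException bypasses the wrapping entirely.
    System.out.println(classify(new RejectedExecutionException("pool terminated")));
  }
}
```

The last call is the case from the stack trace: the RejectedExecutionException is never wrapped, so a caller that only catches IOException does not see it.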



--
This message was sent by Atlassian Jira
(v8.3.4#803005)