Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2020/06/18 06:17:00 UTC

[jira] [Commented] (HBASE-24585) If RSProcedureHandler throws exception, it aborts the hosting RS

    [ https://issues.apache.org/jira/browse/HBASE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139119#comment-17139119 ] 

Duo Zhang commented on HBASE-24585:
-----------------------------------

I think the design here is that the task should not throw any exceptions; it should just report the error to the master. If we do meet an exception, the only safe way is to abort. In SCP (ServerCrashProcedure), the master will know that the remote task has failed (as the RS itself is gone) and will then reschedule the remote task.

So here the problem is that we should not propagate the exception to the upper layer?
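To make that contract concrete, here is a minimal Java sketch. All names in it (`MasterReporter`, `RemoteTask`, `splitWalTask`, `handle`) are hypothetical and are not HBase APIs. The task catches its own expected failure and reports it to the master, so nothing propagates to the handler. The handler treats anything that still escapes as a broken contract and aborts.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReportNotThrowSketch {

  /** Hypothetical channel for telling the master how a remote task ended. */
  interface MasterReporter {
    void reportDone(long procId, Throwable error); // error == null means success
  }

  /** Hypothetical remote task whose contract is to report its result, never to throw. */
  interface RemoteTask {
    void run(long procId, MasterReporter reporter);
  }

  /** Hypothetical WAL-split task: an expected IOException is reported, not rethrown. */
  static RemoteTask splitWalTask(boolean fail) {
    return (procId, reporter) -> {
      try {
        if (fail) {
          throw new IOException("Failed WAL split, status=RESIGNED");
        }
        reporter.reportDone(procId, null);
      } catch (IOException e) {
        // The master learns of the failure and can reschedule the task elsewhere.
        reporter.reportDone(procId, e);
      }
    };
  }

  /** Hypothetical handler: anything escaping a task breaks the contract, so abort. */
  static void handle(long procId, RemoteTask task, MasterReporter reporter, Runnable abort) {
    try {
      task.run(procId, reporter);
    } catch (Throwable t) {
      abort.run();
    }
  }

  /** Runs one successful and one failing split and returns what happened. */
  public static List<String> demo() {
    List<String> events = new ArrayList<>();
    MasterReporter reporter = (id, err) ->
        events.add("proc " + id + (err == null ? " ok" : " failed: " + err.getMessage()));
    handle(1L, splitWalTask(false), reporter, () -> events.add("ABORT"));
    handle(2L, splitWalTask(true), reporter, () -> events.add("ABORT"));
    return events;
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

Under this contract a failed split produces a failure report instead of an abort. An abort is reserved for genuinely unexpected throws, after which SCP takes over the rescheduling.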

> If RSProcedureHandler throws exception, it aborts the hosting RS
> ----------------------------------------------------------------
>
>                 Key: HBASE-24585
>                 URL: https://issues.apache.org/jira/browse/HBASE-24585
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Priority: Major
>
> In HBASE-24574, proc v2 distributed log splitting is enabled. A remote split failed because it was interrupted. The InterruptedException became an IOE and then bubbled up and out of the RSPH below, causing an RS abort.
> {code}
>  2020-06-17 21:20:37,472 ERROR [RS_LOG_REPLAY_OPS-regionserver/localhost:16020-0] handler.RSProcedureHandler: Error when call RSProcedureCallable:
>  java.io.IOException: Failed WAL split, status=RESIGNED, wal=file:/Users/stack/checkouts/hbase.apache.git/tmp/hbase/WALs/localhost,16020,1592440848604-splitting/localhost%2C16020%2C1592440848604.meta.1592440852959.meta
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.splitWal(SplitWALCallable.java:106)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:86)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:49)
>    at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
>    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>    at java.lang.Thread.run(Thread.java:748)
> {code}
> Does the remote-procedure framework need to be more resilient? Log the exception, unless it is an ERROR, and keep going? Otherwise, features like procedure v2 distributed log splitting are brittle.
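The alternative floated in the description keeps the server alive and aborts only on an Error. It might look roughly like the following Java sketch. `ResilientHandlerSketch`, `Outcome` and `runTask` are hypothetical illustrations, not HBase code. The `REPORTED_FAILURE` result stands in for whatever mechanism would actually notify the master.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class ResilientHandlerSketch {

  /** What the hypothetical handler decided after running a remote task. */
  public enum Outcome { SUCCEEDED, REPORTED_FAILURE, ABORTED }

  /**
   * Runs a remote task without letting ordinary exceptions escape.
   * An Exception (for example an IOException from a failed WAL split) is
   * logged and returned as REPORTED_FAILURE, which a caller would forward
   * to the master. Only a java.lang.Error is treated as fatal.
   */
  public static Outcome runTask(Callable<Void> task, List<String> log) {
    try {
      task.call();
      return Outcome.SUCCEEDED;
    } catch (Exception e) {
      if (e instanceof InterruptedException) {
        // Throwing InterruptedException normally clears the interrupt status;
        // restore it so code further up can still see the interruption.
        Thread.currentThread().interrupt();
      }
      log.add("WARN remote task failed, reporting to master: " + e.getMessage());
      return Outcome.REPORTED_FAILURE;
    } catch (Error err) {
      log.add("ERROR fatal error in remote task, aborting: " + err);
      return Outcome.ABORTED;
    }
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    System.out.println(runTask(() -> null, log));
    System.out.println(runTask(() -> {
      throw new IOException("Failed WAL split, status=RESIGNED");
    }, log));
    System.out.println(runTask(() -> {
      throw new OutOfMemoryError("simulated");
    }, log));
    log.forEach(System.out::println);
  }
}
```

Restoring the interrupt flag is the usual Java idiom when an InterruptedException is caught rather than rethrown. In the trace above, however, the interruption had already been converted to an IOException before it reached the handler, so that branch would not fire there. Whether every Error really warrants an abort is a policy choice this sketch does not settle.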


