Posted to issues@hbase.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2023/10/06 06:15:00 UTC

[jira] [Commented] (HBASE-28128) Reject requests at RPC layer when RegionServer is aborting

    [ https://issues.apache.org/jira/browse/HBASE-28128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772449#comment-17772449 ] 

Hudson commented on HBASE-28128:
--------------------------------

Results for branch branch-2
	[build #897 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/897/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/897/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/897/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/897/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/897/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Reject requests at RPC layer when RegionServer is aborting
> ----------------------------------------------------------
>
>                 Key: HBASE-28128
>                 URL: https://issues.apache.org/jira/browse/HBASE-28128
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>             Fix For: 2.6.0, 2.5.6, 3.0.0-beta-1
>
>
> We recently had an operational incident in which a RegionServer was aborted but failed to exit within a reasonable timeframe. We're going to set hbase.regionserver.abort.timeout much lower than its 20-minute default, but even then it makes little sense to accept requests while the server is aborting.
> In our case, the server was impaired and not processing requests. The call queue was full, so NettyRpcServer kept trying and failing to add requests to it. This resulted in CallQueueTooBigException, which is not a meta-cache-clearing exception, so clients kept their cached location for the region and retried against the same server. The server kept throwing these exceptions for several minutes until we manually killed it.
> I'd like to add a check in ServerRpcConnection.processRequest that calls regionServer.isAborted() and, if it returns true, throws a RegionServerAbortedException instead of attempting to enqueue the request.
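The proposed ordering of checks can be sketched as a small, self-contained Java model. This is hypothetical code and not HBase's implementation. RequestGate is an illustrative name. ServerAbortingException stands in for RegionServerAbortedException, and CallQueueFullException stands in for CallQueueTooBigException. A BooleanSupplier plays the role of regionServer.isAborted(). The model shows one thing: the abort flag is consulted before the call queue. As a result, an aborting server whose queue is full rejects the request with the abort exception rather than the queue-full one.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: RequestGate and its exception types are illustrative
// stand-ins, not HBase classes. They model "reject on abort first, then enqueue".
public class RequestGate {

  /** Stand-in for RegionServerAbortedException (hypothetical). */
  static class ServerAbortingException extends Exception {
    ServerAbortingException(String msg) { super(msg); }
  }

  /** Stand-in for CallQueueTooBigException (hypothetical). */
  static class CallQueueFullException extends Exception {
    CallQueueFullException(String msg) { super(msg); }
  }

  private final BlockingQueue<String> callQueue;
  private final BooleanSupplier isAborted; // plays the role of regionServer.isAborted()

  RequestGate(int queueCapacity, BooleanSupplier isAborted) {
    this.callQueue = new ArrayBlockingQueue<>(queueCapacity);
    this.isAborted = isAborted;
  }

  /** Checks the abort flag before touching the queue, as the issue proposes. */
  void processRequest(String call) throws ServerAbortingException, CallQueueFullException {
    if (isAborted.getAsBoolean()) {
      // Fail fast: an aborting server should not accept or queue new work.
      throw new ServerAbortingException("Server is aborting; rejecting " + call);
    }
    // offer() returns false instead of blocking when the queue is at capacity.
    if (!callQueue.offer(call)) {
      throw new CallQueueFullException("Call queue is full; rejecting " + call);
    }
  }

  int queuedCalls() { return callQueue.size(); }

  public static void main(String[] args) {
    boolean[] aborted = {false};
    RequestGate gate = new RequestGate(1, () -> aborted[0]); // one-slot queue
    for (String call : new String[] {"get-1", "get-2"}) {
      try {
        gate.processRequest(call);
        System.out.println("queued " + call);
      } catch (Exception e) {
        System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
      }
    }
    aborted[0] = true; // simulate the server starting to abort
    try {
      gate.processRequest("get-3");
    } catch (Exception e) {
      System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
    }
  }
}
```

When main runs, get-1 is queued. get-2 is rejected because the one-slot queue is full. get-3 is rejected by the abort check even though the queue is still full, which is the behavior change the issue asks for. In this sketch the extra check is a single boolean read on the request path, made before any attempt to enqueue.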



--
This message was sent by Atlassian Jira
(v8.20.10#820010)