Posted to commits@cassandra.apache.org by "Benjamin Lerer (Jira)" <ji...@apache.org> on 2021/04/20 09:34:00 UTC

[jira] [Commented] (CASSANDRA-16616) Harden internode message resource limit accounting against serialization failures

    [ https://issues.apache.org/jira/browse/CASSANDRA-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325649#comment-17325649 ] 

Benjamin Lerer commented on CASSANDRA-16616:
--------------------------------------------

The patch looks good to me. +1

> Harden internode message resource limit accounting against serialization failures
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16616
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16616
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.0-rc
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the internode messaging exception recovery code fails and cannot correctly adjust the resource limits for an OutboundConnection, the failure affects the other connection types sharing the same OutboundConnections, so any of those connections could hit {{assert using >= 0;}} in
> {{org.apache.cassandra.net.ResourceLimits.Concurrent#release}}.
> While it is possible to modify all of the outbound connection code to re-initialize all of the connections with a correct limit, the effort to test and maintain the recovery code seems too high for something that should "never happen" (except it did once, which is why it needs hardening).  The safer option is to kill the JVM and have whatever external monitoring is in place restart the instance in a known good state.
> Additionally, the logging for dropped outbound messages that have expired or cannot be serialized takes place after the recovery handling logic. If the recovery logic throws an exception, the message is never logged for future diagnosis. Logging should take place first, followed by releasing capacity and handling the expiration or serialization failure.
> Discovered on a branch modified for testing that threw an exception in the Verb.serializeSize method.
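
To make the two points in the description concrete, here is a minimal Java sketch of a shared capacity counter and a drop path that logs before it releases. It is not the Cassandra code or the patch: LimitSketch, SharedLimit, and dropMessage are hypothetical names invented for illustration, the log is a plain list, and the sketch throws an exception on a negative counter, whereas the issue proposes killing the JVM in that situation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; these are not Cassandra's classes.
final class LimitSketch
{
    /** A capacity counter shared by several connections. */
    static final class SharedLimit
    {
        private final long limit;
        private final AtomicLong using = new AtomicLong();

        SharedLimit(long limit)
        {
            this.limit = limit;
        }

        /** Reserves amount bytes if they fit under the limit. */
        boolean tryAllocate(long amount)
        {
            while (true)
            {
                long current = using.get();
                long next = current + amount;
                if (next > limit)
                    return false;
                if (using.compareAndSet(current, next))
                    return true;
            }
        }

        /** Returns amount bytes; a negative total means the accounting is corrupt. */
        void release(long amount)
        {
            long now = using.addAndGet(-amount);
            if (now < 0)
                throw new IllegalStateException("released more than was allocated: " + now);
        }

        long using()
        {
            return using.get();
        }
    }

    /**
     * Drops a message. The diagnostic is recorded first, so it survives
     * even if the capacity release that follows throws.
     */
    static void dropMessage(String description, long size, SharedLimit limit, List<String> log)
    {
        log.add("dropping " + description + " (" + size + " bytes)");
        limit.release(size);
    }
}
```

With this ordering, a release that exceeds what was allocated still leaves a log entry behind for diagnosis, which is the behaviour the description asks for. Because the counter is shared, one bad release leaves it wrong for every connection using it, which is why the description treats the condition as unrecoverable.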


