Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2019/06/26 16:14:00 UTC

[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

    [ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873477#comment-16873477 ] 

Benedict commented on CASSANDRA-15013:
--------------------------------------

Thanks [~sumanth.pasupuleti].  I think this is very close to commit.

I've pushed a small number of extra suggestions [here|https://github.com/belliottsmith/cassandra/tree/15013-suggestions].  These are mostly minor stylistic simplifications, plus a change to the back pressure metric so that it simply reports the number of connections currently experiencing back pressure. It's not entirely clear how an operator would meaningfully interpret the number of times back pressure was independently applied, since it would be applied more often for small messages than for large ones.
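
To illustrate the distinction between the two ways of measuring back pressure, here is a minimal sketch of a gauge that tracks how many connections are currently under back pressure, rather than how many times it was applied. The class and method names (BackPressureGauge, ConnectionState, applyBackPressure, releaseBackPressure) are hypothetical and are not taken from the patch or the suggestions branch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; these are not Cassandra's actual classes.
final class BackPressureGauge {
    // Number of connections that are paused right now.
    private final AtomicInteger pausedConnections = new AtomicInteger();

    void onPaused()  { pausedConnections.incrementAndGet(); }
    void onResumed() { pausedConnections.decrementAndGet(); }
    int current()    { return pausedConnections.get(); }
}

final class ConnectionState {
    private final BackPressureGauge gauge;
    private final AtomicBoolean paused = new AtomicBoolean();

    ConnectionState(BackPressureGauge gauge) { this.gauge = gauge; }

    // Only the false -> true transition is counted, so applying back
    // pressure repeatedly to one connection does not inflate the gauge.
    void applyBackPressure() {
        if (paused.compareAndSet(false, true)) gauge.onPaused();
    }

    // Only the true -> false transition decrements the gauge.
    void releaseBackPressure() {
        if (paused.compareAndSet(true, false)) gauge.onResumed();
    }
}
```

With this shape, a connection sending many small messages and a connection sending a few large ones each contribute at most one to the reported value, which is the property an operator can reason about.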

Let me know what you think, and we can hopefully see about merging this soon.

> Message Flusher queue can grow unbounded, potentially running JVM out of memory
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>         Attachments: BlockedEpollEventLoopFromHeapDump.png, BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap dump showing each ImmediateFlusher taking upto 600MB.png
>
>
> This is a follow-up ticket to CASSANDRA-14855, to make the Flusher queue bounded. In the current state, items are added to the queue without any check on the queue size and without checking the isWritable state of the Netty outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.
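
The two missing checks named in the description can be sketched as follows. This is an illustrative sketch under stated assumptions, not the approach taken by the patch: BoundedFlushQueue and tryEnqueue are hypothetical names, and a BooleanSupplier stands in for Netty's Channel.isWritable() so that the example has no external dependencies.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the Flusher implementation.
final class BoundedFlushQueue<T> {
    private final ArrayBlockingQueue<T> queue;
    // Stand-in for the channel's writability (assumption: in a real
    // Netty handler this would delegate to Channel.isWritable()).
    private final BooleanSupplier writable;

    BoundedFlushQueue(int capacity, BooleanSupplier writable) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.writable = writable;
    }

    // Rejects the item, instead of queueing it, when the channel is not
    // writable or the queue is already at capacity. The caller decides
    // how to apply back pressure on rejection.
    boolean tryEnqueue(T item) {
        return writable.getAsBoolean() && queue.offer(item);
    }

    int size() { return queue.size(); }
}
```

A caller that receives false from tryEnqueue would need to stop reading from the client until capacity frees up; how that is done is outside the scope of this sketch.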



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)