Posted to commits@cassandra.apache.org by "Aleksey Yeschenko (JIRA)" <ji...@apache.org> on 2014/12/02 01:05:13 UTC

[jira] [Commented] (CASSANDRA-8285) OOME in Cassandra 2.0.11

    [ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230732#comment-14230732 ] 

Aleksey Yeschenko commented on CASSANDRA-8285:
----------------------------------------------

CASSANDRA-6998 broke it by forcing a synchronous major compaction in HHOM#scheduleAllHints(). But scheduleAllHints() runs on StorageService.optionalTasks, the same single-threaded executor that we use for memtable flushing.

So with a huge hint buildup (which is what these duration tests produce), it can block the flushing of other memtables, because the metered flusher is scheduled there too.

This solves it for 2.0.11. Pierre's issue is unrelated.
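
For illustration only, here is a minimal standalone sketch of the blocking behaviour described above: a long synchronous task submitted to a single-threaded scheduled executor delays every other task queued on it. This is not Cassandra code. The class name, the method name, and the sleep that stands in for the compaction are all hypothetical, and the executor is a plain java.util.concurrent one rather than StorageService.optionalTasks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SingleThreadStarvation {

    /**
     * Submits a blocking task (stand-in for the synchronous compaction) and then
     * a zero-delay task (stand-in for the flusher) to one single-threaded executor.
     * Returns how many milliseconds passed before the second task actually ran.
     */
    static long flusherDelayMillis(long blockingMillis) {
        // Hypothetical stand-in for a shared single-threaded "optional tasks" executor.
        ScheduledExecutorService optionalTasks = Executors.newSingleThreadScheduledExecutor();
        AtomicLong flusherRanAt = new AtomicLong();
        CountDownLatch flusherDone = new CountDownLatch(1);
        long start = System.nanoTime();
        try {
            // Long synchronous work occupying the only worker thread.
            optionalTasks.execute(() -> {
                try {
                    Thread.sleep(blockingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // Eligible to run immediately, but queued behind the task above.
            optionalTasks.schedule(() -> {
                flusherRanAt.set(System.nanoTime());
                flusherDone.countDown();
            }, 0, TimeUnit.MILLISECONDS);

            flusherDone.await();
            return TimeUnit.NANOSECONDS.toMillis(flusherRanAt.get() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the flusher task", e);
        } finally {
            optionalTasks.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("flusher task delayed by ~" + flusherDelayMillis(500) + " ms");
    }
}
```

In this sketch the second task cannot start until the first returns, so its delay grows with the duration of the blocking work. That is the same shape as a flush being held up behind a long compaction on a shared single-threaded executor.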

> OOME in Cassandra 2.0.11
> ------------------------
>
>                 Key: CASSANDRA-8285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
> Cassandra 2.0.11 + ruby-driver 1.0-beta
>            Reporter: Pierre Laporte
>            Assignee: Aleksey Yeschenko
>         Attachments: OOME_node_system.log, gc-1416849312.log.gz, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png, system.log.gz
>
>
> We ran 3-day driver endurance tests against Cassandra 2.0.11, and C* crashed with an OOME.  This happened with both ruby-driver 1.0-beta and java-driver 2.0.8-snapshot.
> Attached are :
> | OOME_node_system.log | The system.log of one Cassandra node that crashed |
> | gc.log.gz | The GC log on the same node |
> | heap-usage-after-gc.png | The heap occupancy evolution after every GC cycle |
> | heap-usage-after-gc-zoom.png | A focus on when things start to go wrong |
> Workload :
> Our test executes 5 CQL statements (select, insert, select, delete, select) for a given unique id, over 3 days, using multiple threads.  There is no change in the workload during the test.
> Symptoms :
> In the attached log, it seems something starts in Cassandra between 2014-11-06 10:29:22 and 2014-11-06 10:45:32.  This causes an allocation that fills the heap.  We eventually get stuck in a Full GC storm and get an OOME in the logs.
> I have run the java-driver tests against Cassandra 1.2.19 and 2.1.1.  The error does not occur.  It seems specific to 2.0.11.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)