Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/12/12 14:47:13 UTC

[jira] [Commented] (CASSANDRA-7926) Stress can OOM on merging of timing samples

    [ https://issues.apache.org/jira/browse/CASSANDRA-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244153#comment-14244153 ] 

Benedict commented on CASSANDRA-7926:
-------------------------------------

For the moment we're comfortable with ending up with fewer than maxSamples: because the space required is pretty low, we can easily oversize the sample to get our desired accuracy. The main reason for this decision in the first place was to keep the merging of samples simple (requiring zero thought to deliver), while still guaranteeing a truly uniform resulting sample. When we have the time to revisit this, we probably want to construct an explicitly biased sample that tracks outliers with greater probability (while still recording their actual incidence), at which point we could also consider introducing reservoir sampling. In fairness, though, we could very easily switch to reservoir sampling for the individual/source sample accumulation now, and keep the current (lossier) method for merging samples.
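
To make the reservoir sampling suggestion concrete, here is a rough sketch of what per-source accumulation could look like. This is only an illustration using the textbook "Algorithm R": the class name ReservoirSample and its methods are hypothetical and are not taken from the stress tool's SampleOfLongs, and the sketch does not attempt to cover merging at all.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch only: NOT the actual SampleOfLongs from cassandra-stress.
// Keeps a uniform sample of at most maxSamples values from a single source.
final class ReservoirSample {
    private final long[] values;   // fixed-size reservoir, allocated once
    private final Random random;
    private long observed;         // total number of values offered so far

    ReservoirSample(int maxSamples, long seed) {
        if (maxSamples <= 0)
            throw new IllegalArgumentException("maxSamples must be positive");
        this.values = new long[maxSamples];
        this.random = new Random(seed);
    }

    // Algorithm R: the first maxSamples values fill the reservoir; afterwards
    // the n-th value replaces a random slot with probability maxSamples / n.
    void add(long value) {
        if (observed < values.length) {
            values[(int) observed] = value;
        } else {
            // Uniform index in [0, observed]; nextDouble() is in [0, 1).
            long slot = (long) (random.nextDouble() * (observed + 1));
            if (slot < values.length)
                values[(int) slot] = value;
        }
        observed++;
    }

    // Number of values currently held (never more than maxSamples).
    int size() {
        return (int) Math.min(observed, values.length);
    }

    // Total number of values offered, needed to know the sampling probability.
    long observed() {
        return observed;
    }

    // Copy of the values currently held.
    long[] snapshot() {
        return Arrays.copyOf(values, size());
    }
}
```

The point of the sketch is that memory per source stays fixed at maxSamples longs regardless of how many timings are recorded, and the sample is never smaller than it needs to be. The ratio size() / observed() gives the selection probability that a merge step would need in order to weight sources correctly.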

I've committed with your nits.

> Stress can OOM on merging of timing samples
> -------------------------------------------
>
>                 Key: CASSANDRA-7926
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7926
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>              Labels: tools
>             Fix For: 2.1.3
>
>
> {noformat}
> Exception in thread "StressMetrics:2" java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2343)
>         at org.apache.cassandra.stress.util.SampleOfLongs.merge(SampleOfLongs.java:76)
>         at org.apache.cassandra.stress.util.TimingInterval.merge(TimingInterval.java:95)
>         at org.apache.cassandra.stress.util.Timing.snapInterval(Timing.java:95)
>         at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:124)
>         at org.apache.cassandra.stress.StressMetrics.access$200(StressMetrics.java:36)
>         at org.apache.cassandra.stress.StressMetrics$1.run(StressMetrics.java:72)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This is partly down to the recent increase in the per-thread sample size, but also because during merge we allocate temporary space linear in the total sample size across all threads. This can easily be avoided. We should also scale the per-thread sample size based on the total number of threads, so that we limit total memory use.


