Posted to commits@cassandra.apache.org by "Aleksey Yeschenko (JIRA)" <ji...@apache.org> on 2015/11/11 18:14:11 UTC

[jira] [Updated] (CASSANDRA-8052) OOMs from allocating large arrays when deserializing (e.g probably corrupted EstimatedHistogram data)

     [ https://issues.apache.org/jira/browse/CASSANDRA-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-8052:
-----------------------------------------
    Fix Version/s: 3.x

> OOMs from allocating large arrays when deserializing (e.g probably corrupted EstimatedHistogram data)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8052
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8052
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: linux
>            Reporter: Matt Byrd
>              Labels: OOM, checksum, corruption, oom, serialization
>             Fix For: 3.x
>
>
> We've seen nodes with what are presumably corrupted sstables repeatedly OOM on attempted startup, with a message like this:
> {code}
> java.lang.OutOfMemoryError: Java heap space
>  at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:266)
>  at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:292)
>  at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:282)
>  at org.apache.cassandra.io.sstable.SSTableReader.openMetadata(SSTableReader.java:234)
>  at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:194)
>  at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>  at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>  at java.lang.Thread.run(Thread.java:722)
> {code}
> It's probably not a coincidence that the exception is thrown here, since these appear to be the first bytes read from the file.
> Presumably the correct operational response is simply to replace the node;
> however, I was wondering whether, more generally, we might want to validate lengths when we deserialise things.
> That would avoid allocating large byte buffers that cause unpredictable OOMs, and instead throw an exception that can be handled as appropriate.
> In this particular instance there is no legitimate need for the estimated histogram to be unduly large, so the size read from disk can safely be bounded.
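> Something along these lines is what I have in mind for the histogram case (just a rough sketch, not the actual EstimatedHistogramSerializer code; the 10000 bound is arbitrary):
> {code}
> int size = in.readInt();
> // An EstimatedHistogram is typically built with on the order of a hundred buckets,
> // so a huge (or negative) value here almost certainly means the metadata is corrupted.
> if (size < 0 || size > 10000)
>     throw new IOException("Invalid estimated histogram bucket count: " + size);
> // only now is it safe to size the arrays off the value we just read
> long[] buckets = new long[size];
> {code}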
> Admittedly things are slightly different in 2.1, though I suspect a similar failure could happen with:
> {code}
>        int numComponents = in.readInt();
>        // read toc
>        Map<MetadataType, Integer> toc = new HashMap<>(numComponents); 
> {code}
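> Here too the count could be sanity-checked before it is used, since it can never legitimately exceed the number of MetadataType constants (again only a sketch, not a proposed patch):
> {code}
> int numComponents = in.readInt();
> // the toc can never legitimately contain more entries than there are
> // MetadataType enum constants, so anything larger (or negative) is corruption
> if (numComponents < 0 || numComponents > MetadataType.values().length)
>     throw new IOException("Invalid metadata component count: " + numComponents);
> // read toc
> Map<MetadataType, Integer> toc = new HashMap<>(numComponents);
> {code}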
> A "find usages" on DataInputStream.readInt() reveals quite a few places where an int is read in and then an ArrayList, array or map of that size is created.
> In some cases the size might legitimately range over the whole of a Java int,
> or the read might sit in a performance-critical or delicate piece of code where one doesn't want such checks.
> There are also other checksums and mechanisms at play which make some of this input less likely to be corrupted.
> However, is it worth a pass over instances of this type of input, to add validation where it makes sense?
> Perhaps there are less likely but worse failure modes present and hidden,
> e.g. if the deserialisation happens to be for a message sent to some or all nodes.
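> To make such a pass concrete, one option would be a small shared helper that centralises the check (a minimal sketch; the class and method names are made up and don't exist in the codebase):
> {code}
> import java.io.DataInput;
> import java.io.IOException;
> 
> public final class DeserializationGuards
> {
>     private DeserializationGuards() {}
> 
>     /**
>      * Reads a length prefix and fails fast if it is negative or exceeds the
>      * caller-supplied sanity bound, rather than letting a corrupted value
>      * drive a huge array/collection allocation and an eventual OOM.
>      */
>     public static int readCheckedLength(DataInput in, int maxExpected, String description) throws IOException
>     {
>         int length = in.readInt();
>         if (length < 0 || length > maxExpected)
>             throw new IOException(String.format("Read invalid %s %d (expected 0..%d); input is probably corrupted",
>                                                 description, length, maxExpected));
>         return length;
>     }
> }
> {code}
> Call sites that know a sensible upper bound would then just replace the bare readInt(), e.g. DeserializationGuards.readCheckedLength(in, 10000, "estimated histogram size").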



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)