Posted to commits@cassandra.apache.org by "Alex Petrov (JIRA)" <ji...@apache.org> on 2016/05/18 12:58:12 UTC

[jira] [Updated] (CASSANDRA-9530) SSTable corruption can trigger OOM

     [ https://issues.apache.org/jira/browse/CASSANDRA-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Petrov updated CASSANDRA-9530:
-----------------------------------
    Status: Patch Available  (was: Open)

I've composed a patch that improves the detection of corrupted sstables. I had other ways to reproduce corruptions locally, but this one is the most general and brute-force. The idea is to save (back up) the first {{n}} bytes, then overwrite {{0}} to {{5}} bytes, either with random values or with {{0xFF}}, moving the corruption position forward one byte at a time.
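The brute-force strategy can be sketched as follows. This is a minimal, hypothetical sketch: it assumes the sstable is held in memory as a {{byte[]}}, backs up only the bytes about to be overwritten, and uses a caller-supplied {{Predicate}} in place of actually reading the file. {{BruteForceCorruption}} is an illustrative name, not a class from the patch.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical sketch: slide a small corrupted window across a file image. */
public class BruteForceCorruption {

    /**
     * For every position, back up the bytes that will be overwritten, overwrite
     * 0 to 5 of them (random bytes or 0xFF), run the check, then restore them.
     * Returns the number of corrupted variants the check failed to detect.
     */
    public static int run(byte[] data, boolean useRandom, long seed,
                          Predicate<byte[]> detectsCorruption) {
        Random random = new Random(seed);
        int undetected = 0;
        for (int position = 0; position < data.length; position++) {
            // Corrupt between 0 and 5 bytes, clamped to the end of the data.
            int length = Math.min(random.nextInt(6), data.length - position);
            byte[] backup = Arrays.copyOfRange(data, position, position + length);
            for (int i = 0; i < length; i++) {
                data[position + i] = useRandom ? (byte) random.nextInt(256) : (byte) 0xFF;
            }
            // Only count variants that actually differ from the original bytes.
            boolean changed = !Arrays.equals(backup,
                    Arrays.copyOfRange(data, position, position + length));
            if (changed && !detectsCorruption.test(data)) {
                undetected++;
            }
            // Restore the original bytes before moving on by one byte.
            System.arraycopy(backup, 0, data, position, length);
        }
        return undetected;
    }
}
```

Because the window moves one byte at a time, every start position is tried, so length prefixes, offsets and checksums should all eventually be hit; the cost is one full check per position.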

Since the test caught several more cases where existing asserts were triggered, those asserts were converted to {{CorruptionExceptions}}. I've run the tests locally a couple of dozen times and they pass every time. (Because one of the tests is randomised, it could in theory land on a value that uncovers another edge case, but that is exactly the purpose of such a test.)
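Java {{assert}} statements only run when assertions are enabled ({{-ea}}) and fail with a generic {{AssertionError}}, whereas an explicit exception makes corruption an ordinary, catchable condition. A minimal sketch of that conversion, assuming a hypothetical length-prefixed row format; {{RowReader}} and its nested {{CorruptionException}} are illustrative stand-ins, and the patch's actual classes and invariants may differ.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: validate on-disk invariants with exceptions instead of asserts. */
public class RowReader {

    /** Hypothetical stand-in for the patch's corruption exception. */
    public static class CorruptionException extends RuntimeException {
        public CorruptionException(String message) {
            super(message);
        }
    }

    /**
     * Reads a length-prefixed row body. An assert-based version might have been
     * "assert size >= 0 && size <= in.remaining();", which is skipped without -ea
     * and otherwise fails with a bare AssertionError.
     */
    public static ByteBuffer readRow(ByteBuffer in) {
        int size = in.getInt();
        if (size < 0 || size > in.remaining()) {
            throw new CorruptionException("Invalid row size " + size + ", only "
                    + in.remaining() + " bytes remaining");
        }
        // Return a view of the row body and advance past it.
        ByteBuffer row = in.slice();
        row.limit(size);
        in.position(in.position() + size);
        return row;
    }
}
```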

I've also added the proposed {{max_value_size_in_mb}} setting to the configuration file, along with the code path that enforces it and the corresponding exception.
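The core of the protection is a bound check on the value length performed before the buffer is allocated. This is a minimal sketch, assuming values are stored with a 4-byte length prefix; {{ValueSizeGuard}} and {{CorruptValueException}} are hypothetical names, and the limit is passed to the constructor here, whereas the patch takes it from the {{max_value_size_in_mb}} setting.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch: refuse to allocate buffers for implausibly large values. */
public class ValueSizeGuard {

    /** Hypothetical exception signalling that a length prefix is not trustworthy. */
    public static class CorruptValueException extends IOException {
        public CorruptValueException(String message) {
            super(message);
        }
    }

    private final long maxValueSizeBytes;

    public ValueSizeGuard(int maxValueSizeInMb) {
        // Mirrors a max_value_size_in_mb style setting, converted to bytes.
        this.maxValueSizeBytes = maxValueSizeInMb * 1024L * 1024L;
    }

    /** Reads a length-prefixed value, validating the length before allocating. */
    public byte[] readValue(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > maxValueSizeBytes) {
            // A bogus length would otherwise drive a huge allocation and an OOM.
            throw new CorruptValueException("Value length " + length
                    + " exceeds limit of " + maxValueSizeBytes + " bytes");
        }
        byte[] value = new byte[length];
        in.readFully(value);
        return value;
    }
}
```

For example, a guard built with a 1 MB limit rejects a corrupted prefix of {{0x7FFFFFFF}} immediately instead of attempting a roughly 2 GB allocation.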

[trunk|https://github.com/ifesdjeen/cassandra/tree/9530-trunk] |[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-testall/] |[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-9530-trunk-dtest/] |

> SSTable corruption can trigger OOM
> ----------------------------------
>
>                 Key: CASSANDRA-9530
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9530
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Alex Petrov
>
> If an sstable is corrupted so that the length of a given value is bogus, we'll still happily try to allocate a buffer of that bogus size to read the value, which can easily lead to an OOM.
> We should probably protect against this. In practice, a given value can't be that big, since it's limited by the protocol frame size in the first place. Maybe we could add a max_value_size_in_mb setting and consider an sstable corrupted if it contains a value bigger than that.
> I'll note that this ticket would be a good occasion to improve {{BlacklistingCompactionsTest}}. Typically, it currently generates empty values, which makes it pretty much impossible to reproduce the problem described here. And as described in CASSANDRA-9478, it also doesn't properly test things like early opening of compaction results. We could try to randomize as many of the test's parameters as possible to make it more likely to catch any type of corruption that could happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)