Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/11/06 22:10:23 UTC

[jira] Created: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Cassandra cannot detect corrupt-but-readable column data
--------------------------------------------------------

                 Key: CASSANDRA-1717
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Jonathan Ellis
             Fix For: 0.7.1


Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081603#comment-13081603 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

{quote}
bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the Checksum interface, which requires that).

Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide
{quote}
Let's not:
* this is completely orthogonal to switching to a drop-in, faster CRC implementation.
* it is unclear that we want to make that change system-wide. Imho, it is not worth breaking commit log compatibility for that, but it is stupid to commit new code that perpetuates the mistake only to change it later.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081074#comment-13081074 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

I don't mind doing the CRC optimization in a separate ticket.  There are other places (CL, others?) that use CRC as well.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079556#comment-13079556 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

If we do that at the column index level, won't that imply that we checksum (and check) the row as a whole instead of individual columns?

Can you please describe your idea for doing it at the column index level here, just to make sure we are all on the same page?

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079616#comment-13079616 ] 

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/4/11 9:39 PM:
--------------------------------------------------------------------

This is a good idea, but it has a few complications:

 - the buffer length should be stored so it can be used by the reader
 - reads should be aligned to that buffer length so we always read a whole checksummed chunk of data, which implies that we will potentially need to read more data on each request

This seems to be a clear tradeoff between using additional space to store checksums for the index + columns of each row vs. doing more I/O...


      was (Author: xedin):
    This is a good idea but it has few complications:

 - buffer length should be store in order to be used by reader
 - reads should be aligned by that buffer length so we always read a whole checksummed chunk of the data which implies that we will potentially always need to read more data on each request

This seems to be a clear tradeoff between using additional space to store checksum for index + columns for each row v.s. doing more I/O...

  
> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081629#comment-13081629 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

What are the chances we'll switch from CRC32 any time soon? And even if we do, why would that justify writing 4 bytes of 0's right now? We will still have to bump the file format version and keep the code compatible with the old CRC32 format if we do so. It's not like the only difference between checksum algorithms is the size of the checksum.

So yes, 4 bytes out of 64K is not a lot of data, but knowingly writing 4 bytes of 0's every 64K, every time, for the vague and remote chance that it may save us 1 or 2 lines of code someday (and even that remains to be proven) feels ridiculous to me. But if I'm the only one who feels that way, fine, it's not a big deal.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081605#comment-13081605 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

Saving 4 bytes out of 64K doesn't seem like enough benefit to make life harder for ourselves if we want to use a long checksum later.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079937#comment-13079937 ] 

T Jake Luciani commented on CASSANDRA-1717:
-------------------------------------------

The column index level seems like a nice fit to me. It would at least allow partial column scans (in the case of bitrot).

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Lior Golan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079934#comment-13079934 ] 

Lior Golan edited comment on CASSANDRA-1717 at 8/5/11 12:31 PM:
----------------------------------------------------------------

Seems like in terms of overhead (which, based on HADOOP-6148, is potentially very significant in both storage and CPU), block level checksums are much better.

I understand you believe block level checksums are easy in the compressed case but not easy in the non-compressed case.

So can't you just implement a no-op compression option that will utilize what you're doing / planning to do for compression in terms of block structure and block level checksums?
That would be easy if you already designed the compression algorithm to be pluggable. And if the compression algorithm is not pluggable yet, adding that would have an obvious side benefit besides making the implementation of block level checksums easier.

      was (Author: liorgo2):
    Seems like in terms of overhead (which based on HADOOP-6148 is potentially very significant in both storage and CPU) - block level checksums is much better.

I understand you believe block level checksums are easy in the compressed case but to not easy in the non-compressed case. So can't you just implement a no-op compression option that will utilize what you're doing for compression in terms of block structure and block level checksums. That would be easy if you already designed for the compression algorithm to be plugable. And if the compression algorithm is not plugable yet - adding that would have an obvious side benefit besides an easier implementation of block level checksums.   
  
> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079382#comment-13079382 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

Making a checksum optional and off by default sounds good to me.

bq. Not sure that's bulletproof...

That is why I mentioned that if we have a checksum per column, it will also protect against wrong decompression at the block level and spare us an additional read and check, won't it?

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081637#comment-13081637 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

I still think that such a change is a matter for a separate ticket, as we will want to change the CRC code globally: we can make our own Checksum class which returns an int value, apply the performance improvements mentioned in HADOOP-6148 to it, and use it system-wide.

Is there anything else that keeps this from being committed?

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079449#comment-13079449 ] 

Ryan King commented on CASSANDRA-1717:
--------------------------------------

I think checksums per column would be way too much overhead. We already add a lot of overhead to all data stored in Cassandra, we should be careful about adding more.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Lior Golan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080035#comment-13080035 ] 

Lior Golan commented on CASSANDRA-1717:
---------------------------------------

If you're afraid of people getting confused by compression options that have nothing to do with compression, why not give it a more generic name like encoding options, e.g. encoding options = (snappy-with-checksum, checksum-only, none).

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081690#comment-13081690 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

You're right, if we change the checksum implementation we need to bump the sstable revision anyway.  +1 on casting to int here.  (But as you said above, -1 on changing this in the CommitLog.)
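
For illustration, a minimal sketch of the int/long round trip being agreed on here, using only the JDK's CRC32 (the class and variable names are mine, not the patch's):

{code}
import java.util.zip.CRC32;

public class CrcIntRoundTrip
{
    public static void main(String[] args)
    {
        byte[] chunk = "some sstable chunk".getBytes();

        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);

        // getValue() returns a long only because the Checksum interface requires it;
        // a CRC32 value always fits in 32 bits, so the upper half is always zero.
        long asLong = crc.getValue();
        int asInt = (int) asLong;                // what would be written: 4 bytes, not 8
        long roundTripped = asInt & 0xFFFFFFFFL; // what a reader would reconstruct

        System.out.println(asLong == roundTripped); // always true
    }
}
{code}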

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079984#comment-13079984 ] 

T Jake Luciani commented on CASSANDRA-1717:
-------------------------------------------

Right, given that we are using block compression, I think it really only makes sense to do checksums at the block level; I just didn't know what recovery tools we could build.

Sounds like, using the row index, we could repair the range containing the bad block(s) from replicas.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079632#comment-13079632 ] 

Todd Lipcon commented on CASSANDRA-1717:
----------------------------------------

xedin on IRC asked me to comment on this issue. For reference of what other systems do: HDFS checksums every file in 512-byte chunks with a CRC32. It's verified on write (by only the first DN in the pipeline) and on read (by the client). If the client gets a checksum error while reading, it will report this to the NN, and the NN will mark that block as corrupt, schedule another replication, etc.

This is all transparent to the HBase layer since it's done at the FS layer. So, HBase itself doesn't do any extra checksumming. If you compress your tables, then you might get an extra layer of checksumming for free from gzip as someone mentioned above.

For some interesting JIRAs on checksum performance, check out HADOOP-6148 and its various followups, as well as the current work in progress in HDFS-2080.
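
As a rough illustration of the per-chunk scheme described above (this is not HDFS's actual code; the 512-byte chunk size comes from the comment, everything else is assumed):

{code}
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkedChecksum
{
    static final int CHUNK_SIZE = 512; // per the HDFS description above; otherwise arbitrary

    // Compute one CRC32 per 512-byte chunk of the data.
    static int[] checksum(byte[] data)
    {
        int chunks = (data.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        int[] sums = new int[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++)
        {
            int offset = i * CHUNK_SIZE;
            int length = Math.min(CHUNK_SIZE, data.length - offset);
            crc.reset();
            crc.update(data, offset, length);
            sums[i] = (int) crc.getValue();
        }
        return sums;
    }

    // On read, recompute and compare; any mismatch means a corrupt chunk.
    static boolean verify(byte[] data, int[] expected)
    {
        return Arrays.equals(checksum(data), expected);
    }
}
{code}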

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080090#comment-13080090 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

bq. can't you just implement a no-op compression option that will utilize what you're doing / planning to do for compression in terms of block structure and block level checksums? Good question. Pavel?

That sounds like special-casing, and it has the complications mentioned before - more I/O, the need to store the buffer size, and it won't play nice with mmap. Placing it at the block level will also make it harder to build tools that deal with corruption (as Jake mentioned), because we think in terms of the data model, not in terms of file blocks.

First of all, we should define the goal we are pursuing here - which is essential.

If this is only about repair and replication, I think the good way would be to checksum at the row boundary level, which would be relatively simple to check and would play nice with mmap.

I still think that the best way to check for corruption would be to checksum at the row header (key and row index) and column level, even if that introduces disk space and CPU overhead (a necessary sacrifice). This could be the most elegant solution for a few reasons, two of which are: it introduces no system-wide complexity (aka special-casing) in how we work with SSTables and repair, and it lets us think in terms of our data model.

But it somehow feels like we are missing a better solution here...



> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081609#comment-13081609 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

+1 with Jonathan; also, it is better if we satisfy the interface instead of relying on internal implementation details, which could also be helpful if we ever decide to change the checksum algorithm.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080441#comment-13080441 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. In that case can we consider making that compression only feature

For what it's worth, that was basically the idea behind the no-op compression: we only support checksums with compression, and if people really, really want something uncompressed with checksums, we give them a compression algorithm that doesn't compress squat.
Anyway, I'm for the "compression only" feature, and the no-op compression was just an idea "in case we need it". We don't even have to do it now. But if someone asks, it will be trivial to do.
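
To make the no-op idea concrete, here is a sketch of what such a pass-through "compressor" could look like. The interface is hypothetical (it is not Cassandra's actual compressor API); the point is only that the block framing and per-block checksums of the compressed path would apply unchanged:

{code}
// Hypothetical pluggable-compressor interface, standing in for whatever the
// compression work exposes; not Cassandra's actual class or method names.
interface BlockCompressor
{
    int compress(byte[] input, int inputLength, byte[] output);
    int uncompress(byte[] input, int inputLength, byte[] output);
}

// "Compression" that copies bytes unchanged: uncompressed column families could
// reuse the block structure and per-block checksums of the compressed code path.
class NoopCompressor implements BlockCompressor
{
    public int compress(byte[] input, int inputLength, byte[] output)
    {
        System.arraycopy(input, 0, output, 0, inputLength);
        return inputLength; // "compressed" length equals the original length
    }

    public int uncompress(byte[] input, int inputLength, byte[] output)
    {
        System.arraycopy(input, 0, output, 0, inputLength);
        return inputLength;
    }
}
{code}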

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081076#comment-13081076 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

Oh, true. Makes sense to move it to a separate ticket then.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080133#comment-13080133 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

bq. can we consider making that compression only feature?

I'm fine with that.  We can always add the checksum-only uncompressed encoding later if someone wants it badly enough. :)

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080120#comment-13080120 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

bq. That sounds like a special-casing

I don't follow.  It feels exactly the opposite of a special case to me: using the per-block code that we already have.

bq. more I/O, need to hold up buffer size, won't play nice with mmap

That's why we give people the choice.  But I'm pretty sure that after 1.0 we'll make compression the default.  So I don't want to add a lot of complexity for uncompressed sstables.

bq. we should define a goal we pursue by this

Here's our requirement:

- prevent corruption from being replicated
- detect and remove corruption on repair 

Nice to have:
- low complexity of implementation
- low space overhead
- detect corruption as soon as it is read

bq. I still think that the best way to check for corruption will be to use checksum at row header (key and row index) and column level

That's not crazy, and it achieves all goals except low space overhead.  But for the reasons above I still think block-level is a better fit.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Lior Golan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079934#comment-13079934 ] 

Lior Golan commented on CASSANDRA-1717:
---------------------------------------

Seems like in terms of overhead (which, based on HADOOP-6148, is potentially very significant in both storage and CPU), block level checksums are much better.

I understand you believe block level checksums are easy in the compressed case but not easy in the non-compressed case. So can't you just implement a no-op compression option that will utilize what you're doing for compression in terms of block structure and block level checksums? That would be easy if you already designed the compression algorithm to be pluggable. And if the compression algorithm is not pluggable yet, adding that would have an obvious side benefit besides making the implementation of block level checksums easier.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1717:
--------------------------------------

    Fix Version/s:     (was: 0.7.1)
                   0.8
       Issue Type: New Feature  (was: Bug)

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079953#comment-13079953 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. The column index level seems like a nice fit to me.

Right, I kind of forgot this one previously; it is different from row level and actually better (you check on each read). However, it has a fair overhead in terms of CPU and storage (less than column level, but more than we would want imho). I still believe block level is the right level, if not for the mmap problem.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569 ] 

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:29 AM:
---------------------------------------------------------------------

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is the speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm, not that I don't trust Snappy, but...). This is a clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data give a bit more safety, I don't know what is our exact gain quantitatively with checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to have other's opinions more than anything else.

  It checksums the original (uncompressed) data and stores the checksum at the end of the compressed chunk; the reader verifies the checksum after decompression (a sketch of this layout follows below).
 
bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and checking this is caught at read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of java CRC32. In particular, it seems they have been able to close to double the speed of the CRC32, with a solution that seems fairly simple to me. It would be ok to use java native CRC32 and leave the improvement to another ticket, but quite frankly if it is that simple and since the hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

  As decided previously, this will be a matter for a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)
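
A sketch of the chunk layout described above: checksum the uncompressed data, append the checksum to the compressed chunk, and verify after decompression. The JDK's Deflater/Inflater stands in for Snappy here, and the exact byte layout is illustrative, not the patch's on-disk format:

{code}
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ChecksummedChunk
{
    // Write path: checksum the *uncompressed* data, then lay the chunk out as
    // [compressed bytes][4-byte CRC].
    static byte[] write(byte[] uncompressed)
    {
        CRC32 crc = new CRC32();
        crc.update(uncompressed, 0, uncompressed.length);

        Deflater deflater = new Deflater();
        deflater.setInput(uncompressed);
        deflater.finish();
        byte[] buffer = new byte[uncompressed.length * 2 + 64]; // ample for the worst case
        int compressedLength = deflater.deflate(buffer);
        deflater.end();

        byte[] chunk = new byte[compressedLength + 4];
        System.arraycopy(buffer, 0, chunk, 0, compressedLength);
        int checksum = (int) crc.getValue();
        for (int i = 0; i < 4; i++) // big-endian trailing checksum
            chunk[compressedLength + i] = (byte) (checksum >>> (24 - 8 * i));
        return chunk;
    }

    // Read path: decompress first, recompute the CRC over the result and compare
    // it with the stored value; a mismatch means the block is corrupted.
    static byte[] read(byte[] chunk, int uncompressedLength) throws IOException, DataFormatException
    {
        int compressedLength = chunk.length - 4;
        Inflater inflater = new Inflater();
        inflater.setInput(chunk, 0, compressedLength);
        byte[] uncompressed = new byte[uncompressedLength]; // length assumed known from metadata
        inflater.inflate(uncompressed);
        inflater.end();

        CRC32 crc = new CRC32();
        crc.update(uncompressed, 0, uncompressed.length);
        int stored = 0;
        for (int i = 0; i < 4; i++)
            stored = (stored << 8) | (chunk[compressedLength + i] & 0xFF);
        if (stored != (int) crc.getValue())
            throw new IOException("corrupted block: checksum mismatch");
        return uncompressed;
    }
}
{code}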

      was (Author: xedin):
    bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the interface Checksum that require that).

  Lets leave that to the ticket for CRC optimization which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is the speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm, not that I don't trust Snappy, but...). This is a clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data give a bit more safety, I don't know what is our exact gain quantitatively with checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to have other's opinions more than anything else.

  Checksum is moved to the original data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and checking this is caught at read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of java CRC32. In particular, it seems they have been able to close to double the speed of the CRC32, with a solution that seems fairly simple to me. It would be ok to use java native CRC32 and leave the improvement to another ticket, but quite frankly if it is that simple and since the hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

  As decided previously this will be a matter of the separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)
  
> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079293#comment-13079293 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

After thinking about this for a while, I think we should checksum at the column level only, which will give us better control over individual columns and does not seem to be a big overhead (instead of doing it at the column index level and relying on the digest). A checksum at the compressed block level is unnecessary because bitrot, for example, will be detected right at decompression or column deserialization time. Thoughts?
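
Purely to illustrate the per-column proposal (not what was ultimately committed), the column bytes could be followed by a CRC covering name, value and timestamp. The layout below is simplified and hypothetical, not Cassandra's actual column serialization:

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class ColumnWithChecksum
{
    // Serialize a column as [name][value][timestamp][4-byte CRC of the preceding bytes].
    // A corrupted-but-readable timestamp would then fail the CRC check on read.
    static byte[] serialize(byte[] name, byte[] value, long timestamp) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(name.length);
        out.write(name);
        out.writeInt(value.length);
        out.write(value);
        out.writeLong(timestamp);

        CRC32 crc = new CRC32();
        crc.update(bytes.toByteArray()); // checksum covers name, value and timestamp
        out.writeInt((int) crc.getValue());
        return bytes.toByteArray();
    }
}
{code}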

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081773#comment-13081773 ] 

Hudson commented on CASSANDRA-1717:
-----------------------------------

Integrated in Cassandra #1010 (See [https://builds.apache.org/job/Cassandra/1010/])
    Add block level checksum for compressed data
patch by Pavel Yaskevich; reviewed by Sylvain Lebresne for CASSANDRA-1717

xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155420
Files : 
* /cassandra/trunk/test/unit/org/apache/cassandra/Util.java
* /cassandra/trunk/test/unit/org/apache/cassandra/io/compress/CompressedRandomAccessReaderTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/compress/CorruptedBlockException.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressedRandomAccessReader.java
* /cassandra/trunk/test/unit/org/apache/cassandra/io/util/BufferedRandomAccessFileTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/FBUtilities.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java


> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079900#comment-13079900 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

My 2 cents:

I see 3 options that seems to make sense somehow:
# checksums at the column level:
  ** pros: easy to do, and easy to recover from bitrot efficiently (efficient in that, in general, we would be able to drop only the one affected column; it's more complicated if something in the row header (row key, row size, ...) is bitrotten, though).
  ** cons: high overhead (mainly in disk space usage, but also in CPU usage, because we have many more checksums to check)
# checksums at the row level (or column index level, but I think this is essentially the same, isn't it?):
  ** pros: easy to recover from bitrot (we drop the row), though potentially more wasteful than "column level". Incurs a small space overhead for big rows.
  ** cons: we can't realistically check on every read, so we would need to do it only on compaction/repair and on read digest mismatch (that last one is non-optional if we want to be sure that bitrot never propagates to another node); this adds complexity, and checking checksums on read digest mismatch costs I/O that is usually unnecessary (a read digest mismatch won't in general be due to bitrot). Also incurs an important space overhead for tiny rows.
# checksums at the block level:
  ** pros: super easy in the compressed case (can be done "on every read", or more precisely each time we read a block). Incurs a minimum overhead.
  ** cons: super *not* easy in the non-compressed case. We don't have blocks in the uncompressed case. While writing, we could use the buffer size as a block size and add a checksum on flush. The problems are on reads however.  First, we would need to align buffers on reads (which we don't do in the non-compressed case) as Pavel said, which likely involves more reBuffer in general (aka more I/O). But perhaps more importantly, I have no clue how you could make that work with mmap efficiently (we would potentially have a checksum in the middle of a column value as far as mmap is concerned).  Also slightly harder to recover from bitrot without dropping the whole sstable (but doable as long as we have the index around).

There may be other solutions I don't see, and there may be pros/cons for the ones above that I have missed (please feel free to complete the list).

But based on those, my personal opinion is that "column level" has too big an overhead and "block level" is really problematic in the mmap non-compressed case (but it sounds like the best option to me if we ignore mmap).

So my personal preference leans towards using "block level" but only having checksums in the compressed case and maybe in an uncompressed mode for which mmap would be deactivated.

If we really don't want to consider that, "row level" checksums would maybe be the lesser evil. But I'm not fond of the overhead in case of tiny rows and the 'check checksums on read digest mismatch', while I believe necessary in that case, doesn't sound like the best idea ever.


> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080027#comment-13080027 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

I like compression options {snappy, checksum-only, none}.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1717:
--------------------------------------

    Attachment: checksums.txt

Naive proof of concept to checksum at the column level.  If this is too much overhead we can checksum at the column index block instead, and check that on digest mismatch (so we don't have to deserialize the entire block for each read).

Otherwise, this needs to be extended to (a) cover supercolumns and (b) maintain backwards compatibility w/ old data files.
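
For readers following the thread, the naive column-level approach boils down to something like the following; this is an illustration in the spirit of the attachment, not the patch itself:

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class ColumnChecksumSketch
{
    // Append a CRC32 of each serialized column on write...
    static void writeWithChecksum(byte[] serializedColumn, DataOutputStream out) throws IOException
    {
        out.writeInt(serializedColumn.length);
        out.write(serializedColumn);
        CRC32 crc = new CRC32();
        crc.update(serializedColumn, 0, serializedColumn.length);
        out.writeLong(crc.getValue());
    }

    // ...and verify it on read, so corrupt-but-readable columns are rejected
    // (and can then be repaired via read repair / anti-entropy).
    static byte[] readWithChecksum(DataInputStream in) throws IOException
    {
        byte[] column = new byte[in.readInt()];
        in.readFully(column);
        CRC32 crc = new CRC32();
        crc.update(column, 0, column.length);
        if (crc.getValue() != in.readLong())
            throw new IOException("column checksum mismatch");
        return column;
    }
}
{code}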

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.7.1
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079608#comment-13079608 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

What about this?

- add checksum-on-flush to SequentialWriter (rough sketch after this list)
- compressed reads will always check on uncompress
- uncompressed reads will check on repair
- and also on read repair digest mismatch retry
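
Concretely, checksum-on-flush could look something like the standalone sketch below; the class, buffer size and method names are placeholders for illustration, not the actual SequentialWriter API:

{code:java}
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

// Hypothetical sketch: buffer writes and append a CRC32 of the buffered bytes on every flush.
public class ChecksummedWriterSketch implements AutoCloseable
{
    private final DataOutputStream out;
    private final byte[] buffer = new byte[64 * 1024]; // illustrative 64KB flush unit
    private final CRC32 checksum = new CRC32();
    private int position = 0;

    public ChecksummedWriterSketch(String path) throws IOException
    {
        out = new DataOutputStream(new FileOutputStream(path));
    }

    public void write(byte[] data, int offset, int length) throws IOException
    {
        while (length > 0)
        {
            int n = Math.min(length, buffer.length - position);
            System.arraycopy(data, offset, buffer, position, n);
            position += n;
            offset += n;
            length -= n;
            if (position == buffer.length)
                flushData();
        }
    }

    private void flushData() throws IOException
    {
        if (position == 0)
            return;
        checksum.update(buffer, 0, position);
        out.write(buffer, 0, position);          // data for this flush
        out.writeInt((int) checksum.getValue()); // CRC32 is 32 bits internally
        checksum.reset();                        // reset between flushes
        position = 0;
    }

    public void close() throws IOException
    {
        flushData();
        out.close();
    }
}
{code}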


> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001813#comment-13001813 ] 

Stu Hood commented on CASSANDRA-1717:
-------------------------------------

Sorry, that was supposed to say #674. We've been dealing with compaction and garbage collection issues, and I haven't had as much time to work on it recently. I'm hoping to be able to resume #674 in the next few weeks, but I don't think it will be 0.8 material.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081718#comment-13081718 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

lgtm, +1

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079951#comment-13079951 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. So can't you just implement a no-op compression option

That is exactly what I had in mind for "maybe in an uncompressed mode for which mmap would be deactivated". It will be trivial once we have made it easy to switch compression algorithms (which is trivial too btw, I'll do that probably ... well why not now).

That would make things fairly clear imho. We would say "no mmap" with compression and no checksums without compression, but you would have "compression with a no-op algorithm". This has my preference, as said previously, but we must be aware that people will ask why we have both a non-mmap non-compressed mode and a compressed no-op mode (we cannot really get rid of the first one, because otherwise we are saying "you shall use mmap forever now").
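
For what it's worth, a no-op compressor really is tiny; something along these lines (the interface shape below is an assumption for illustration, not the actual ICompressor API):

{code:java}
// Pass-through "compression": just copies bytes, so uncompressed data can reuse the
// block structure and block-level checksums of the compressed read/write path.
public class NoopCompressorSketch
{
    public int initialCompressedBufferLength(int chunkLength)
    {
        return chunkLength; // output is never larger than input
    }

    public int compress(byte[] input, int inputOffset, int inputLength, byte[] output, int outputOffset)
    {
        System.arraycopy(input, inputOffset, output, outputOffset, inputLength);
        return inputLength;
    }

    public int uncompress(byte[] input, int inputOffset, int inputLength, byte[] output, int outputOffset)
    {
        System.arraycopy(input, inputOffset, output, outputOffset, inputLength);
        return inputLength;
    }
}
{code}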

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079424#comment-13079424 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

My bad, I read that as "let's not use checksums for compression at all". Never mind.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079946#comment-13079946 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

bq. can't you just implement a no-op compression option that will utilize what you're doing / planning to do for compression in terms of block structure and block level checksums?

Good question.  Pavel?

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-1717:
---------------------------------------

    Attachment: CASSANDRA-1717-v3.patch

v3 removes BBU.toLong, adds FBU.byteArrayToInt, and uses an int instead of a long for the checksum

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079616#comment-13079616 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

This is a good idea, but it has a few complications:

 - the buffer length should be stored so that it can be used by the reader
 - reads should be aligned to that buffer length so we always read a whole checksummed chunk of the data, which implies that we will potentially need to read more data on each request

This seems to be a clear tradeoff between using additional space to store a checksum for the index + columns of each row vs. doing more I/O...
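
To make the alignment cost concrete, here is a rough sketch of what an aligned, verified read looks like; the names, the chunk-plus-trailing-CRC layout, and ignoring the short final chunk are all simplifying assumptions:

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

public final class AlignedChunkReadSketch
{
    // To serve a read at an arbitrary logical offset, rewind to the start of the
    // enclosing checksummed chunk and read the whole chunk plus its CRC so it can be verified.
    public static byte[] readChunkContaining(RandomAccessFile file, long logicalOffset, int chunkSize) throws IOException
    {
        long chunkIndex = logicalOffset / chunkSize;
        long filePosition = chunkIndex * (chunkSize + 4L); // each chunk is followed by a 4-byte CRC

        byte[] chunk = new byte[chunkSize];
        file.seek(filePosition);
        file.readFully(chunk);
        int storedCrc = file.readInt();

        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        if ((int) crc.getValue() != storedCrc)
            throw new IOException("checksum mismatch in chunk " + chunkIndex);

        return chunk; // the caller then slices out only the bytes it actually asked for
    }
}
{code}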


> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079635#comment-13079635 ] 

Todd Lipcon commented on CASSANDRA-1717:
----------------------------------------

BTW, we also scan all blocks verifying checksums in a background process, continuously, to catch bit-rot even for data that isn't getting read.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569 ] 

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:25 AM:
---------------------------------------------------------------------

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is the speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm, not that I don't trust Snappy, but...). This is a clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data give a bit more safety, I don't know what is our exact gain quantitatively with checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to have other's opinions more than anything else.

  The checksum is now computed over the original (uncompressed) data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and checking this is caught at read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of java CRC32. In particular, it seems they have been able to close to double the speed of the CRC32, with a solution that seems fairly simple to me. It would be ok to use java native CRC32 and leave the improvement to another ticket, but quite frankly if it is that simple and since the hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

  As decided previously, this will be a matter for a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)

      was (Author: xedin):
    bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is the speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm, not that I don't trust Snappy, but...). This is a clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data give a bit more safety, I don't know what is our exact gain quantitatively with checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to have other's opinions more than anything else.

  The checksum is now computed over the original (uncompressed) data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and checking this is caught at read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of java CRC32. In particular, it seems they have been able to close to double the speed of the CRC32, with a solution that seems fairly simple to me. It would be ok to use java native CRC32 and leave the improvement to another ticket, but quite frankly if it is that simple and since the hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

  As decided previously, this will be a matter for a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)
  
> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] Issue Comment Edited: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Benjamin Coverston (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001649#comment-13001649 ] 

Benjamin Coverston edited comment on CASSANDRA-1717 at 3/3/11 12:13 AM:
------------------------------------------------------------------------

Would we really save that much by waiting until #16 is done? Perhaps we should take a shot at this in 0.7. Right now it's possible to have data corrupted and then replicated, leading to loss of data.

Edit: Apparently #16 is already done.

      was (Author: bcoverston):
    Would we really save that much by waiting until #16 is done? Perhaps we should take a shot at this in 0.7. Right now it's possible to have data corrupted and then replicated, leading to loss of data.
  
> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079479#comment-13079479 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

That is why we want to let users decide if they need that protection at all.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-1717:
---------------------------------------

    Attachment: CASSANDRA-1717.patch

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079979#comment-13079979 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. Block level and column index level are actually the same right? 64kb

True, but that only covers the big indexed rows. If you have lots of tiny rows, you have a much bigger overhead. Block level is more consistent and predictable. And with column index level, you also need to checksum the row header, so it's a slightly greater overhead anyway even for big rows. It's also a bit more complicated conceptually: you need to checksum the row header and body separately and distinguish between indexed and non-indexed rows.

bq. The reason block isn't ideal to me is it makes it much harder to recover/support partial reads since the block has no context in the file format

I agree, as I mentioned earlier, that it is harder. I don't know about "much" harder (at least for recovery), however. With the row index, I'm sure it's not too hard to drop only the block, and maybe a little bit around it, to get something consistent. Yes, it means we will always drop more than with column index level, but imho it is not like bitrot happens so often that it matters much (but I understand one could disagree).
Also, with column index, you can still have bitrot of the row header, in which case the whole row is still screwed.

Anyway, don't get me wrong, I'm not saying that column index level is a stupid idea. I think, however, that for some non-exceptional use cases (small rows, aka probably most of the 'static CFs'), the overhead will be much more important than with block level. I also think block level is cleaner in that you don't have to care about different cases. On the other side, the advantages of the column index level only matter in the exceptional case of bitrot (not the case we should optimize for imho), and it is more efficient then, but not by much.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929223#action_12929223 ] 

Stu Hood commented on CASSANDRA-1717:
-------------------------------------

I think we should consider delaying this until #16 is fixed (hopefully in 0.8): adding compression will require a block-based format, which is a natural level to checksum at. Additionally, if a user wanted to force corruption detection per lookup (as opposed to only when the entire block is read), GZIP's built-in checksumming kills two birds with one stone.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.7.1
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-1717:
---------------------------------------

    Attachment: CASSANDRA-1717-v2.patch

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int internally (getValue() returns a long only because CRC32 implements the interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is the speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm, not that I don't trust Snappy, but...). This is a clearly a trade-off that we have to make, but I admit that my personal preference would lean towards safety (in particular, I know that checksumming the uncompressed data give a bit more safety, I don't know what is our exact gain quantitatively with checksumming compressed data). On the other side, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to have other's opinions more than anything else.

  The checksum is now computed over the original (uncompressed) data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few blocks, switch one bit in the resulting file, and checking this is caught at read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of java CRC32. In particular, it seems they have been able to close to double the speed of the CRC32, with a solution that seems fairly simple to me. It would be ok to use java native CRC32 and leave the improvement to another ticket, but quite frankly if it is that simple and since the hadoop guys have done all the hard work for us, I say we start with the efficient version directly.

  As decided previously, this will be a matter for a separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)
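
Roughly, the flush path now has the following shape; this is a simplified sketch with stand-in types, not the actual CompressedSequentialWriter code:

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class CompressedFlushSketch
{
    interface Compressor { byte[] compress(byte[] input, int offset, int length); } // stand-in

    // On flush: checksum the *original* (uncompressed) buffer, write the compressed
    // bytes followed by the checksum, then reset the checksum for the next chunk.
    static void flushData(byte[] uncompressed, int length, Compressor compressor,
                          DataOutputStream out, CRC32 checksum) throws IOException
    {
        checksum.update(uncompressed, 0, length);

        byte[] compressed = compressor.compress(uncompressed, 0, length);
        out.write(compressed);
        out.writeLong(checksum.getValue()); // v2 writes the 8-byte value; v3 narrows this to an int

        checksum.reset();
    }
}
{code}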

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080126#comment-13080126 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

By special-casing I mean the following:

bq. We would say "no mmap" with compression and no checksum without compression, but you have the "compression with no-op algorithm". This has my preference as said previously, but we must be aware that people will ask why we have a non-mmap non compressed mode and a compressed no-op mode (we cannot really get rid of the first one because otherwise we say "you shall use mmap forever now").

bq. But I'm pretty sure that after 1.0 we'll make compression the default. So I don't want to add a lot of complexity for uncompressed sstables.

In that case, can we consider making this a compression-only feature instead of banging our heads against the wall trying to come up with a solution for non-compressed data?

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081072#comment-13081072 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

Comments:
* CSW.flushData() forgot to reset the checksum (this is caught by the unit tests btw).
* We should convert the CRC32 value to an int (and only write that), as it is an int internally (getValue() returns a long only because CRC32 implements the Checksum interface, which requires it).
* Here we checksum the compressed data. The other approach would be to checksum the uncompressed data. The advantage of checksumming compressed data is speed (less data to checksum), but checksumming the uncompressed data would be a little bit safer. In particular, it would prevent us from messing up in the decompression (and we don't have to trust the compression algorithm; not that I don't trust Snappy, but...). This is clearly a trade-off that we have to make, but I admit that my personal preference leans towards safety (in particular, I know that checksumming the uncompressed data gives a bit more safety; I just don't know quantitatively how much we give up by checksumming the compressed data). On the other hand, checksumming the uncompressed data would likely mean that a good part of the bitrot would result in a decompression error rather than a checksum error, which is maybe less convenient from the implementation point of view. So I don't know, I guess I'm thinking aloud to get others' opinions more than anything else.
* Let's add some unit tests. At the least, it's relatively easy to write a few blocks, flip one bit in the resulting file, and check that this is caught at read time (or better, do that multiple times, changing a different bit each time); a rough sketch of such a test is included below.
* As Todd noted, HADOOP-6148 contains a bunch of discussion on the efficiency of Java's CRC32. In particular, it seems they have been able to nearly double the speed of CRC32, with a solution that seems fairly simple to me. It would be OK to use the native Java CRC32 and leave the improvement to another ticket, but quite frankly, if it is that simple, and since the Hadoop guys have done all the hard work for us, I say we start with the efficient version directly.
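
The kind of test being suggested, in a self-contained form (this is only a sketch of the idea, not the test that was actually added to CompressedRandomAccessReaderTest):

{code:java}
import java.io.*;
import java.util.Random;
import java.util.zip.CRC32;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class BitFlipDetectionSketch
{
    private static int crc(byte[] data)
    {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return (int) c.getValue();
    }

    @Test
    public void flippedBitIsCaught() throws IOException
    {
        File file = File.createTempFile("checksum", ".db");
        file.deleteOnExit();

        // write one "block": a payload followed by its CRC32
        byte[] payload = new byte[1024];
        new Random(42).nextBytes(payload);
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file)))
        {
            out.write(payload);
            out.writeInt(crc(payload));
        }

        // flip a single bit somewhere in the payload
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw"))
        {
            raf.seek(100);
            int b = raf.read();
            raf.seek(100);
            raf.write(b ^ 0x01);
        }

        // read back: the stored and recomputed checksums must now differ
        try (DataInputStream in = new DataInputStream(new FileInputStream(file)))
        {
            byte[] read = new byte[1024];
            in.readFully(read);
            int stored = in.readInt();
            assertTrue("corruption should be detected", stored != crc(read));
        }
    }
}
{code}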


> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Assigned] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-1717:
-----------------------------------------

    Assignee: Pavel Yaskevich

Should probably do this (either column-level or block-level) at the same time as CASSANDRA-47.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081685#comment-13081685 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

As previously said, I disagree both with using 8 bytes when we need 4 and with the idea that switching to 4 is a matter for another ticket, but since this is probably me being too anal as usual, +1 on the rest of the patch, modulo a small optional nitpick: the toLong() function is a bit hard to read imho. It's hard to see where the parentheses are and whether it does the right thing. It seems OK though; I just think a simple for loop over the bytes would be more readable. We also historically keep ByteBufferUtil for ByteBuffer manipulations and use FBUtilities for byte[] manipulation.
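
For illustration, the kind of loop-based helper being suggested could look like the following (hypothetical code, not necessarily what was committed as FBU.byteArrayToInt):

{code:java}
// Assemble a big-endian int from 4 bytes without nested, hard-to-read shifts and parentheses.
public static int byteArrayToInt(byte[] bytes, int offset)
{
    int value = 0;
    for (int i = 0; i < 4; i++)
        value = (value << 8) | (bytes[offset + i] & 0xFF);
    return value;
}
{code}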


> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081689#comment-13081689 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

Ok, I will move toLong(byte[] bytes) to FBUtilities and commit, thanks!

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079314#comment-13079314 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. checksum at the column level only which will give us better control over individual columns and does not seem to be a big overhead

I agree that it is by far the simplest approach for non-compressed data, but I, for one, am a bit concerned by the overhead: 4 bytes per column is not negligible. On some loads, that could easily mean a 10-20% data size increase. Basically, I am concerned about people upgrading to 1.0 and want to make sure that upgrading brings no surprises for them (and this even if they don't "trust" compression yet, which would be perfectly reasonable). For that to be true, I think that if we go with checksums at the column level we would need to make them optional and off by default.

bq. Checksum on the compressed block level is unnecessary because bitrot, for example, will be detected right on decompression

Not sure that's bulletproof. I don't think all compression algorithms ship with a checksum (I don't know about Snappy, typically). When they don't, it's totally possible for bitrot to corrupt compressed data without causing a problem at decompression or at deserialization, if you're unlucky (granted, it is less likely to go undetected than without compression, but that is not good enough). So either we check that Snappy uses checksumming and only add support for algorithms that do, or it is still useful.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079561#comment-13079561 ] 

Stu Hood commented on CASSANDRA-1717:
-------------------------------------

You should also consider that checksumming at the column index or column level will require separate checksums for the column index and row header. Checksumming at the block level gets you that in one go.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Benjamin Coverston (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001649#comment-13001649 ] 

Benjamin Coverston commented on CASSANDRA-1717:
-----------------------------------------------

Would we really save that much by waiting until #16 is done? Perhaps we should take a shot at this in 0.7. Right now it's possible to have data corrupted and then replicated, leading to loss of data.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 0.8
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable, so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column data readable we do not detect it, and if it corrupts to a higher timestamp value can even resist being overwritten by newer values.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080139#comment-13080139 ] 

Pavel Yaskevich commented on CASSANDRA-1717:
--------------------------------------------

Let me prepare a patch for that and meanwhile wait to see if someone disagrees :)


[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079960#comment-13079960 ] 

T Jake Luciani commented on CASSANDRA-1717:
-------------------------------------------

Block level and column index level are actually the same size, right? 64KB

The reason block level isn't ideal to me is that it makes it much harder to recover or to support partial reads, since the block has no context in the file format.  Though if there is corruption with block-level compression, then it's inherently a block-level problem :)

So what kind of recovery can we support? Can we ever recover from bad blocks, or do we just throw an error like "bad blocks found, manual repair required"?
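
To make that concrete, here is a minimal sketch of per-block verification on the read path (the class and method names are invented for illustration and are not taken from the patch): each block is checksummed as a whole, so on a mismatch all we can do is refuse the block and report it, rather than narrow the damage down to a particular column.

{code}
import java.io.IOException;
import java.util.zip.CRC32;

public class BlockChecksumReader
{
    /**
     * Verify a data block (64KB, or shorter for the last block of a file)
     * against the CRC32 value stored alongside it.  On mismatch we fail the
     * read so the caller can fall back to read repair / manual repair instead
     * of serving silently corrupted data.
     */
    public static void verifyBlock(byte[] block, int length, long expectedCrc) throws IOException
    {
        CRC32 crc = new CRC32();
        crc.update(block, 0, length);
        if (crc.getValue() != expectedCrc)
            throw new IOException("bad block found, manual repair required");
    }
}
{code}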


[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079541#comment-13079541 ] 

Jonathan Ellis commented on CASSANDRA-1717:
-------------------------------------------

I'd rather not add more configuration complexity for this.

What is the downside to doing it at the column index level?  It feels like a good compromise between overhead and granularity to me.
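
For what it's worth, a rough sketch of what checksumming at that granularity could look like on the write path (the names and framing below are invented for illustration, not taken from the patch): accumulate a CRC32 over the serialized columns and append the checksum every time a column-index-sized chunk has been written.

{code}
import java.io.DataOutput;
import java.io.IOException;
import java.util.zip.CRC32;

public class IndexChunkChecksummer
{
    // hypothetical: reuse the column index interval (64KB) as the checksum granularity
    public static final int CHUNK_SIZE = 64 * 1024;

    private final CRC32 crc = new CRC32();
    private int bytesInChunk = 0;

    /** Write serialized column bytes and feed them through the running checksum. */
    public void append(byte[] serializedColumn, DataOutput out) throws IOException
    {
        out.write(serializedColumn);
        crc.update(serializedColumn);
        bytesInChunk += serializedColumn.length;
        if (bytesInChunk >= CHUNK_SIZE)
            flushChecksum(out);
    }

    /** Append the checksum of the current chunk and reset for the next one. */
    public void flushChecksum(DataOutput out) throws IOException
    {
        out.writeLong(crc.getValue());
        crc.reset();
        bytesInChunk = 0;
    }
}
{code}

The read path would then recompute the CRC over each chunk as it is deserialized and compare against the stored value, invalidating only that chunk's columns on mismatch.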


[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-1717:
---------------------------------------

    Reviewer: slebresne  (was: jbellis)


[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052765#comment-13052765 ] 

Ryan King commented on CASSANDRA-1717:
--------------------------------------

I know I'm starting to sound like a broken record, but CASSANDRA-674 is going to include checksums. And it's almost ready for review.
