Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Created) (JIRA)" <ji...@apache.org> on 2011/11/04 19:53:51 UTC

[jira] [Created] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Automatically create SHA1 of new sstables
-----------------------------------------

                 Key: CASSANDRA-3456
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3456
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Minor


Compressed sstables have block checksums, which is great, but non-compressed sstables don't, for technical/compatibility reasons that I'm not criticizing. It's a bit annoying because when someone comes up with a corrupted file, we really have nothing to help decide whether it is bitrot or not. However, it would be fairly trivial and cheap to compute the SHA1 (or other hash) of whole sstables when creating them. And if it's a new, separate sstable component, we don't even have to implement anything to check the hash. It would only be there to (manually) check for bitrot when the user suspects corruption, or, say, to check the integrity of backups.

I'm absolutely not pretending that it's a perfect solution, and for compressed sstables the block checksums are clearly finer-grained, but it's easy to add and could prove useful for non-compressed files.
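
For backup tooling that cannot shell out to sha1sum, the manual check described above could look roughly like the following. This is only a sketch: the class and method names are hypothetical, and it assumes the digest component stores the hex digest as its first whitespace-separated token (the layout sha1sum reads).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Cassandra.
final class DigestCheck {
    // Stream the file through SHA-1 and return the lowercase hex digest.
    static String sha1Hex(Path file) throws IOException {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new AssertionError("SHA-1 unavailable", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha1.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : sha1.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // True if the digest recorded in digestFile (assumed to be its first token)
    // matches the current contents of dataFile.
    static boolean matches(Path dataFile, Path digestFile) throws IOException {
        String content = new String(Files.readAllBytes(digestFile), StandardCharsets.UTF_8);
        String recorded = content.trim().split("\\s+")[0];
        return recorded.equalsIgnoreCase(sha1Hex(dataFile));
    }
}
```

A mismatch only says the data file no longer has the bytes it was written with; it does not say where the damage is, which is the whole-file limitation acknowledged above.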

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3456:
--------------------------------------

    Attachment: system.log

Got the attached errors when starting my test node (with pre-existing, non-sha'd data).  It looks like there may be problems creating new files as well as with old ones.
                


        

[jira] [Resolved] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-3456.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.3
         Reviewer: jbellis

Committed
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146534#comment-13146534 ] 

Sylvain Lebresne commented on CASSANDRA-3456:
---------------------------------------------

Patch attached. It creates a new component, say Standard1-h-16-Digest.sha1, that can be checked directly by sha1sum ({{ sha1sum -c Standard1-h-16-Digest.sha1 }}).

Note that the sha1 is only created for non-compressed files, since compressed files have finer-grained checksums, but we could change that easily. I also didn't add anything in Cassandra itself that checks the sha1 sum. We could add it to, say, scrub, but we would have to read the entire file before the scrub to compute the sha1, and I don't see the point of adding that time to scrub (at least until CASSANDRA-3406, while scrub is used for much more than corruption detection/correction). Besides, sha1sum does that well already.
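
To illustrate the idea (this is not the attached patch), here is a minimal sketch of hashing data as it is written and then emitting a digest file in the "<hex>  <filename>" layout that sha1sum -c accepts. The class name, method names and the use of DigestOutputStream are assumptions made for the example; the real change hooks into Cassandra's own writer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration, not Cassandra code.
final class DigestSketch {
    // Hex-encode a digest the way sha1sum prints it (lowercase, two chars per byte).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Write data to dataFile while hashing it, then record "<hex>  <name>" in digestFile.
    static String writeWithDigest(Path dataFile, Path digestFile, byte[] data) throws IOException {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new AssertionError("SHA-1 unavailable", e);
        }
        // Every byte that reaches the file also updates the digest, so no second read is needed.
        try (OutputStream out = new DigestOutputStream(Files.newOutputStream(dataFile), sha1)) {
            out.write(data);
        }
        String hex = toHex(sha1.digest());
        // Two spaces between hash and name is sha1sum's text-mode line format.
        String line = hex + "  " + dataFile.getFileName() + "\n";
        Files.write(digestFile, line.getBytes(StandardCharsets.UTF_8));
        return hex;
    }
}
```

Because the digest line names the data file relative to the digest's own directory, sha1sum -c has to be run from the directory that holds both files.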
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147999#comment-13147999 ] 

Jonathan Ellis commented on CASSANDRA-3456:
-------------------------------------------

+1
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146547#comment-13146547 ] 

Jonathan Ellis commented on CASSANDRA-3456:
-------------------------------------------

Doh.  +1, then.
                


        

[jira] [Updated] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3456:
----------------------------------------

    Attachment: 0001-Fix-pattern-under-window.patch

Not sure about the deletion problem, but it's likely related to the other exception. I was using a split(File.separator), which won't work well on Windows (split takes a regex, and the separator there is a backslash). Attaching a patch to use Pattern.quote.
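
A small sketch of the problem and the fix, for reference. String.split interprets its argument as a regular expression, so a lone backslash (File.separator on Windows) is an invalid pattern and throws PatternSyntaxException; Pattern.quote makes it a literal. The class and method names below are made up for the example, and the separator is passed in explicitly so the behaviour can be shown on any platform.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the attached patch.
final class SeparatorSplit {
    // Split a path on a literal separator. Without Pattern.quote, a "\" separator
    // would be parsed as an incomplete regex escape.
    static String[] splitPath(String path, String separator) {
        return path.split(Pattern.quote(separator));
    }
}
```

In real code the second argument would be File.separator; "/" happens to be a harmless regex, which is why the unquoted version only fails on Windows.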
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146546#comment-13146546 ] 

Sylvain Lebresne commented on CASSANDRA-3456:
---------------------------------------------

Hum, apparently that Guava thing is not there yet (and it doesn't seem like it will be added): http://code.google.com/p/guava-libraries/issues/detail?id=758

So if you strongly feel the digest should be created in the constructor, I'll update but if there is nothing else, I'll just update the duplicated comment line while committing.
                


        

[jira] [Reopened] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Jonathan Ellis (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-3456:
---------------------------------------

    


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146530#comment-13146530 ] 

Jonathan Ellis commented on CASSANDRA-3456:
-------------------------------------------

Some nits:

- duplicated comment line:
{{ +        // a bitmap secondary index: many of these may exist per sstable}}
- Guava provides MessageDigestAlgorithm so you don't have to do the try/catch business for built-ins
- "This can only be called before any data is written to this write" sounds like "this should be a constructor parameter" to me
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146346#comment-13146346 ] 

Jonathan Ellis commented on CASSANDRA-3456:
-------------------------------------------

Okay, I'm in for a separate component.
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144295#comment-13144295 ] 

Jonathan Ellis commented on CASSANDRA-3456:
-------------------------------------------

I'm a little torn about adding another component -- leveled compaction will eat up fds really quickly and adding more components will make that worse.  Putting it in the metadata/statistics component is *almost* as user friendly (scrub can check it, or we can provide a standalone sstablesha tool to extract it).  What do you think?
                


        

[jira] [Resolved] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-3456.
-----------------------------------------

    Resolution: Fixed

Fix committed
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146536#comment-13146536 ] 

Sylvain Lebresne commented on CASSANDRA-3456:
---------------------------------------------

That was fast :)

bq. Guava provides MessageDigestAlgorithm so you don't have to do the try/catch business for built-ins

Nice, I'll update

bq. "This can only be called before any data is written to this write" sounds like "this should be a constructor parameter" to me

It does, but given that we don't want to compute a digest for, say, the commit log or other uses of SequentialWriter, pushing this into the constructor required creating a bunch of new 'constructors' and/or having the SequentialWriter creation take one more boolean flag. I started with that, but it didn't feel so beautiful.
                


        

[jira] [Commented] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146317#comment-13146317 ] 

Sylvain Lebresne commented on CASSANDRA-3456:
---------------------------------------------

I don't think the fd thing is a real problem, because we won't need to open this component very often, and even if we do it for scrub, we will open-read-close very quickly. I would agree, though, that having too many components can get annoying. That being said, I kind of think this is one where having it separate does make sense for the purpose of external checking. It is true we can have a simple external tool, but that adds something to maintain that we could avoid. So I guess I'm also a little torn, though I'm leaning toward 'having a separate component is the better choice' (but putting it in -Statistics is not much more work, so if there is a preference for that, so be it).
                


        

[jira] [Updated] (CASSANDRA-3456) Automatically create SHA1 of new sstables

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3456:
----------------------------------------

    Attachment: 3456.patch
    
