Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Created) (JIRA)" <ji...@apache.org> on 2011/10/26 18:39:32 UTC

[jira] [Created] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.
-----------------------------------------------------------------------------------------

                 Key: CASSANDRA-3406
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3406
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
    Affects Versions: 1.0.0
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Trivial
             Fix For: 1.0.2


Scrub was intended to check that a data file is not corrupted, to try to correct some forms of corruption, and to discard the data it cannot repair. But we are now also using it for:
* major upgrades, to get sstables into the new data format for the sake of streaming (this one could be "fixed" independently by supporting the old format during streaming)
* forcing the compaction of existing sstables after changing the compression algorithm

We should probably provide a separate tool/command for those last two tasks since:
* we could have a better name, like upgrade_sstables or rewrite_sstables, for that operation
* we could avoid the automatic snapshot that scrub takes (which users do not expect for those operations)
* we could make it slightly quicker/simpler by skipping the corruption detection code

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151317#comment-13151317 ] 

Jonathan Ellis commented on CASSANDRA-3406:
-------------------------------------------

Honestly I'm not finding the benefits here very compelling in exchange for the additional complexity.  We're still doing the same amount of i/o, and 99% as much CPU (most of the scrub "corruption detection" only kicks in if there's an exception trying to rewrite).
                
> Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3406
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3406
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Trivial
>             Fix For: 1.0.4
>
>         Attachments: 3406.patch
>
>
> Scrub was intended to check a data file is not corrupted and to try to correct some form of corruption and discard the data when it can't repair. But we are now using it also for:
> * major upgrade, to have sstable in the new data format for streaming sake (that one could be "fixed" independently by supporting old format during streaming)
> * to force the compaction of existing sstables after changing the compression algorithm
> We should probably provide a separate tool/command for those two last tasks since:
> * we could have a better name, like upgrade_sstables or rewrite_sstables for that operation
> * we could avoid the automatic snapshot that scrub does (and is not expected by users for those operations)
> * make it slightly quicker/simpler by avoiding the corruption detection code


[jira] [Updated] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3406:
----------------------------------------

    Attachment: 3406.patch
    

[jira] [Commented] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151352#comment-13151352 ] 

Jonathan Ellis commented on CASSANDRA-3406:
-------------------------------------------

Can you split it into refactor + newscrub patches?

(Incidentally, I think having a snapshot on the upgrade path is a Very Good Thing Indeed, although scrub isn't quite the best way to do that.)
                

[jira] [Commented] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151359#comment-13151359 ] 

Sylvain Lebresne commented on CASSANDRA-3406:
---------------------------------------------

bq. Can you split it into refactor + newscrub patches?

Will do.

bq. (Incidentally, I think having a snapshot on the upgrade path is a Very Good Thing Indeed, although scrub isn't quite the best way to do that.)

I couldn't agree more, but the snapshot must be taken before the upgrade, whereas scrub is run after it, so scrub is not the right place at all.
                

[jira] [Commented] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151506#comment-13151506 ] 

Jonathan Ellis commented on CASSANDRA-3406:
-------------------------------------------

+1
                

[jira] [Commented] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151348#comment-13151348 ] 

Sylvain Lebresne commented on CASSANDRA-3406:
---------------------------------------------

Part of the patch just slightly refactors CompactionManager to avoid some existing code duplication between performCleanup and performScrub. I don't claim such refactoring is a priority or anything, but I would venture that it is a good thing in itself. Once that refactoring is done, the new operation is literally 4 lines. Then there is the cruft to make it callable from nodetool, but overall it doesn't sound like much complexity to me.

As for the benefits, the point is clearly *not* to save i/o or CPU. The goals are:
* to avoid having an operation called 'scrub' as part of the normal upgrade path, because it's a scary name. Yes, it's just a naming thing (but names are important), and yes, nobody has come complaining about that name, but let's be honest: scrub was not created for rewriting sstables post-upgrade, and the name does not fit that use.
* scrub takes an automatic snapshot. That is totally reasonable for scrub's initial purpose, given that it can discard data (albeit corrupted data), but it's just annoying when you've already snapshotted everything (and maybe moved the snapshot to some safe place) just before your upgrade because you're a good guy.
* scrub can discard data. I think this is something that should never go unnoticed. Pushing the use of scrub in cases where there is absolutely no reason to suspect corruption makes it more likely that discarded data goes unnoticed, at least at first.

So yes, all of this is mostly details, and sorry to be so verbose for such a minor issue but I happen to think that such details are important and that this ticket would be an improvement.
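
For illustration only, the shape of the refactoring described above might be sketched as follows. This is not the code from the attached patches: the class SSTableOperationsSketch, the SSTable stand-in, performOnAll, and the "corrupt" row marker are all hypothetical names chosen for this sketch. It only shows how, once the per-sstable loop is shared, an upgrade operation can be a few lines that neither snapshot nor discard rows, while scrub keeps both behaviours.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: every name here is illustrative, not taken from the patches.
public class SSTableOperationsSketch {

    /** Stand-in for an sstable: a format version plus its rows. */
    static final class SSTable {
        final String version;
        final List<String> rows;

        SSTable(String version, List<String> rows) {
            this.version = version;
            this.rows = rows;
        }
    }

    static final String CURRENT_VERSION = "new";

    /** Names of snapshots taken so far (stand-in for on-disk snapshots). */
    static final List<String> snapshots = new ArrayList<>();

    /** Shared driver: the per-sstable loop each operation would otherwise duplicate. */
    static List<SSTable> performOnAll(List<SSTable> sstables, UnaryOperator<SSTable> op) {
        List<SSTable> rewritten = new ArrayList<>();
        for (SSTable sstable : sstables) {
            rewritten.add(op.apply(sstable));
        }
        return rewritten;
    }

    /** Scrub: snapshot first, then rewrite, dropping rows that cannot be read. */
    static List<SSTable> scrub(List<SSTable> sstables) {
        snapshots.add("pre-scrub-" + snapshots.size());
        return performOnAll(sstables, sstable -> {
            List<String> kept = new ArrayList<>();
            for (String row : sstable.rows) {
                if (!row.startsWith("corrupt")) { // assumption: marks an unreadable row
                    kept.add(row);
                }
            }
            return new SSTable(CURRENT_VERSION, kept);
        });
    }

    /** Upgrade: plain rewrite in the current format; no snapshot, no row discarded. */
    static List<SSTable> upgrade(List<SSTable> sstables) {
        return performOnAll(sstables,
                sstable -> new SSTable(CURRENT_VERSION, new ArrayList<>(sstable.rows)));
    }

    public static void main(String[] args) {
        List<SSTable> old = Arrays.asList(
                new SSTable("old", Arrays.asList("a", "corrupt-b", "c")));
        System.out.println("rows after upgrade: " + upgrade(old).get(0).rows.size());
        System.out.println("rows after scrub:   " + scrub(old).get(0).rows.size());
        System.out.println("snapshots taken:    " + snapshots.size());
    }
}
```

In this toy model the upgrade path leaves the row count and the snapshot list untouched, which is the behavioural difference the three points above argue for.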
                

[jira] [Updated] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3406:
----------------------------------------

    Attachment:     (was: 3406.patch)
    

[jira] [Updated] (CASSANDRA-3406) Create a nodetool upgrade_sstables to avoid using scrubs for tasks it wasn't intended to.

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3406:
----------------------------------------

    Attachment: 0002-Add-upgradesstables-command.patch
                0001-Refactor-to-avoid-code-duplication.patch

Patches attached with the refactor in its own patch.
                