Posted to commits@cassandra.apache.org by "T Jake Luciani (Created) (JIRA)" <ji...@apache.org> on 2011/10/31 15:43:32 UTC

[jira] [Created] (CASSANDRA-3428) add constituent tracking to sstables

add constituent tracking to sstables
------------------------------------

                 Key: CASSANDRA-3428
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3428
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: T Jake Luciani
             Fix For: 1.1


Compaction merges older sstables into newer versions of the data.

When snapshotting sstables (especially incrementally) it would be very useful to know what older sstables are no longer needed because they are now represented in a newer version.

This patch should add the list of sstables that made up each new sstable and store this info in the -Statistics file.
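A minimal sketch of what the proposed constituent tracking could look like, written in Python rather than Cassandra's Java. `StatsMetadata`, `flush` and `compact` are hypothetical names, not Cassandra APIs. The sketch also assumes the output records the *transitive* set of ancestors, so that a later tool does not need the metadata of intermediate sstables that may already have been deleted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class StatsMetadata:
    """Hypothetical stand-in for the constituent info kept in -Statistics."""
    generation: int
    # Generations of every sstable whose data was merged into this one
    # (empty for an sstable written directly by a memtable flush).
    ancestors: FrozenSet[int] = frozenset()


def flush(generation: int) -> StatsMetadata:
    """A freshly flushed sstable has no constituents."""
    return StatsMetadata(generation)


def compact(inputs: Iterable[StatsMetadata], new_generation: int) -> StatsMetadata:
    """Record the compaction inputs, plus their own ancestors, on the output."""
    ancestors = set()
    for sstable in inputs:
        ancestors.add(sstable.generation)
        ancestors |= sstable.ancestors
    return StatsMetadata(new_generation, frozenset(ancestors))


# Two flushes are compacted into generation 3, which is later compacted
# together with another flush into generation 5.
s3 = compact([flush(1), flush(2)], 3)
s5 = compact([s3, flush(4)], 5)
print(sorted(s5.ancestors))  # [1, 2, 3, 4]
```

Storing only the direct inputs would keep the metadata smaller, but then resolving generation 5 back to flushes 1 and 2 would require generation 3's metadata, which is gone once 3 is itself compacted away.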

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3428) add constituent tracking to sstables

Posted by "T Jake Luciani (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140216#comment-13140216 ] 

T Jake Luciani commented on CASSANDRA-3428:
-------------------------------------------

But you will for incremental snapshots.  How do you know which versions of the sstables to load?  Right now you must load all previous versions.
                
> add constituent tracking to sstables
> ------------------------------------
>
>                 Key: CASSANDRA-3428
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3428
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: T Jake Luciani
>              Labels: compaction
>             Fix For: 1.1
>
>
> Compaction merges older sstables into newer versions of the data.
> When snapshotting sstables (esp incrementally) it would be very useful to know what older sstables are no longer needed because they are now represented in a newer version.
> This patch should add the list of sstables that made up each new sstable and store this info in the -Statistics file.


[jira] [Commented] (CASSANDRA-3428) add constituent tracking to sstables

Posted by "T Jake Luciani (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140247#comment-13140247 ] 

T Jake Luciani commented on CASSANDRA-3428:
-------------------------------------------

bq. the snapshot files plus incrementals from after the last full snapshot (up to point-in-time, if desired) give you exactly what you want, no more, no less.


Maybe I'm thinking about this wrong, but if I were going to back up data in Cassandra I would never run nodetool snapshot.  I would only enable incremental backups, back up the sstables remotely, and remove what's been backed up.
I could then get to any point in time.

You are saying I should cron snapshot the cluster, then keep the incrementals in between.  I think with the feature I'm suggesting this wouldn't be necessary, and IMO there would be less data to back up in the end.
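The incremental-only workflow described above (ship each hard-linked sstable off the node, then delete the local link) could be scripted roughly like this. This is a hedged sketch: `ship_incrementals` is a hypothetical helper, a local directory stands in for remote storage, and the file names are made up. A production job would upload to real storage and verify checksums before deleting anything.

```python
import os
import shutil
import tempfile
from typing import List


def ship_incrementals(backups_dir: str, remote_dir: str) -> List[str]:
    """Copy each file in backups_dir to remote_dir, then delete the local
    hard link so the same file is not shipped twice."""
    os.makedirs(remote_dir, exist_ok=True)
    shipped = []
    for name in sorted(os.listdir(backups_dir)):
        src = os.path.join(backups_dir, name)
        if not os.path.isfile(src):
            continue
        dst = os.path.join(remote_dir, name)
        shutil.copy2(src, dst)
        # Only drop the local link once the copy has the expected size.
        # Removing a hard link leaves the live sstable's other links intact.
        if os.path.getsize(dst) == os.path.getsize(src):
            os.remove(src)
            shipped.append(name)
    return shipped


# Demo against temporary directories standing in for backups/ and remote storage.
with tempfile.TemporaryDirectory() as tmp:
    backups = os.path.join(tmp, "backups")
    os.makedirs(backups)
    for name in ("example-1-Data.db", "example-1-Index.db"):
        with open(os.path.join(backups, name), "w") as f:
            f.write("sstable bytes")
    shipped = ship_incrementals(backups, os.path.join(tmp, "remote"))
    left_behind = os.listdir(backups)
print(shipped, left_behind)  # ['example-1-Data.db', 'example-1-Index.db'] []
```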


                

[jira] [Commented] (CASSANDRA-3428) add constituent tracking to sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140219#comment-13140219 ] 

Jonathan Ellis commented on CASSANDRA-3428:
-------------------------------------------

What is an "incremental snapshot?"  If you're trying to make things more complicated by not linking already-linked sstables in new snapshots, don't.  Hard links are close enough to free as not to matter.
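To illustrate why hard links are close to free: creating one only adds a directory entry pointing at the same inode, so no data is copied regardless of file size. A minimal Python sketch; the `snapshot` helper and file names are hypothetical, and `os.link` requires source and destination to be on the same filesystem.

```python
import os
import tempfile


def snapshot(data_dir: str, snapshot_dir: str) -> list:
    """Hard-link every file in data_dir into snapshot_dir (no bytes are copied)."""
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data")
    os.makedirs(data)
    path = os.path.join(data, "example-1-Data.db")
    with open(path, "wb") as f:
        f.write(b"\0" * 1_000_000)  # 1 MB of data that will NOT be duplicated
    linked = snapshot(data, os.path.join(tmp, "snap"))
    original = os.stat(path)
    linked_copy = os.stat(os.path.join(tmp, "snap", "example-1-Data.db"))
    same_inode = original.st_ino == linked_copy.st_ino
    link_count = original.st_nlink
print(linked, same_inode, link_count)  # ['example-1-Data.db'] True 2
```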
                

[jira] [Resolved] (CASSANDRA-3428) add constituent tracking to sstables

Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3428.
---------------------------------------

       Resolution: Not A Problem
    Fix Version/s:     (was: 1.1)

bq. You are saying I should cron snapshot the cluster then keep the incremental between

Yes.  This is standard backup procedure.  Doing periodic full snapshots both gives you an upper bound on how many incrementals you have to apply (which, as you point out, can certainly contain information that is obsoleted later) and gives you extra redundancy in case of corruption of one of the incrementals (which otherwise becomes increasingly likely as time goes by).

bq. I think with the feature I'm suggesting this wouldn't be necessary and IMO be less data to backup in the end

I don't think it's worth it.  It would only be useful for the "restore to most recent possible time" and nothing earlier, because otherwise you have the "data mixed in from newer sstables in the compacted version" problem.

Additionally, at least one person has implemented map/reduce against snapshots, which is another point in favor of a "periodic full + incrementals" approach.  (I'll go bug him again about contributing a patch, now that I remember it...)
                

[jira] [Commented] (CASSANDRA-3428) add constituent tracking to sstables

Posted by "T Jake Luciani (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140224#comment-13140224 ] 

T Jake Luciani commented on CASSANDRA-3428:
-------------------------------------------

That's not what I'm saying.

When "incremental_backups: true" is set, sstables are hard linked into a backups directory, so you end up with a directory full of sstables, including ones that have since been compacted into newer versions of the data.

If you want to restore from a backup in this scenario, you need to load all the sstables and then compact.
If each sstable stored the list of sstables used to create it, you could programmatically figure out which sstables you need for a complete, optimal snapshot.
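A hedged sketch of that "programmatically figure out" step. It assumes each sstable's metadata lists the *transitive* set of generations merged into it (an assumption; the proposal only says "the list of sstables that made up each new sstable"). Any available sstable that appears among another available sstable's ancestors is already represented there and can be skipped. `minimal_restore_set` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Set


def minimal_restore_set(available: Dict[int, FrozenSet[int]]) -> Set[int]:
    """Given {generation: ancestor generations} for every sstable found in the
    snapshot and backup directories, keep only those whose data has not
    already been merged into another available sstable."""
    superseded = set()
    for ancestors in available.values():
        superseded |= ancestors
    return {gen for gen in available if gen not in superseded}


# Backups hold flushes 1, 2, 4 and 6; generation 5 (compacted from 1-4) is
# also available. Only 5 and 6 need to be loaded.
available = {
    1: frozenset(),
    2: frozenset(),
    4: frozenset(),
    6: frozenset(),
    5: frozenset({1, 2, 3, 4}),
}
print(sorted(minimal_restore_set(available)))  # [5, 6]
```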

It would also be handy to track this information for corruption: if an sstable is corrupted, you could inspect its metadata and get the list of sstables to retrieve from backup to fix *just* the corrupt file.
                

[jira] [Commented] (CASSANDRA-3428) add constituent tracking to sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140209#comment-13140209 ] 

Jonathan Ellis commented on CASSANDRA-3428:
-------------------------------------------

I'm not sure where you're going with this.  The old -> new replace in DataTracker is done atomically; you will never have both old and new sstables present in the same View.
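A Python analogue of an atomic old -> new replace, for illustration only. It assumes an immutable, copy-on-write view that is published as a single reference via compare-and-set (the pattern a Java `AtomicReference` retry loop provides; here a lock emulates the CAS). `View` and `Tracker` are hypothetical classes, and whether DataTracker matches this in every detail is an assumption. A reader that grabbed the view before the swap sees only the old sstables; a reader after it sees only the new ones.

```python
import threading
from typing import FrozenSet, Iterable


class View:
    """Immutable set of live sstables; never modified after construction."""

    def __init__(self, sstables: FrozenSet[str]):
        self.sstables = sstables

    def replace(self, old: Iterable[str], new: Iterable[str]) -> "View":
        return View((self.sstables - frozenset(old)) | frozenset(new))


class Tracker:
    """Hypothetical tracker that publishes each new View in one step."""

    def __init__(self, sstables: Iterable[str]):
        self._view = View(frozenset(sstables))
        self._lock = threading.Lock()  # emulates an atomic reference's CAS

    def view(self) -> View:
        return self._view  # readers get one consistent snapshot

    def _compare_and_set(self, expected: View, updated: View) -> bool:
        with self._lock:
            if self._view is not expected:
                return False
            self._view = updated
            return True

    def replace_compacted(self, old: Iterable[str], new: Iterable[str]) -> None:
        old_list, new_list = list(old), list(new)  # may be retried, so materialize
        while True:
            current = self._view
            if self._compare_and_set(current, current.replace(old_list, new_list)):
                return


tracker = Tracker(["a", "b", "c"])
before = tracker.view()
tracker.replace_compacted(["a", "b"], ["ab"])
print(sorted(before.sstables), sorted(tracker.view().sstables))  # ['a', 'b', 'c'] ['ab', 'c']
```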
                

[jira] [Commented] (CASSANDRA-3428) add constituent tracking to sstables

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140236#comment-13140236 ] 

Jonathan Ellis commented on CASSANDRA-3428:
-------------------------------------------

bq. If you want to restore from a backup in this scenario you need to load all the sstables then compact

I'm still confused: the snapshot files plus incrementals from after the last full snapshot (up to point-in-time, if desired) give you exactly what you want, no more, no less.  None of the incrementals can be compacted into sstables in the snapshot because by construction we've said the snapshot is older.  (And if we have a newer snapshot... use that one instead.)

If you're trying to do a "partial" snapshot restore (i.e. not removing all the existing sstable files first) that won't work in the general case because you're unlikely to end up with sstables containing exactly the set of incremental sstables you want with no other data mixed in.
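The "last full snapshot plus later incrementals, up to point-in-time" restore described above reduces to a simple selection rule. A hedged sketch; the `Backup` record, its integer timestamps, and `restore_plan` are hypothetical, not a Cassandra format or tool. It also treats an incremental taken at exactly the snapshot time as already covered by the snapshot (an assumption).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Backup:
    name: str
    taken_at: int  # e.g. seconds since epoch
    full: bool     # True for a full snapshot, False for an incremental


def restore_plan(backups: List[Backup], target: int) -> Tuple[Optional[Backup], List[Backup]]:
    """Pick the newest full snapshot at or before `target`, plus every
    incremental taken after that snapshot and at or before `target`."""
    fulls = [b for b in backups if b.full and b.taken_at <= target]
    base = max(fulls, key=lambda b: b.taken_at, default=None)
    since = base.taken_at if base is not None else float("-inf")
    incrementals = sorted(
        (b for b in backups if not b.full and since < b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    return base, incrementals


history = [
    Backup("snap-mon", 100, True),
    Backup("inc-1", 150, False),
    Backup("snap-wed", 300, True),
    Backup("inc-2", 320, False),
    Backup("inc-3", 360, False),
    Backup("inc-4", 420, False),
]
base, incs = restore_plan(history, 400)
print(base.name, [b.name for b in incs])  # snap-wed ['inc-2', 'inc-3']
```

The newest snapshot bounds how many incrementals must be applied: `inc-1` is never needed once `snap-wed` exists, which is the "upper bound" benefit of periodic full snapshots.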
                