Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/05/07 17:21:18 UTC

[jira] [Updated] (CASSANDRA-5546) Gc_grace should start at the creation of the column, not when it expires

     [ https://issues.apache.org/jira/browse/CASSANDRA-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-5546:
----------------------------------------

    Description: 
Currently, gc_grace determines "the minimum time we keep a column that has been marked for deletion", where "marked for deletion" is creation time for a DeletedColumn or the expiration time for an ExpiringColumn.

However, in the case of expiring columns, if you want to optimize deletions while making sure you don't resurrect overwritten data, you only care about keeping expired columns gc_grace seconds *since their creation time*, not *since their expiration time*. It would thus be better to have gc_grace be "the minimum time we keep a column since its creation" (which would change nothing for tombstones, but for TTL columns would effectively subtract the TTL from the time we keep the column once it has expired).
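
To make the difference concrete, here is a minimal Python sketch of the two rules. It is an illustration only, not Cassandra's actual code: the Column class and both function names are hypothetical, times are plain seconds, and the max() in the proposed rule encodes the assumption that an expiring column is never dropped before it has expired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column; all times are in seconds."""
    created_at: int
    ttl: Optional[int] = None  # set for an expiring column, None for a tombstone


def purgeable_at_current(col: Column, gc_grace: int) -> int:
    """Current rule: gc_grace counts from the 'marked for deletion' time."""
    if col.ttl is not None:
        marked = col.created_at + col.ttl  # expiration time of an expiring column
    else:
        marked = col.created_at            # creation time of a tombstone
    return marked + gc_grace


def purgeable_at_proposed(col: Column, gc_grace: int) -> int:
    """Proposed rule: gc_grace counts from the column's creation time."""
    earliest = col.created_at + gc_grace
    if col.ttl is not None:
        # Assumption: a column must still have expired before it can be dropped.
        return max(earliest, col.created_at + col.ttl)
    return earliest


# One-hour TTL with a gc_grace of 864000 seconds (10 days).
expiring = Column(created_at=0, ttl=3600)
print(purgeable_at_current(expiring, 864000))   # 867600: expiration + gc_grace
print(purgeable_at_proposed(expiring, 864000))  # 864000: creation + gc_grace
```

For a tombstone (ttl=None) both functions return the same value, which matches the claim that the change affects only expiring columns.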

To sum it up, this would have the following advantages:
# This will make fine tuning of gc_grace a little less of a black art.
# This will be more efficient for CFs mixing deletes and expiring columns (we'll remove tombstones for the expiring ones sooner).
# This means gc_grace will be more reliable for things like CASSANDRA-5314.

Doing this is pretty simple. The one concern is backward compatibility: people who have fine-tuned gc_grace to a very low value, because they knew it was OK given their systematic use of TTLs, might have to raise it back to a larger, more reasonable value before upgrading.


  was:
Currently, gc_grace determines "the minimum time we keep a column that has been marked for deletion", where "marked for deletion" is creation time for a DeletedColumn or the expiration time for an ExpiringColumn.

However, in the case of expiring columns, if you want to optimize deletions while making sure you don't resurrect overwritten data, you only care about keeping expired columns gc_grace seconds *since their creation time*, not *since their expiration time*. It would thus be better to have gc_grace be "the minimum time we keep a column since its creation" (which would change anything for tombstones, but for TTL would basically ensure we remove the expiration time from the time we keep the column once expired).

To sum it up, this would have the following advantages:
# This will make fine tuning of gc_grace a little less of a black art.
# This will be more efficient for CF mixing deletes and expiring columns (we'll remove tombstones for the expiring one sooner).
# This means gc_grace will be more reliable for things like CASSANDRA-5314.

Doing this is pretty simple. The one concern is backward compatibility: it means people that have fine tuned gc_grace to a very low value because they knew it was ok due to their systematic use of ttls might have to update it back to a bigger, more reasonable value before updates.


    
> Gc_grace should start at the creation of the column, not when it expires
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5546
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>             Fix For: 2.0
>
>
> Currently, gc_grace determines "the minimum time we keep a column that has been marked for deletion", where "marked for deletion" is creation time for a DeletedColumn or the expiration time for an ExpiringColumn.
> However, in the case of expiring columns, if you want to optimize deletions while making sure you don't resurrect overwritten data, you only care about keeping expired columns gc_grace seconds *since their creation time*, not *since their expiration time*. It would thus be better to have gc_grace be "the minimum time we keep a column since its creation" (which would change nothing for tombstones, but for TTL columns would effectively subtract the TTL from the time we keep the column once it has expired).
> To sum it up, this would have the following advantages:
> # This will make fine tuning of gc_grace a little less of a black art.
> # This will be more efficient for CFs mixing deletes and expiring columns (we'll remove tombstones for the expiring ones sooner).
> # This means gc_grace will be more reliable for things like CASSANDRA-5314.
> Doing this is pretty simple. The one concern is backward compatibility: people who have fine-tuned gc_grace to a very low value, because they knew it was OK given their systematic use of TTLs, might have to raise it back to a larger, more reasonable value before upgrading.
