Posted to user@cassandra.apache.org by Terry Cumaranatunge <cu...@gmail.com> on 2011/10/12 03:49:13 UTC

Using ttl to expire columns rather than using delete

Hello,

If you set a ttl and let a column expire, I've read that it eventually turns
into a tombstone and will be cleaned out by the GC. Are expirations
considered a form of delete that still requires node repair to be run within
gc_grace_seconds? The operations guide says you have to run node
repair if you have deletes, so I'm trying to find out whether we can upsert
the column with ttl=1 as a substitute for deletes. The node repair
operation is very intensive in our environment and causes
significant performance degradation on the system.

Thanks

Re: Using ttl to expire columns rather than using delete

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Wed, Oct 12, 2011 at 3:51 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Well, the reason you'd want to run repair is to get the tombstone on
> nodes that missed the insert.  And that would only be important if you
> sometimes generate inserts that would be otherwise shadowed by the
> tombstone, right?

The initial question was "can I insert with ttl=1 instead of issuing
deletes?", which is a case where you do shadow a previous version with a
very small ttl, so repair is important.

But you're right that if you only write data with expiration (no deletes)
and you
  * either never overwrite columns,
  * or are sure that whenever you do overwrite, the value being overwritten
    has a ttl less than or equal to the ttl of the value you're overwriting
    it with (plus gc_grace, to be precise),
then yes, repair is not necessary, because no shadowed value can
resurface. That's the case Eric is talking about, btw.

But again, using inserts with a tiny ttl in place of tombstones is not
one of those situations, so repair is necessary as usual.
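The overwrite condition above can be written as a small predicate. This is a sketch in Python, not Cassandra code: the function name is hypothetical, ttls are in seconds, a ttl of None stands for a column written without a ttl, and it assumes the overwrite is written after the value it replaces (so comparing ttls is a conservative stand-in for comparing expiry times).

```python
import math
from typing import Optional

GC_GRACE = 864000  # 10 days, Cassandra's usual default for gc_grace_seconds


def _ttl_seconds(ttl: Optional[int]) -> float:
    """A column written without a ttl never expires, so treat it as infinite."""
    return math.inf if ttl is None else float(ttl)


def overwrite_is_safe_without_repair(old_ttl: Optional[int],
                                     new_ttl: Optional[int],
                                     gc_grace: int) -> bool:
    """True if a replica that missed the overwrite cannot resurface the old value.

    Rule from the thread: the overwritten value's ttl must be <= the new
    value's ttl + gc_grace, so the old value has expired by the time the
    new value's expired-column tombstone may be purged.
    """
    return _ttl_seconds(old_ttl) <= _ttl_seconds(new_ttl) + gc_grace


# The original question: a ttl=1 write shadowing a value that never expires.
print(overwrite_is_safe_without_repair(None, 1, GC_GRACE))       # False
# A rolling data set where every rewrite uses the same ttl.
print(overwrite_is_safe_without_repair(86400, 86400, GC_GRACE))  # True
```

The ttl=1 case fails for any finite gc_grace, which is exactly why it behaves like a delete as far as repair is concerned.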

--
Sylvain


Re: Using ttl to expire columns rather than using delete

Posted by Jonathan Ellis <jb...@gmail.com>.
Well, the reason you'd want to run repair is to get the tombstone on
nodes that missed the insert.  And that would only be important if you
sometimes generate inserts that would be otherwise shadowed by the
tombstone, right?
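The failure mode can be played out in a toy three-replica model. This is a sketch under simplifying assumptions, not how Cassandra stores or repairs data: cells are reconciled purely by write timestamp, an expired cell acts as a tombstone from its expiry time, compaction purges any tombstone older than a deliberately tiny, hypothetical GC_GRACE, and `repair` here simply copies the newest version to every replica. Replica c misses a ttl=1 write that shadows v1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GC_GRACE = 100  # seconds; hypothetical and tiny so the timeline stays readable


@dataclass
class Cell:
    value: Optional[str]              # None marks an explicit delete (a tombstone)
    timestamp: int                    # write timestamp; the newest version wins on read
    expires_at: Optional[int] = None  # absolute expiry time of a ttl'd write

    def deletion_time(self) -> Optional[int]:
        """Time from which this cell acts as a tombstone, or None if it never does."""
        if self.value is None:
            return self.timestamp
        return self.expires_at

    def is_live(self, now: int) -> bool:
        d = self.deletion_time()
        return d is None or now < d


Replica = Dict[str, Cell]


def newest(cells: List[Cell]) -> Cell:
    return max(cells, key=lambda cell: cell.timestamp)


def write(replicas: List[Replica], key: str, cell: Cell) -> None:
    """Apply a write to the given replicas, keeping the newest version per key."""
    for r in replicas:
        if key not in r or cell.timestamp >= r[key].timestamp:
            r[key] = cell


def compact(replica: Replica, now: int) -> None:
    """Purge tombstones (explicit or expired) once GC_GRACE has elapsed."""
    for key in list(replica):
        d = replica[key].deletion_time()
        if d is not None and d + GC_GRACE <= now:
            del replica[key]


def read(replicas: List[Replica], key: str, now: int) -> Optional[str]:
    """Reconcile all replicas: the newest cell wins, and it must still be live."""
    cells = [r[key] for r in replicas if key in r]
    if not cells:
        return None
    winner = newest(cells)
    return winner.value if winner.is_live(now) else None


def repair(replicas: List[Replica], key: str) -> None:
    """Toy anti-entropy: push the newest version of `key` to every replica."""
    cells = [r[key] for r in replicas if key in r]
    if cells:
        write(replicas, key, newest(cells))


def scenario(run_repair: bool) -> Optional[str]:
    a: Replica = {}
    b: Replica = {}
    c: Replica = {}
    write([a, b, c], "col", Cell("v1", timestamp=0))              # no ttl
    write([a, b], "col", Cell("x", timestamp=10, expires_at=11))  # ttl=1; c misses it
    assert read([a, b, c], "col", now=20) is None                 # v1 is shadowed
    if run_repair:
        repair([a, b, c], "col")                                   # within gc_grace
    later = 11 + GC_GRACE
    for r in (a, b, c):
        compact(r, now=later)
    return read([a, b, c], "col", now=later)


print(scenario(run_repair=False))  # v1   (the shadowed value resurfaces)
print(scenario(run_repair=True))   # None
```

Without repair, the expired cell is purged on the replicas that had it and the stale value on the lagging replica wins the read again. If nothing had been shadowed in the first place, there would be nothing to resurface.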

On Wed, Oct 12, 2011 at 2:17 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> Unfortunately, expiring columns are no magic bullet. If you insert columns
> with ttl=1, you're doing roughly the same thing as deleting, so exactly the
> same rule concerning repair applies.
>
> What can be said about repair and expiring columns (and this may or may not
> be helpful) is that if you have a column family in which every column you
> insert has a ttl > n (for some n, including n = infinity) and TTLs are your
> only means of deletion for that CF (i.e., no deletes), then it would be
> enough to run repair on that column family once every gc_grace + n period
> (instead of every gc_grace period).
>
> --
> Sylvain

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Using ttl to expire columns rather than using delete

Posted by Sylvain Lebresne <sy...@datastax.com>.
Unfortunately, expiring columns are no magic bullet. If you insert columns
with ttl=1, you're doing roughly the same thing as deleting, so exactly the
same rule concerning repair applies.

What can be said about repair and expiring columns (and this may or may not
be helpful) is that if you have a column family in which every column you
insert has a ttl > n (for some n, including n = infinity) and TTLs are your
only means of deletion for that CF (i.e., no deletes), then it would be
enough to run repair on that column family once every gc_grace + n period
(instead of every gc_grace period).
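The rule above is simple arithmetic, sketched here in Python under stated assumptions: the function name is hypothetical, all values are in seconds, `ttls` stands for the ttl of every column written to the column family (None meaning no ttl), and `n = math.inf` is used for the "no column carries a ttl" case.

```python
import math
from typing import Iterable, Optional

DAY = 86400  # seconds


def max_repair_interval(ttls: Iterable[Optional[int]], n: float,
                        uses_deletes: bool, gc_grace: int) -> float:
    """Longest gap (seconds) between repairs of one column family, per the rule above.

    ttls: ttl of every column written to the CF; None means "no ttl".
    n: a bound every ttl must exceed; math.inf means no column carries a ttl.
    """
    if uses_deletes:
        return gc_grace  # explicit deletes: the usual every-gc_grace rule applies
    # A column without a ttl never expires, so it exceeds any bound, even infinity.
    if not all(t is None or t > n for t in ttls):
        raise ValueError("every ttl must be strictly greater than n")
    return gc_grace + n


print(max_repair_interval([31 * DAY, 60 * DAY], n=30 * DAY,
                          uses_deletes=False, gc_grace=10 * DAY) / DAY)  # 40.0
print(max_repair_interval([None], n=math.inf,
                          uses_deletes=False, gc_grace=10 * DAY))        # inf
```

With a 10-day gc_grace and every ttl above 30 days, the interval stretches from 10 to 40 days; a single explicit delete brings it back to gc_grace.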

--
Sylvain

On Wed, Oct 12, 2011 at 3:49 AM, Terry Cumaranatunge <cu...@gmail.com> wrote:
> Hello,
>
> If you set a ttl and let a column expire, I've read that it eventually turns
> into a tombstone and will be cleaned out by the GC. Are expirations
> considered a form of delete that still requires node repair to be run within
> gc_grace_seconds? The operations guide says you have to run node
> repair if you have deletes, so I'm trying to find out whether we can upsert
> the column with ttl=1 as a substitute for deletes. The node repair
> operation is very intensive in our environment and causes
> significant performance degradation on the system.
>
> Thanks

Re: Using ttl to expire columns rather than using delete

Posted by Eric Tamme <et...@gmail.com>.
On 10/11/2011 09:49 PM, Terry Cumaranatunge wrote:
> Hello,
> If you set a ttl and let a column expire, I've read that it eventually
> turns into a tombstone and will be cleaned out by the GC. Are
> expirations considered a form of delete that still requires node
> repair to be run within gc_grace_seconds? The operations guide says
> you have to run node repair if you have deletes, so I'm trying to find
> out whether we can upsert the column with ttl=1 as a substitute for
> deletes. The node repair operation is very intensive in our
> environment and causes significant performance degradation on the
> system.
> Thanks

No. If you only use TTL to expire data, with no actual deletes and no
updates to the TTL'd columns, then you generally do not need to run
nodetool repair.

I run two clusters with rolling data sets that rely on TTL. They have
been running for months without any issues, and I have never run
nodetool repair on them.
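Eric's condition can be checked mechanically against a write log. A minimal Python sketch, assuming you can enumerate writes as hypothetical `Write` records (column identifier, ttl, delete flag); the record type, function name and column identifiers are illustrative, not a Cassandra API.

```python
from typing import Iterable, NamedTuple, Optional


class Write(NamedTuple):
    column: str               # hypothetical "row key:column name" identifier
    ttl: Optional[int]        # seconds; None means the column never expires
    is_delete: bool = False


def may_need_repair_for_deletes(writes: Iterable[Write]) -> bool:
    """Conservative audit of a write log against the TTL-only case above.

    Returns False only when nothing is deleted and no column is ever written
    twice, so no write ever shadows an older value that could later resurface.
    """
    seen = set()
    for w in writes:
        if w.is_delete or w.column in seen:
            return True
        seen.add(w.column)
    return False


rolling = [Write("sensor1:t0", ttl=7 * 86400), Write("sensor1:t1", ttl=7 * 86400)]
print(may_need_repair_for_deletes(rolling))                                 # False
print(may_need_repair_for_deletes(rolling + [Write("sensor1:t0", ttl=1)]))  # True
```

This deliberately flags every overwrite; Sylvain's reply describes when an overwrite carrying a long enough ttl is still safe.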

-Eric