Posted to user@cassandra.apache.org by Eduardo Cusa <ed...@usmediaconsulting.com> on 2015/01/05 19:23:37 UTC

ttl in collections

Hi guys, I have to work with the following model:

userid : text
categories: [3, 4, 55, 623, ...........]

In my use case, the list of values is updated every day, with 100 million
users and a total of 500 categories at most.


Is there a way to assign a TTL to each item in the category list?


Regards
Eduardo

Re: ttl in collections

Posted by Eduardo Cusa <ed...@usmediaconsulting.com>.
Thanks Jens and Ryan, it is clear to me what happens with tombstones for a
CF row.

Now, does the same behavior that applies to CF rows also apply to elements
in a set data type?


Regards

On Tue, Jan 6, 2015 at 12:31 PM, Ryan Svihla <rs...@foundev.pro> wrote:

> Tombstone management is a big conversation. You can manage it in one of
> the following ways:
>
> 1) Set a gc_grace_seconds of 0 and then run nodetool compact, while using
> size-tiered compaction, as frequently as needed. This is often a pretty
> lousy solution, as a gc_grace_seconds of 0 means you're not very partition
> tolerant and it's easy to bring data back from the dead if you don't manage
> how you bring nodes back online correctly. Also, nodetool compact is super
> intensive. I don't recommend this approach unless you're already very
> operationally sound.
> 2) Partition your data using a scheme that matches your domain model. It
> sounds like you're using a queue approach, and by and large a distributed
> database that relies on tombstones is going to struggle with that by
> default. I have, however, worked with a number of customers that use
> Cassandra for a queue at scale, and I detailed the modeling workarounds here:
> http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
>
> On Tue, Jan 6, 2015 at 4:24 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:
>
>> Hi Eduardo,
>>
>> Quoting Eduardo Cusa <ed...@usmediaconsulting.com>:
>>
>>>  [...]
>>> Do I have to worry about the tombstones generated, considering that I
>>> will have many daily set updates?
>>>
>>
>> that depends on your definition of "many"... we've run into a situation
>> where we wanted to age out old data using TTL... unfortunately, we ran into
>> the "tombstone_failure_threshold" limit rather quickly, having thousands of
>> record updates per second. That left us with a CF containing millions of
>> records that we couldn't "select" the way we originally intended.
>>
>> Regards,
>> Jens
>>
>>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>

Re: ttl in collections

Posted by Ryan Svihla <rs...@foundev.pro>.
Tombstone management is a big conversation. You can manage it in one of the
following ways:

1) Set a gc_grace_seconds of 0 and then run nodetool compact, while using
size-tiered compaction, as frequently as needed. This is often a pretty
lousy solution, as a gc_grace_seconds of 0 means you're not very partition
tolerant and it's easy to bring data back from the dead if you don't manage
how you bring nodes back online correctly. Also, nodetool compact is super
intensive. I don't recommend this approach unless you're already very
operationally sound.
2) Partition your data using a scheme that matches your domain model. It
sounds like you're using a queue approach, and by and large a distributed
database that relies on tombstones is going to struggle with that by
default. I have, however, worked with a number of customers that use
Cassandra for a queue at scale, and I detailed the modeling workarounds here:
http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
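
As a rough sketch of the two options above in CQL: the table name
categories_sync is taken from elsewhere in this thread, while the keyspace
name myks, the categories_by_day table, its day bucket column, and the TTL
value are hypothetical choices for illustration, not something prescribed
here or in the linked post.

```cql
-- Option 1: let tombstones be dropped at the next compaction
-- (risky, see the caveats above).
ALTER TABLE categories_sync WITH gc_grace_seconds = 0;
-- Then, from a shell on each node (keyspace name is a placeholder):
--   nodetool compact myks categories_sync

-- Option 2: bucket the data so reads never touch expired partitions.
-- Hypothetical layout: one partition per user per day.
CREATE TABLE categories_by_day (
    userid   text,
    day      text,            -- e.g. '2015-01-06'
    category set<text>,
    PRIMARY KEY ((userid, day))
) WITH default_time_to_live = 172800;  -- 2 days, an arbitrary example value

-- Readers query only the current day's partition.
SELECT category FROM categories_by_day
WHERE userid = 'u1' AND day = '2015-01-06';
```

With the second layout, older partitions simply stop being read, so their
expired cells are not scanned on the read path and can be purged by normal
compaction.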

On Tue, Jan 6, 2015 at 4:24 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:

> Hi Eduardo,
>
> Quoting Eduardo Cusa <ed...@usmediaconsulting.com>:
>
>>  [...]
>> Do I have to worry about the tombstones generated, considering that I
>> will have many daily set updates?
>>
>
> that depends on your definition of "many"... we've run into a situation
> where we wanted to age out old data using TTL... unfortunately, we ran into
> the "tombstone_failure_threshold" limit rather quickly, having thousands of
> record updates per second. That left us with a CF containing millions of
> records that we couldn't "select" the way we originally intended.
>
> Regards,
> Jens
>
>


-- 

Thanks,
Ryan Svihla

Re: ttl in collections

Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Eduardo,

Quoting Eduardo Cusa <ed...@usmediaconsulting.com>:
>  [...]
> Do I have to worry about the tombstones generated, considering that I
> will have many daily set updates?

that depends on your definition of "many"... we've run into a  
situation where we wanted to age out old data using TTL...  
unfortunately, we ran into the "tombstone_failure_threshold" limit  
rather quickly, having thousands of record updates per second. That  
left us with a CF containing millions of records that we couldn't  
"select" the way we originally intended.
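
To illustrate the kind of read that can hit this limit, here is a sketch
with a hypothetical events table. The table and column names and the TTL are
assumptions, and the threshold values in the comments are believed to be the
cassandra.yaml defaults for the 2.x line, so check your own configuration.

```cql
-- Hypothetical table: many short-lived rows in one partition.
CREATE TABLE events (
    source  text,
    ts      timeuuid,
    payload text,
    PRIMARY KEY (source, ts)
) WITH default_time_to_live = 3600;

-- Each insert expires after an hour and is then treated like a tombstone.
INSERT INTO events (source, ts, payload) VALUES ('s1', now(), 'data');

-- A slice over the partition must skip every expired row it passes.
-- cassandra.yaml: tombstone_warn_threshold (default 1000) logs a warning,
-- tombstone_failure_threshold (default 100000) aborts the query.
SELECT ts, payload FROM events WHERE source = 's1' LIMIT 100;
```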

Regards,
Jens


Re: ttl in collections

Posted by Eduardo Cusa <ed...@usmediaconsulting.com>.
Hi, using the following updates I made the different values expire at
different times:

update categories_sync using ttl 60 set category = category + {'2'} where userid = 'u1';
update categories_sync using ttl 120 set category = category + {'3'} where userid = 'u1';
update categories_sync using ttl 180 set category = category + {'4'} where userid = 'u1';


Do I have to worry about the tombstones generated, considering that I will
have many daily set updates?
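
For reference, a minimal sketch of the table these updates assume. The
schema is not shown in this thread, so the column types below are an
assumption inferred from the statements above, and the TTL and values in the
overwrite example are arbitrary. Each element of a CQL set is stored as its
own cell, so a TTL given on an append applies only to the elements added by
that statement.

```cql
-- Assumed schema (not given in the thread): one row per user,
-- categories stored as a set.
CREATE TABLE categories_sync (
    userid   text PRIMARY KEY,
    category set<text>
);

-- Append: only the new element '2' gets the 60-second TTL.
UPDATE categories_sync USING TTL 60
SET category = category + {'2'}
WHERE userid = 'u1';

-- Overwrite: replaces the whole set. Cassandra writes a tombstone covering
-- the old contents, then the new elements, all with the same TTL.
UPDATE categories_sync USING TTL 86400
SET category = {'3', '4', '55'}
WHERE userid = 'u1';
```

Appending avoids the extra collection tombstone that a full overwrite
writes, but every expired element is still treated as a tombstone on reads
until compaction purges it.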

Regards






On Mon, Jan 5, 2015 at 3:23 PM, Eduardo Cusa <
eduardo.cusa@usmediaconsulting.com> wrote:

> Hi guys, I have to work with the following model:
>
> userid : text
> categories: [3, 4, 55, 623, ...........]
>
> In my use case, the list of values is updated every day, with 100 million
> users and a total of 500 categories at most.
>
>
> Is there a way to assign a TTL to each item in the category list?
>
>
> Regards
> Eduardo