Posted to user@cassandra.apache.org by Eduardo Cusa <ed...@usmediaconsulting.com> on 2015/01/05 19:23:37 UTC
ttl in collections
Hi guys, I have to work with the following model:
userid : text
categories: [3, 4, 55, 623, ...........]
in my use case, the list of values is updated every day, with 100 million
users and a total of 500 categories at most.
Is there a way to assign a TTL to each item in the category list?
Regards
Eduardo
Re: ttl in collections
Posted by Eduardo Cusa <ed...@usmediaconsulting.com>.
thanks Jens and Ryan, it's clear to me what happens with tombstones for a
CF row.
Now, does the same behavior that applies to CF rows also apply to elements
in a set data type?
Regards
On Tue, Jan 6, 2015 at 12:31 PM, Ryan Svihla <rs...@foundev.pro> wrote:
> Tombstone management is a big conversation; you can manage it in one of
> the following ways:
>
> 1) Set gc_grace_seconds to 0 and then run nodetool compact (while using
> size-tiered compaction) as frequently as needed. This is often a pretty
> lousy solution, as a gc_grace_seconds of 0 means you're not very partition
> tolerant and it's easy to bring data back from the dead if you don't manage
> how you bring nodes back online correctly. Also, nodetool compact is super
> intensive. I don't recommend this approach unless you're already very
> operationally sound.
> 2) Partition your data using a scheme that matches your domain model. It
> sounds like you're using a queue approach, and by and large a distributed
> database that relies on tombstones is going to struggle with that by
> default. I have, however, worked with a number of customers that use
> Cassandra as a queue at scale, and I detailed the modeling workarounds here:
> http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
>
> On Tue, Jan 6, 2015 at 4:24 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:
>
>> Hi Eduardo,
>>
>> Zitat von Eduardo Cusa <ed...@usmediaconsulting.com>:
>>
>>> [...]
>>> I have to worry about the tombstones generated? Considering that I will
>>> have many daily set updates
>>>
>>
>> that depends on your definition of "many"... we've run into a situation
>> where we wanted to age out old data using TTL... unfortunately, we ran into
>> the "tombstone_failure_threshold" limit rather quickly, having thousands of
>> record updates per second. That left us with a CF containing millions of
>> records that we couldn't "select" the way we originally intended.
>>
>> Regards,
>> Jens
>>
>>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>
Re: ttl in collections
Posted by Ryan Svihla <rs...@foundev.pro>.
Tombstone management is a big conversation; you can manage it in one of the
following ways:
1) Set gc_grace_seconds to 0 and then run nodetool compact (while using
size-tiered compaction) as frequently as needed. This is often a pretty
lousy solution, as a gc_grace_seconds of 0 means you're not very partition
tolerant and it's easy to bring data back from the dead if you don't manage
how you bring nodes back online correctly. Also, nodetool compact is super
intensive. I don't recommend this approach unless you're already very
operationally sound.
2) Partition your data using a scheme that matches your domain model. It
sounds like you're using a queue approach, and by and large a distributed
database that relies on tombstones is going to struggle with that by
default. I have, however, worked with a number of customers that use
Cassandra as a queue at scale, and I detailed the modeling workarounds here:
http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
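The two options above can be sketched in CQL. (The table name below is borrowed from earlier in the thread; the bucketed schema, column names, and settings are illustrative assumptions, not recommendations.)

```cql
-- Option 1 sketch: purge tombstones aggressively. With gc_grace_seconds = 0
-- a tombstone becomes eligible for removal as soon as it is compacted, so
-- node recovery and repair must be managed very carefully.
ALTER TABLE categories_sync
  WITH gc_grace_seconds = 0
  AND compaction = {'class': 'SizeTieredCompactionStrategy'};
-- then, from the shell: nodetool compact <keyspace> categories_sync

-- Option 2 sketch: partition by a time bucket so stale data is retired by
-- simply not reading (and eventually dropping) old partitions, instead of
-- relying on per-element TTL tombstones.
CREATE TABLE categories_by_day (
    day      text,       -- e.g. '2015-01-06'
    userid   text,
    category set<text>,
    PRIMARY KEY ((day, userid))
);
```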
On Tue, Jan 6, 2015 at 4:24 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:
> Hi Eduardo,
>
> Zitat von Eduardo Cusa <ed...@usmediaconsulting.com>:
>
>> [...]
>> I have to worry about the tombstones generated? Considering that I will
>> have many daily set updates
>>
>
> that depends on your definition of "many"... we've run into a situation
> where we wanted to age out old data using TTL... unfortunately, we ran into
> the "tombstone_failure_threshold" limit rather quickly, having thousands of
> record updates per second. That left us with a CF containing millions of
> records that we couldn't "select" the way we originally intended.
>
> Regards,
> Jens
>
>
--
Thanks,
Ryan Svihla
Re: ttl in collections
Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Eduardo,
Zitat von Eduardo Cusa <ed...@usmediaconsulting.com>:
> [...]
> I have to worry about the tombstones generated? Considering that I will
> have many daily set updates
that depends on your definition of "many"... we've run into a
situation where we wanted to age out old data using TTL...
unfortunately, we ran into the "tombstone_failure_threshold" limit
rather quickly, having thousands of record updates per second. That
left us with a CF containing millions of records that we couldn't
"select" the way we originally intended.
Regards,
Jens
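For reference, the limit Jens mentions is configured in cassandra.yaml; the values below are the shipped defaults of that era. A read that touches more tombstones than the failure threshold is aborted rather than slowed down.

```yaml
# cassandra.yaml excerpt (defaults shown)
tombstone_warn_threshold: 1000       # log a warning when a read scans this many tombstones
tombstone_failure_threshold: 100000  # abort the read beyond this many
```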
Re: ttl in collections
Posted by Eduardo Cusa <ed...@usmediaconsulting.com>.
Hi, using the following updates I made the different values expire at
different times:
update categories_sync using ttl 60 set category = category + {'2'} where
userid = 'u1';
update categories_sync using ttl 120 set category = category + {'3'}
where userid = 'u1';
update categories_sync using ttl 180 set category = category + {'4'}
where userid = 'u1';
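With per-element TTLs like the updates above, each set element expires independently. A quick way to observe this (same table as above; timings are approximate):

```cql
-- Immediately after the updates: category = {'2', '3', '4'}.
-- After ~60s '2' is gone, after ~120s '3' is gone, and after ~180s the
-- set is empty; with no other live columns, the row stops appearing.
SELECT userid, category FROM categories_sync WHERE userid = 'u1';
```

Note that each expired element leaves a cell-level tombstone behind, which is exactly what the rest of this thread discusses.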
Do I have to worry about the tombstones generated, considering that I will
have many daily set updates?
Regards
On Mon, Jan 5, 2015 at 3:23 PM, Eduardo Cusa <
eduardo.cusa@usmediaconsulting.com> wrote:
> Hi guys, I have to work with the following model:
>
> userid : text
> categories: [3, 4, 55, 623, ...........]
>
> in my use case, the list of values is updated every day, with 100 million
> users and a total of 500 categories at most.
>
>
> Is there a way to assign a TTL to each item in the category list?
>
>
> Regards
> Eduardo