Posted to user@cassandra.apache.org by Jan Algermissen <al...@icloud.com> on 2017/05/25 14:19:34 UTC

Effect of frequent mutations / memtable

Hi,

I am using updates to a column with a TTL to represent a lock. The
owning process keeps refreshing the lock's TTL as long as it is running.
If the process crashes, the lock will time out and be deleted. Then
another process can take over.
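
The pattern described above can be sketched as follows. This is a minimal
in-memory model, not Cassandra code: the names TtlLockTable, Lease, acquire
and renew are hypothetical, and the injectable clock exists only so the
behaviour can be exercised without waiting. In Cassandra, each renew would
correspond to one write carrying a fresh TTL (for example via USING TTL),
which is why a shorter TTL means more mutations.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    owner: str
    expires_at: float


class TtlLockTable:
    """In-memory stand-in for a table whose cells expire via TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._rows: Dict[str, Lease] = {}

    def _live(self, name: str) -> Optional[Lease]:
        """Return the lease if it has not expired, dropping it otherwise."""
        lease = self._rows.get(name)
        if lease is not None and lease.expires_at <= self._clock():
            del self._rows[name]  # the TTL ran out, so the cell is gone
            return None
        return lease

    def acquire(self, name: str, owner: str, ttl: float) -> bool:
        """Take the lock only if nobody holds a live lease on it."""
        if self._live(name) is not None:
            return False
        self._rows[name] = Lease(owner, self._clock() + ttl)
        return True

    def renew(self, name: str, owner: str, ttl: float) -> bool:
        """Heartbeat: the current owner rewrites the cell with a fresh TTL."""
        lease = self._live(name)
        if lease is None or lease.owner != owner:
            return False
        lease.expires_at = self._clock() + ttl
        return True


if __name__ == "__main__":
    now = [0.0]  # fake clock, advanced by hand
    table = TtlLockTable(clock=lambda: now[0])
    print(table.acquire("job-1", "worker-a", ttl=2.0))
    now[0] = 1.5
    print(table.renew("job-1", "worker-a", ttl=2.0))
    now[0] = 10.0  # worker-a crashed and stopped renewing
    print(table.acquire("job-1", "worker-b", ttl=2.0))
```

The owner has to renew more often than the TTL, so a TTL of one or two
seconds implies at least one mutation per lock per second.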

I have used this pattern very successfully over the years with TTLs on
the order of tens of seconds.

Now I have a use case in mind that would require much smaller TTLs, e.g.
one or two seconds, and I am worried about the increased number of
mutations and their possible effect on SSTables.

However, I'd assume that these frequent updates to a cell mostly happen
in the memtable and only occasionally end up in SSTables.

Is that assumption correct and if so, what config parameters should I 
tweak to keep the memtable from being flushed for longer periods of 
time?


Jan

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org

Re: Effect of frequent mutations / memtable

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

If you're doing high volumes of writes that simply overwrite values, you're
going to see memtables flush to disk when the commit log hits its space
limit and you recycle commit log segments.
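
As a rough illustration of that limit, the sketch below estimates how long
it takes a steady write stream to fill the commit log. It assumes that the
space limit in question is the commitlog_total_space_in_mb setting in
cassandra.yaml, that this table is the only writer, and that every mutation
costs a fixed number of commit log bytes. The function name and the example
numbers are made up for illustration; memtable size limits may force a flush
earlier than this estimate.

```python
def seconds_until_commitlog_full(
    commitlog_total_space_mb: float,
    mutations_per_second: float,
    bytes_per_mutation: float,
) -> float:
    """Upper bound on how long a memtable can go unflushed before the
    commit log space limit forces a flush, for a single steady writer."""
    total_bytes = commitlog_total_space_mb * 1024 * 1024
    bytes_per_second = mutations_per_second * bytes_per_mutation
    if bytes_per_second <= 0:
        raise ValueError("write rate must be positive")
    return total_bytes / bytes_per_second


if __name__ == "__main__":
    # Example values only: 8192 MB of commit log space, 1000 lock
    # renewals per second, about 200 bytes of commit log per renewal.
    seconds = seconds_until_commitlog_full(8192, 1000, 200)
    print(f"{seconds / 3600:.1f} hours")
```

With these example numbers the commit log fills after roughly 12 hours. The
commit log is shared by all tables on the node, so other write traffic
shortens that interval.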

I agree, it makes sense to not write these values to disk only to compact
them, if this is your pattern.

Re: Effect of frequent mutations / memtable

Posted by Jan Algermissen <al...@icloud.com>.

Jonathan,

On 26 May 2017, at 17:00, Jonathan Haddad wrote:

> If you have a small amount of hot data, enable the row cache. The memtable
> is not designed to be a cache. You will not see a massive performance
> impact of writing one to disk. Sstables will be in your page cache, meaning
> you won't be hitting disk very often.

What I (and AFAIU Max, too) am concerned with is very frequent updates
on certain cells and their impact on the number of SSTables created.

Suppose I have a row that sees tens of thousands of mutations during the
first minutes of its lifetime but isn't changed afterwards. The
hope/assumption is that tuning C* can help keep all those mutations
in the memtable so that we end up with only a single SSTable in
the end (roughly speaking).
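
The hope above can be illustrated with a toy model. It assumes that
overwrites of the same cell are reconciled inside the memtable
(last write wins) and that a flush writes only the reconciled state, so the
number of SSTables holding a hot row depends on how many flushes happen
during its busy period, not on the number of mutations. ToyTable and its
mutation-count flush trigger are hypothetical simplifications and do not
mirror Cassandra's internals.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


class ToyTable:
    """Toy memtable/SSTable model: overwrites collapse until a flush."""

    def __init__(self, flush_after_mutations: int) -> None:
        # Stand-in for commit log or memtable pressure forcing a flush.
        self.flush_after = flush_after_mutations
        self.memtable: Dict[str, Cell] = {}
        self.sstables: List[Dict[str, Cell]] = []
        self._since_flush = 0

    def write(self, key: str, timestamp: int, value: str) -> None:
        current = self.memtable.get(key)
        # Last write wins: a newer timestamp replaces the cell in place.
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)
        self._since_flush += 1
        if self._since_flush >= self.flush_after:
            self.flush()

    def flush(self) -> None:
        """Write the reconciled memtable out as one immutable SSTable."""
        if self.memtable:
            self.sstables.append(self.memtable)
        self.memtable = {}
        self._since_flush = 0

    def sstables_containing(self, key: str) -> int:
        return sum(1 for sstable in self.sstables if key in sstable)


if __name__ == "__main__":
    table = ToyTable(flush_after_mutations=4000)
    for ts in range(10_000):
        table.write("row-1", ts, f"v{ts}")
    table.flush()
    print(table.sstables_containing("row-1"))
```

In this model ten thousand overwrites leave one cell per flush, so a longer
interval between flushes means fewer SSTables for compaction to merge later.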

Apart from such an exceptional case, I'd consider high-frequency
mutations an anti-pattern due to SSTable bloat.

Makes sense?

Jan

Re: Effect of frequent mutations / memtable

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

If you have a small amount of hot data, enable the row cache. The memtable
is not designed to be a cache. You will not see a massive performance
impact from writing one to disk. SSTables will be in your page cache, meaning
you won't be hitting disk very often.

Re: Effect of frequent mutations / memtable

Posted by Max C <mc...@core43.com>.

In my case, we're using Cassandra to store QA test data, so the pattern is that we may do a bunch of updates within a few minutes or hours, and then the data will essentially be read-only for the rest of its lifetime (years).  My question is the same: do we need to worry about the performance impact of having N mutations written to the SSTable, or will these mutations generally be confined to the memtable?

- Max

Re: Effect of frequent mutations / memtable

Posted by "Thakrar, Jayesh" <jt...@conversantmedia.com>.

That's because ZooKeeper is purpose-built for this kind of usage.
Its asynchronous nature (e.g. you can create "watchers" with callbacks that fire when ephemeral nodes disappear because a server crashed) makes it easier to program against.
It also reduces the "check-in" and "polling" cycle overhead.
Furthermore, ZK does not have the overhead of the other things that Cassandra does.
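
The difference from a TTL heartbeat can be sketched with a toy model of
ephemeral nodes and one-shot deletion watches. ToyCoordinator and its
methods are hypothetical and are not the ZooKeeper client API; the sketch
only shows the control flow, in which the waiting process is called back
when the owner's session ends instead of polling.

```python
from typing import Callable, Dict, List


class ToyCoordinator:
    """Toy model of ephemeral nodes and one-shot watches (not a real client)."""

    def __init__(self) -> None:
        self._owner_of: Dict[str, str] = {}  # path -> owning session id
        self._watches: Dict[str, List[Callable[[str], None]]] = {}

    def create_ephemeral(self, path: str, session: str) -> bool:
        """Create a node tied to a session; fails if the path exists."""
        if path in self._owner_of:
            return False
        self._owner_of[path] = session
        return True

    def watch_deletion(self, path: str, callback: Callable[[str], None]) -> None:
        """Register a callback that fires once when the node disappears."""
        self._watches.setdefault(path, []).append(callback)

    def close_session(self, session: str) -> None:
        """Simulate a crashed process: its ephemeral nodes vanish."""
        dead = [p for p, s in self._owner_of.items() if s == session]
        for path in dead:
            del self._owner_of[path]
            # Watches are one-shot, so they are removed before firing.
            for callback in self._watches.pop(path, []):
                callback(path)


if __name__ == "__main__":
    coordinator = ToyCoordinator()
    coordinator.create_ephemeral("/locks/job-1", session="worker-a")

    def take_over(path: str) -> None:
        took = coordinator.create_ephemeral(path, session="worker-b")
        print("worker-b took over:", took)

    coordinator.watch_deletion("/locks/job-1", take_over)
    coordinator.close_session("worker-a")  # worker-a crashes
```

No periodic write is needed to keep the lock alive here, which is the
reduced check-in and polling overhead mentioned above.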

Honestly, I am not familiar with Paxos and related protocols, so I can't speak to that.



Re: Effect of frequent mutations / memtable

Posted by Jan Algermissen <al...@icloud.com>.

Hi Jayesh,


On 25 May 2017, at 18:31, Thakrar, Jayesh wrote:

> Hi Jan,
>
> I would suggest looking at using Zookeeper for such a usecase.

Thanks - yes, it is an alternative.

Out of curiosity: since both ZK and C* implement a consensus protocol to
enable this kind of thing (Paxos in C*, Zab in ZooKeeper), why do you
think ZooKeeper would be a better fit?

Jan

Re: Effect of frequent mutations / memtable

Posted by "Thakrar, Jayesh" <jt...@conversantmedia.com>.

Hi Jan,

I would suggest looking at using ZooKeeper for such a use case.

See http://zookeeper.apache.org/doc/trunk/recipes.html for some examples.

ZooKeeper is used for such purposes in Apache HBase (active master), Apache Kafka (active controller), Apache Hadoop, etc.

Look for the "Leader Election" use case.
Examples:
http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/
https://www.tutorialspoint.com/zookeeper/zookeeper_leader_election.htm
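
The idea behind the leader election recipe can be sketched as follows: every
candidate registers under an increasing sequence number, the lowest live
number is the leader, and each other candidate only needs to watch its
immediate predecessor. ToyElection is a hypothetical in-memory model of that
rule and not ZooKeeper client code.

```python
from typing import Dict, Optional


class ToyElection:
    """Toy model of election by lowest sequence number."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._candidates: Dict[int, str] = {}  # sequence number -> process

    def join(self, process: str) -> int:
        """Register a candidate and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = process
        return seq

    def leave(self, seq: int) -> None:
        """Remove a candidate, e.g. because its process crashed."""
        self._candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The live candidate with the lowest sequence number leads."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]

    def predecessor(self, seq: int) -> Optional[int]:
        """The candidate just ahead of seq, which is the one to watch."""
        lower = [s for s in self._candidates if s < seq]
        return max(lower) if lower else None


if __name__ == "__main__":
    election = ToyElection()
    a = election.join("worker-a")
    b = election.join("worker-b")
    print(election.leader())
    election.leave(a)  # the current leader crashes
    print(election.leader(), election.predecessor(b))
```

Watching only the predecessor means a crash wakes a single candidate rather
than all of them.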

It's more (and new) work, but should be an elegant solution.

Hope that helps.
Jayesh
