Posted to user@cassandra.apache.org by Aklin_81 <as...@gmail.com> on 2011/02/18 08:14:34 UTC

Frequent updates of freshly written columns

Are columns that were very recently written to a row, and are still in the
memtable, updated/overwritten efficiently by edited or new column values?

After the memtable is flushed, are those columns (edited and unedited)
stored together on disk (in the same blocks?), as if they had been written
in a single operation or at the same time? I know that if old columns are
edited, several copies of the same column will be dispersed across
different SSTables. What about fresh columns?

Are there any disadvantages to frequently updating fresh columns that are
still in the memtable?

Re: Frequent updates of freshly written columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Feb 18, 2011 at 6:19 PM, Aklin_81 <as...@gmail.com> wrote:

> Sylvain,
> I also need to store data that is frequently updated, same column
> being updated several times during each user session, at each action
> by user, But, this data is not very fresh and hence when I update this
> column frequently, there would be many versions of the same column in
> several sst files!
> Reading this type of data would not be too efficient I guess as the
> row would be totally scattered!
>
> Could there be any better strategy to store such data in cassandra?


> (Since the column holds an aggregate data obtained from all actions of
> the users, I have the need of updating that same column again & again)
>

That's what compaction is for. Hopefully, even if the column is scattered
across many SSTables, compaction will keep that down to a handful of them.
Chances are you won't see too bad read performance. Other than that,
tweaking the memtable thresholds so that you don't flush too often will
also help.

Now, I don't know exactly what your use case is or what this aggregate is.
But if there is a natural way to split the aggregate into multiple columns,
so that each update touches only one of the columns forming the aggregate,
that would hopefully help. It really depends on what we are talking about.
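
As one possible reading of the suggestion above, here is a minimal sketch in
plain Python (no Cassandra client involved). It assumes, purely for
illustration, that the aggregate is a sum of per-action-type counts; the
row is modelled as a dict, and the `count:` column prefix and both function
names are hypothetical.

```python
def record_action(row, action_type, amount=1):
    """Update only the column for this action type; other columns are untouched."""
    column = f"count:{action_type}"  # hypothetical naming scheme
    row[column] = row.get(column, 0) + amount
    return column


def read_aggregate(row):
    """Rebuild the aggregate at read time from the per-action columns."""
    return sum(value for name, value in row.items() if name.startswith("count:"))


row = {}
record_action(row, "click")
record_action(row, "click")
touched = record_action(row, "purchase")
total = read_aggregate(row)
print(touched, total)
```

Each write rewrites one small column instead of the whole aggregate, at the
cost of combining the columns on read. Whether that trade-off pays off
depends on the actual data.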


> my another doubt,  When old column has been updated and exists in the
> memtable, but other versions of the column in SST tables exist, do the
> reads also scan the sst tables for that column, after memtable. or is
> that smart enough to say that this column is the most recent one ?
>

It can't skip the SSTables. The problem is that you never know whether the
value in an SSTable is more recent than the one in the memtable. To take a
concrete example, suppose a node was down. When it comes back up, chances
are that it will see new updates before it sees the old updates that were
made while it was down (those will arrive through Hinted Handoff, read
repair, or repair). More generally, there is never any guarantee that
messages will arrive at replicas in the order in which they were received
by the coordinator(s).
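
To illustrate why the memtable copy cannot be trusted on its own, here is a
minimal sketch in plain Python. It is not Cassandra's read path: the
memtable and SSTables are modelled as dicts mapping a (row key, column)
pair to a (timestamp, value) pair, and `read_column` is a hypothetical
helper.

```python
def read_column(memtable, sstables, key):
    """Return the value with the highest timestamp among all sources holding key."""
    candidates = []
    for source in [memtable, *sstables]:
        if key in source:
            candidates.append(source[key])  # (timestamp, value)
    if not candidates:
        return None
    # The write timestamp decides the winner, not where the cell is stored.
    return max(candidates, key=lambda cell: cell[0])[1]


key = ("user42", "score")
# After a restart, the node first received a recent write (timestamp 5),
# which has since been flushed to an SSTable ...
flushed = {key: (5, "new")}
# ... and only later an older write (timestamp 3) replayed to it,
# which is still sitting in the memtable.
memtable = {key: (3, "old")}

result = read_column(memtable, [flushed], key)
print(result)
```

Stopping at the memtable would have returned the stale value, which is why
every source that may hold the column has to be consulted.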

--
Sylvain



Re: Frequent updates of freshly written columns

Posted by Aklin_81 <as...@gmail.com>.
Sylvain,
I also need to store data that is frequently updated: the same column is
updated several times during each user session, at each action by the
user. But this data is not very fresh, so when I update this column
frequently there will be many versions of the same column in several
SSTable files!
Reading this type of data would not be very efficient, I guess, as the
row would be totally scattered!

Could there be a better strategy for storing such data in Cassandra?

(Since the column holds aggregate data obtained from all the actions of
the users, I need to update that same column again and again.)


Another question: when an old column has been updated and exists in the
memtable, but other versions of the column exist in SSTables, do reads
also scan the SSTables for that column after the memtable? Or is
Cassandra smart enough to know that this column is the most recent one?


Re: Frequent updates of freshly written columns

Posted by James Churchman <ja...@gmail.com>.
OK, great, thanks for the exact clarification.

On 18 Feb 2011, at 14:11, Aklin_81 wrote:

> Compaction does not 'mutate' the sst files, it 'merges' several sst files into one with new indexes, merged data rows & deleting tombstones. Thus you reclaim your disk space.
> 
> 
> On Fri, Feb 18, 2011 at 7:34 PM, James Churchman <ja...@gmail.com> wrote:
> but a compaction will mutate the sstables and reclaim the space (eventually)  ? 
> 
> 
> james
> 
> On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:
> 
>> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81 <as...@gmail.com> wrote:
>> Are the very freshly written columns to a row in memtables, efficiently updated/overwritten by edited/new column values. 
>> 
>> After flushing of memtable, are those(edited + unedited ones) columns stored together on disk (in same blocks!?) as if they were written in one single operation or same time ?? I know if old columns are edited then several copies of same column will be dispersed in different sst tables, what about fresh columns ?
>> 
>> Are there any disadvantages to frequently updating fresh columns present in memtable ? 
>> 
>> The SSTables are immutable but the memtable are not. As long as you update/overwrite a column that is still in memtable, it is simply replaced in memory (so it's as efficient as it gets).
>> In other words, when the memtable is flushed, only the last version of the column goes in. 
>> 
>> --
>> Sylvain
> 
> 


Re: Frequent updates of freshly written columns

Posted by Aklin_81 <as...@gmail.com>.
Compaction does not 'mutate' the SSTable files; it 'merges' several SSTable
files into one, with new indexes and merged data rows, and deletes
tombstones. That is how you reclaim your disk space.
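
A minimal sketch of that merge in plain Python, not Cassandra's actual
compaction code: SSTables are modelled as dicts mapping a (row key, column)
pair to a (timestamp, value) pair, and `TOMBSTONE` and `compact` are
hypothetical names. One simplification to note: the sketch drops tombstones
immediately, whereas Cassandra keeps them for a grace period
(gc_grace_seconds) before purging.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted column


def compact(sstables):
    """Merge several SSTables into a new one without modifying the inputs."""
    merged = {}
    for sstable in sstables:
        for key, cell in sstable.items():
            # Keep only the newest version of each column.
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    # Write out a fresh, sorted table, leaving deleted columns behind.
    return {
        key: cell
        for key, cell in sorted(merged.items())
        if cell[1] is not TOMBSTONE
    }


older = {("user42", "score"): (1, 10), ("user42", "nick"): (1, "bob")}
newer = {("user42", "score"): (4, 21), ("user42", "nick"): (5, TOMBSTONE)}

compacted = compact([older, newer])
print(compacted)
```

The inputs `older` and `newer` are left untouched. Space is reclaimed only
once the merged table has replaced them and the old files are deleted.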


On Fri, Feb 18, 2011 at 7:34 PM, James Churchman <ja...@gmail.com> wrote:

> but a compaction will mutate the sstables and reclaim the
> space (eventually)  ?
>
>
> james
>
> On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:
>
> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81 <as...@gmail.com> wrote:
>
>> Are the very freshly written columns to a row in memtables, efficiently
>> updated/overwritten by edited/new column values.
>>
>> After flushing of memtable, are those(edited + unedited ones) columns
>> stored together on disk (in same blocks!?) as if they were written in one
>> single operation or same time ?? I know if old columns are edited then
>> several copies of same column will be dispersed in different sst tables,
>> what about fresh columns ?
>>
>> Are there any disadvantages to frequently updating fresh columns present
>> in memtable ?
>>
>
> The SSTables are immutable but the memtable are not. As long as you
> update/overwrite a column that is still in memtable, it is simply replaced
> in memory (so it's as efficient as it gets).
> In other words, when the memtable is flushed, only the last version of the
> column goes in.
>
> --
> Sylvain
>
>
>

Re: Frequent updates of freshly written columns

Posted by James Churchman <ja...@gmail.com>.
But a compaction will mutate the SSTables and reclaim the space (eventually)?


james

On 18 Feb 2011, at 08:36, Sylvain Lebresne wrote:

> On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81 <as...@gmail.com> wrote:
> Are the very freshly written columns to a row in memtables, efficiently updated/overwritten by edited/new column values. 
> 
> After flushing of memtable, are those(edited + unedited ones) columns stored together on disk (in same blocks!?) as if they were written in one single operation or same time ?? I know if old columns are edited then several copies of same column will be dispersed in different sst tables, what about fresh columns ?
> 
> Are there any disadvantages to frequently updating fresh columns present in memtable ? 
> 
> The SSTables are immutable but the memtable are not. As long as you update/overwrite a column that is still in memtable, it is simply replaced in memory (so it's as efficient as it gets).
> In other words, when the memtable is flushed, only the last version of the column goes in. 
> 
> --
> Sylvain


Re: Frequent updates of freshly written columns

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Feb 18, 2011 at 8:14 AM, Aklin_81 <as...@gmail.com> wrote:

> Are the very freshly written columns to a row in memtables, efficiently
> updated/overwritten by edited/new column values.
>
> After flushing of memtable, are those(edited + unedited ones) columns
> stored together on disk (in same blocks!?) as if they were written in one
> single operation or same time ?? I know if old columns are edited then
> several copies of same column will be dispersed in different sst tables,
> what about fresh columns ?
>
> Are there any disadvantages to frequently updating fresh columns present in
> memtable ?
>

SSTables are immutable, but memtables are not. As long as you
update/overwrite a column that is still in the memtable, it is simply
replaced in memory (so it's as efficient as it gets).
In other words, when the memtable is flushed, only the last version of the
column goes in.
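
A minimal sketch of this behaviour in plain Python, not Cassandra's actual
implementation: the `Memtable` class below is a hypothetical stand-in that
keeps one cell per (row key, column) pair and hands back an immutable
snapshot on flush.

```python
class Memtable:
    """Toy write buffer: (row_key, column) -> (timestamp, value)."""

    def __init__(self):
        self.cells = {}

    def write(self, row_key, column, value, timestamp):
        key = (row_key, column)
        current = self.cells.get(key)
        # An overwrite replaces the cell in place; no old version is kept.
        if current is None or timestamp >= current[0]:
            self.cells[key] = (timestamp, value)

    def flush(self):
        """Return an immutable, sorted snapshot (a stand-in for an SSTable)."""
        sstable = tuple(sorted(self.cells.items()))
        self.cells = {}
        return sstable


memtable = Memtable()
memtable.write("user42", "score", 10, timestamp=1)
memtable.write("user42", "score", 15, timestamp=2)
memtable.write("user42", "score", 21, timestamp=3)

sstable = memtable.flush()
print(sstable)
```

Three writes to the same column produce a single cell in the flushed
snapshot, which matches the point above: repeated updates cost memory
operations only until the flush.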

--
Sylvain