Posted to user@cassandra.apache.org by Artur Siekielski <ar...@vhex.net> on 2015/05/15 11:16:04 UTC

Performance penalty of multiple UPDATEs of non-pk columns

I've seen some discussions about this topic on the list recently, but I
would like to get clearer answers.

Given the table:

CREATE TABLE t1 (
	f1 text,
	f2 text,
	f3 text,
	PRIMARY KEY(f1, f2)
);

and assuming I execute an UPDATE of f3 multiple times (say, 1000) for
the same key values k1, k2 and different values of 'newval':

UPDATE t1 SET f3=newval WHERE f1=k1 AND f2=k2;

How will the performance of selecting the current 'f3' value be affected?

SELECT f3 FROM t1 WHERE f1=k1 AND f2=k2;

It looks like all the previous values are preserved until compaction,
but does executing the SELECT read all the values (O(n), where n is the
number of updates) or only the current one (O(1))?


What does the situation look like for counter types?

Re: Performance penalty of multiple UPDATEs of non-pk columns

Posted by Artur Siekielski <ar...@vhex.net>.
Thanks, I wasn't sure whether memtables and sstables contain only the newest
values (I thought replication might require storing old values).

So the number of lookups for the newest value should be bounded by the
max_compaction_threshold setting. It looks to me like it's safe to perform
many UPDATEs of non-pk columns.
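To make the read-side reasoning concrete, here is a minimal Python sketch of last-write-wins reconciliation under simplifying assumptions: each source (the memtable or one sstable) holds at most one version of a given cell, and a read keeps the version with the highest write timestamp. The names `read_cell`, `Key` and `Cell` are hypothetical, the timestamps are made up, and this is not Cassandra's actual read path (it omits all of the optimizations Cassandra applies on reads).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified types: a source maps (f1, f2, column) to
# (write_timestamp, value). Not Cassandra's real storage format.
Key = Tuple[str, str, str]
Cell = Tuple[int, str]


def read_cell(sources: List[Dict[Key, Cell]], key: Key) -> Tuple[Optional[str], int]:
    """Return (newest value, number of stored versions examined).

    Each source holds at most one version of the cell, so the work is
    bounded by the number of sources, not by the number of UPDATEs issued.
    """
    best: Optional[Cell] = None
    examined = 0
    for source in sources:
        cell = source.get(key)
        if cell is None:
            continue  # this source has no version of the cell
        examined += 1
        # Last-write-wins: keep the version with the highest timestamp.
        if best is None or cell[0] > best[0]:
            best = cell
    return (best[1] if best is not None else None, examined)


# Assumed scenario: 1000 UPDATEs whose surviving versions ended up in three
# flushed sstables (made-up timestamps), plus an empty memtable.
key: Key = ("k1", "k2", "f3")
memtable: Dict[Key, Cell] = {}
sstables = [{key: (333, "v333")}, {key: (666, "v666")}, {key: (1000, "v1000")}]

value, examined = read_cell([memtable] + sstables, key)
print(value, examined)  # prints: v1000 3
```

In this model, 1000 UPDATEs spread over three flushes cost three lookups rather than 1000; the cost tracks the number of sstables holding the row, which compaction keeps in check.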

On 05/21/2015 11:48 AM, Jens Rantil wrote:
> Artur,
>
> That's not entirely true. Writes to Cassandra are first written to a
> memtable (in-memory table) which is periodically flushed to disk. If
> multiple writes are coming in before the flush, then only a single
> record will be written to the disk/sstable. If you have writes that
> aren't coming within the same flush, they will get removed when you are
> compacting just like you say.
>
> Unfortunately I can't answer this regarding Counters as I haven't worked
> with them.
>
> Hope this helped at least.
>
> Cheers,
> Jens
>


Re: Performance penalty of multiple UPDATEs of non-pk columns

Posted by Sebastian Estevez <se...@datastax.com>.
Counters differ significantly between 2.0 and 2.1 (see
https://issues.apache.org/jira/browse/CASSANDRA-6405, among others). In
both versions, however, counter reconciliation and compaction cost more
than they do for regular updates.
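One way to picture why counters cost more to reconcile is a toy model, an assumption for illustration rather than Cassandra's actual counter format: a regular cell reconciles by comparing two timestamps and keeping one value, while a counter's value is assembled from per-replica shards, each of which has to be merged. The functions `merge_regular`, `merge_counter` and `counter_value` are hypothetical.

```python
from typing import Dict, Tuple

# Toy model (an assumption, not Cassandra's format): a counter is a set of
# shards, one per replica, each holding (clock, count).
Shards = Dict[str, Tuple[int, int]]


def merge_regular(a: Tuple[int, str], b: Tuple[int, str]) -> Tuple[int, str]:
    # Regular cell: one timestamp comparison, one value survives.
    # (Timestamp ties are ignored here for simplicity.)
    return a if a[0] >= b[0] else b


def merge_counter(a: Shards, b: Shards) -> Shards:
    # Counter: every shard must be visited; per replica, the higher clock wins.
    merged = dict(a)
    for replica, (clock, count) in b.items():
        if replica not in merged or clock > merged[replica][0]:
            merged[replica] = (clock, count)
    return merged


def counter_value(shards: Shards) -> int:
    # The visible counter value is the sum of every replica's latest count.
    return sum(count for _, count in shards.values())


older: Shards = {"r1": (2, 5), "r2": (1, 3)}
newer: Shards = {"r1": (3, 7)}
merged = merge_counter(older, newer)
print(merged, counter_value(merged))  # prints: {'r1': (3, 7), 'r2': (1, 3)} 10
print(merge_regular((1, "old"), (2, "new")))  # prints: (2, 'new')
```

The regular merge does constant work per cell, whereas the counter merge grows with the number of shards and must combine them to produce a value.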

The final counter performance fix will come with CASSANDRA-6506.

For details, read Aleksey's post:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com




Re: Performance penalty of multiple UPDATEs of non-pk columns

Posted by Jens Rantil <je...@tink.se>.
Artur,

That's not entirely true. Writes to Cassandra are first written to a
memtable (in-memory table), which is periodically flushed to disk. If
multiple writes to the same cell arrive before a flush, only a single
record will be written to disk (to the sstable). If you have writes that
don't arrive within the same flush, the older values will be removed when
you compact, just like you say.
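The memtable and compaction behavior described above can be sketched as a toy last-write-wins store in Python. The `ToyStore` class and its methods are hypothetical; the sketch assumes every write carries an increasing timestamp and ignores the commit log, tombstones, and replication.

```python
from typing import Dict, List, Tuple

# Hypothetical types: (f1, f2, column) -> (write_timestamp, value).
Key = Tuple[str, str, str]
Cell = Tuple[int, str]


class ToyStore:
    """Toy model: memtable collapses writes, flush creates an sstable,
    compaction merges sstables keeping only the newest version per cell."""

    def __init__(self) -> None:
        self.memtable: Dict[Key, Cell] = {}
        self.sstables: List[Dict[Key, Cell]] = []

    def write(self, key: Key, ts: int, value: str) -> None:
        current = self.memtable.get(key)
        # A newer write to the same cell replaces the older one in memory,
        # so only the latest version is flushed.
        if current is None or ts > current[0]:
            self.memtable[key] = (ts, value)

    def flush(self) -> None:
        # Persist the memtable as a new immutable "sstable".
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def compact(self) -> None:
        # Merge all sstables, keeping the highest-timestamp version per cell.
        merged: Dict[Key, Cell] = {}
        for sstable in self.sstables:
            for key, cell in sstable.items():
                if key not in merged or cell[0] > merged[key][0]:
                    merged[key] = cell
        self.sstables = [merged] if merged else []

    def versions_on_disk(self, key: Key) -> int:
        return sum(1 for sstable in self.sstables if key in sstable)


store = ToyStore()
key: Key = ("k1", "k2", "f3")
for ts in range(1, 1001):
    store.write(key, ts, f"v{ts}")
    if ts % 250 == 0:  # assumed flush interval, for illustration only
        store.flush()

before = store.versions_on_disk(key)
store.compact()
after = store.versions_on_disk(key)
print(before, after, store.sstables[0][key])  # prints: 4 1 (1000, 'v1000')
```

With a flush every 250 writes, 1000 UPDATEs leave four on-disk versions of the cell rather than 1000; after compaction only the newest remains.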

Unfortunately I can't answer this regarding Counters as I haven't worked
with them.

Hope this helped at least.

Cheers,
Jens




-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
