Posted to user@cassandra.apache.org by Jimmy Lin <y2...@gmail.com> on 2014/04/29 04:27:32 UTC

row caching for frequently updated column

I am wondering whether there is any negative impact on Cassandra write
operations if I turn on row caching for a table that has mostly 'static
columns' but a few frequently written columns (like a timestamp).

The application will frequently write to a few columns, and it
will also frequently query the entire row.

How does Cassandra handle a column update to a cached row?
Does it update both the memtable value and the cached row's
column (an in-memory update, so it would be very fast)?
Or, in order to update the cached row, does the entire row need to be read
back from the sstables?


thanks

Re: row caching for frequently updated column

Posted by Nate McCall <na...@thelastpickle.com>.
> If Cassandra invalidates the row cache upon a single-column update to that
> row, that seems very inefficient.
>
Yes. For the most recent direction, take a look at:
https://issues.apache.org/jira/browse/CASSANDRA-5357




-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: row caching for frequently updated column

Posted by Chris Burroughs <ch...@gmail.com>.
You are close.

On 04/30/2014 12:41 AM, Jimmy Lin wrote:
> Thanks all for the pointers.
>
> Let me see if I can put the sequence of events together ....
>
> 1.2
> People misunderstand/misuse the row cache: Cassandra caches the entire
> row of data even if you are only looking for a small subset of it.
> E.g.
> select single_column from a_wide_row_table
> will result in the entire row being cached even if you are only interested
> in a single column of the row.
>
>

Yep!

> 2.0
> Because of the potential misuse of heap memory, Cassandra 2.0 removes the
> on-heap cache and only supports the off-heap cache, which has the side
> effect that a write will invalidate the row cache (my original question).
>

"Off-heap" is a common but misleading name for the
SerializingCacheProvider.  It still stores several objects on heap per
cached item and has to deserialize them on read.

> 2.1
> The coming Cassandra 2.1 will offer a true cache by query, so the cached
> data will be much more efficient even for wide rows (it caches only what
> it needs).
>
> Did I get it right?
> For the new 2.1 row caching, is it still true that a write or update to the
> row will invalidate the cached row?
>

I don't think "true cache by query" is an accurate description of 
CASSANDRA-5357.  I think it's more like a "head of the row" cache.
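
To make the "head of the row" idea concrete, here is a toy sketch in Python. It is not Cassandra code, and it assumes the cache keeps a fixed number of rows from the start of each partition; the class `HeadOfPartitionCache`, the `rows_to_cache` parameter, and the sample data are all made up for illustration.

```python
# Toy model only: NOT Cassandra code. All names here are hypothetical.
class HeadOfPartitionCache:
    def __init__(self, storage, rows_to_cache):
        self.storage = storage              # partition key -> rows in clustering order
        self.rows_to_cache = rows_to_cache  # size of the cached "head"
        self.cache = {}                     # partition key -> first N rows
        self.storage_reads = 0              # reads that had to go to storage

    def read_first(self, key, limit):
        # Only queries that fit inside the cached head can use the cache.
        if limit <= self.rows_to_cache:
            if key not in self.cache:
                self.storage_reads += 1
                self.cache[key] = self.storage[key][: self.rows_to_cache]
            return self.cache[key][:limit]
        # Anything reaching past the head bypasses the cache.
        self.storage_reads += 1
        return self.storage[key][:limit]


storage = {"sensor1": [("t%02d" % i, i) for i in range(50)]}
cache = HeadOfPartitionCache(storage, rows_to_cache=10)
head = cache.read_first("sensor1", 5)    # miss: caches the first 10 rows
again = cache.read_first("sensor1", 10)  # hit: served from the cached head
deep = cache.read_first("sensor1", 20)   # past the head: read from storage
```

In this model a query is only cheap when it asks for rows at the start of the partition, which is why "cache by query" would overstate what is being cached.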


Re: row caching for frequently updated column

Posted by Jimmy Lin <y2...@gmail.com>.
Thanks all for the pointers.

Let me see if I can put the sequence of events together ....

1.2
People misunderstand/misuse the row cache: Cassandra caches the entire
row of data even if you are only looking for a small subset of it.
E.g.
select single_column from a_wide_row_table
will result in the entire row being cached even if you are only interested
in a single column of the row.

2.0
Because of the potential misuse of heap memory, Cassandra 2.0 removes the
on-heap cache and only supports the off-heap cache, which has the side
effect that a write will invalidate the row cache (my original question).

2.1
The coming Cassandra 2.1 will offer a true cache by query, so the cached
data will be much more efficient even for wide rows (it caches only what
it needs).

Did I get it right?
For the new 2.1 row caching, is it still true that a write or update to the
row will invalidate the cached row?





Re: row caching for frequently updated column

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Apr 29, 2014 at 1:53 PM, Brian Lam <y2...@gmail.com> wrote:

> Are these issues 'resolved' only in the 2.0 or later releases?
>
> What about version 1.2?
>

As I understand it:

Version 1.2 has both the on-heap and the off-heap row cache. It does not
have the new "partition" cache.
Version 2.0 has only the off-heap row cache. It does not have the on-heap
row cache or the new "partition" cache.
Version 2.1 has the new "partition" cache.

In summary, you probably don't want to use any of these half-baked,
immature internal row/etc. "caches" unless you are very, very certain that
you have an ideal case for them.

=Rob

Re: row caching for frequently updated column

Posted by Brian Lam <y2...@gmail.com>.
Are these issues 'resolved' only in the 2.0 or later releases?

What about version 1.2?




Re: row caching for frequently updated column

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Apr 29, 2014 at 9:30 AM, Jimmy Lin <y2...@gmail.com> wrote:

> If Cassandra invalidates the row cache upon a single-column update to that
> row, that seems very inefficient.
>

https://issues.apache.org/jira/browse/CASSANDRA-5348?focusedCommentId=13794634&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13794634

=Rob

Re: row caching for frequently updated column

Posted by Jimmy Lin <y2...@gmail.com>.
Hi,

> writing a new value to a row will invalidate the row cache for that
> value

Do you mean the entire row will be invalidated, or just the column that was
being updated?

I was reading through
http://planetcassandra.org/blog/post/cassandra-11-tuning-for-frequent-column-updates/
which seems to indicate that it just writes through and does not invalidate
the entire row.

If Cassandra invalidates the row cache upon a single-column update to that
row, that seems very inefficient.






Re: row caching for frequently updated column

Posted by Jonathan Lacefield <jl...@datastax.com>.
Hello,


  IIRC, writing a new value to a row will invalidate the row cache for that
value.  The row cache is only populated after a read operation.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_configuring_caches_c.html?scroll=concept_ds_n35_nnr_ck

  Cassandra provides the ability to "preheat" the key cache and the page
cache, but I don't believe this is possible for the row cache.
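
  As a rough illustration of the behaviour described above (a write drops the cached row, and only a later read repopulates it), here is a toy model in Python. It is not Cassandra code; `ToyTable`, its fields, and the sample keys are hypothetical, and a single dict stands in for the memtable and sstables.

```python
# Toy model only: NOT Cassandra code. All names here are hypothetical.
class ToyTable:
    def __init__(self):
        self.storage = {}       # stands in for memtable + sstables
        self.row_cache = {}     # row key -> full row (dict of columns)
        self.storage_reads = 0  # reads that had to go to storage

    def write(self, key, column, value):
        # The write lands in storage; any cached copy of the row is dropped.
        self.storage.setdefault(key, {})[column] = value
        self.row_cache.pop(key, None)

    def read(self, key):
        # A cache miss reads the whole row from storage and caches it.
        if key not in self.row_cache:
            self.storage_reads += 1
            self.row_cache[key] = dict(self.storage.get(key, {}))
        return self.row_cache[key]


table = ToyTable()
table.write("user1", "name", "jimmy")
table.write("user1", "last_seen", 1)
table.read("user1")                   # miss: populates the cache
table.read("user1")                   # hit
table.write("user1", "last_seen", 2)  # invalidates the cached row
row = table.read("user1")             # miss again: whole row is re-read
```

  With a frequently updated column, nearly every read in this model becomes a miss, so the cache adds work without saving much.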

  Hope that helps.

Jonathan


Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
<http://www.linkedin.com/in/jlacefield>

<http://www.datastax.com/cassandrasummit14>


