Posted to user@cassandra.apache.org by S C <as...@outlook.com> on 2013/09/01 09:06:20 UTC

RE: row cache

It is my understanding that the row cache lives in memory (not on disk), and that it can live on the heap or in native memory depending on the cache provider. Is that right?

-SC


> Date: Fri, 23 Aug 2013 18:58:07 +0100
> From: bill@dehora.net
> To: user@cassandra.apache.org
> Subject: Re: row cache
> 
> I can't emphasise enough testing row caching against your workload for 
> sustained periods and comparing the results to just leveraging the 
> filesystem cache and/or SSDs. That said, the default off-heap cache can 
> work for structures that don't mutate frequently, and whose rows are not 
> so wide that the in-and-out-of-heap serialization overhead dominates 
> (I've seen the off-heap cache slow a system down because of 
> serialization costs). The on-heap cache can update in place, which is nice 
> for more frequently changing structures, and for larger structures 
> because it dodges the off-heap cache's serialization overhead. One problem 
> I've experienced with the on-heap cache is the cache working set 
> exceeding the allocated space, resulting in GC pressure from sustained 
> thrash/evictions.
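The serialization trade-off Bill describes can be sketched roughly as follows. This is an illustrative toy in plain Java, using JDK serialization rather than Cassandra's own serializers; the class, map, and method names here are invented for the sketch and are not Cassandra code:

```java
import java.io.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy contrast between a "serializing" cache (pays a full serialize/
// deserialize round trip on every put/get, as an off-heap value store must)
// and a plain on-heap cache (hands back the live object, can mutate in place).
public class CacheSketch {
    static final Map<String, byte[]> serializingCache = new ConcurrentHashMap<>();
    static final Map<String, StringBuilder> onHeapCache = new ConcurrentHashMap<>();

    // "Off-heap" style: the value crosses a serialization boundary both ways.
    static void putSerialized(String key, Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);            // cost grows with row width
        }
        serializingCache.put(key, bos.toByteArray());
    }

    static Object getSerialized(String key) throws IOException, ClassNotFoundException {
        byte[] bytes = serializingCache.get(key);
        if (bytes == null) return null;
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();           // paid again on every read
        }
    }

    public static void main(String[] args) throws Exception {
        putSerialized("row1", "col=a,col=b");
        System.out.println(getSerialized("row1"));

        // On-heap style: no copy, and the cached row can be updated in place.
        onHeapCache.put("row2", new StringBuilder("col=a"));
        onHeapCache.get("row2").append(",col=b");
        System.out.println(onHeapCache.get("row2"));
    }
}
```

For a wide row the `writeObject`/`readObject` cost above is paid on every access, which is the overhead Bill saw dominate; the on-heap variant avoids it entirely at the price of long-lived heap objects.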
> 
> Neither cache seems suitable for wide-row + slicing use cases, e.g. time 
> series data or CQL tables whose compound keys create wide rows under the 
> hood.
> 
> Bill
> 
> 
> On 2013/08/23 17:30, Robert Coli wrote:
> > On Thu, Aug 22, 2013 at 7:53 PM, Faraaz Sareshwala
> > <fsareshwala@quantcast.com <ma...@quantcast.com>> wrote:
> >
> >     According to the datastax documentation [1], there are two types of
> >     row cache providers:
> >
> > ...
> >
> >     The off-heap row cache provider does indeed invalidate rows. We're
> >     going to look into using the ConcurrentLinkedHashCacheProvider. Time
> >     to read some source code! :)
> >
> >
> > Thanks for the follow up... I'm used to thinking of the
> > ConcurrentLinkedHashCacheProvider as "the row cache" and forgot that
> > SerializingCacheProvider might have different invalidation behavior.
> > Invalidating the whole row on write seems highly likely to reduce the
> > overall performance of such a row cache. :)
> >
> > The criteria for use of row cache mentioned up-thread remain relevant.
> > In most cases, you probably don't actually want to use the row cache.
> > Especially if you're using ConcurrentLinkedHashCacheProvider and
> > creating long-lived, on-heap objects.
> >
> > =Rob
> 

Re: row cache

Posted by Mohit Anchlia <mo...@gmail.com>.
I agree. We've had similar experience.

Sent from my iPhone

On Sep 7, 2013, at 6:05 PM, Edward Capriolo <ed...@gmail.com> wrote:

> I have found the row cache to be more trouble than benefit.
> 
> The term fool's gold comes to mind.
> 
> Using the key cache and leaving more free main memory seems stable and does not have as many complications. 
> On Wednesday, September 4, 2013, S C <as...@outlook.com> wrote:
> > Thank you all for your valuable comments and information.
> >
> > -SC
> >
> >
> >> Date: Tue, 3 Sep 2013 12:01:59 -0400
> >> From: chris.burroughs@gmail.com
> >> To: user@cassandra.apache.org
> >> CC: fsareshwala@quantcast.com
> >> Subject: Re: row cache
> >>
> >> On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
> >> > Yes, that is correct.
> >> >
> >> > The SerializingCacheProvider stores row cache contents off heap. I believe you
> >> > need JNA enabled for this though. Someone please correct me if I am wrong here.
> >> >
> >> > The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
> >> > itself.
> >> >
> >>
> >> Naming things is hard. Both caches are in memory and are backed by a
> >> ConcurrentLinkedHashMap. In the case of the SerializingCacheProvider
> >> the *values* are stored in off-heap buffers. Both must store a half
> >> dozen or so objects (on heap) per entry
> >> (org.apache.cassandra.cache.RowCacheKey,
> >> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue,
> >> java.util.concurrent.ConcurrentHashMap$HashEntry, etc). It would
> >> probably be better to call this a "mixed-heap" rather than an off-heap
> >> cache. You may find the number of entries you can hold without GC
> >> problems to be surprisingly low (relative to, say, memcached, or physical
> >> memory on modern hardware).
> >>
> >> Invalidating a column with SerializingCacheProvider invalidates the
> >> entire row while with ConcurrentLinkedHashCacheProvider it does not.
> >> SerializingCacheProvider does not require JNA.
> >>
> >> Both also use memory estimation of the size (of the values only) to
> >> determine the total number of entries retained. Estimating the size of
> >> the totally on-heap ConcurrentLinkedHashCacheProvider has historically
> >> been dicey since we switched from sizing in entries, and it has been
> >> removed in 2.0.0.
> >>
> >> As said elsewhere in this thread the utility of the row cache varies
> >> from "absolutely essential" to "source of numerous problems" depending
> >> on the specifics of the data model and request distribution.
> >>
> >>
> >

Re: row cache

Posted by Edward Capriolo <ed...@gmail.com>.
I have found the row cache to be more trouble than benefit.

The term fool's gold comes to mind.

Using the key cache and leaving more free main memory seems stable and does not
have as many complications.
On Wednesday, September 4, 2013, S C <as...@outlook.com> wrote:
> Thank you all for your valuable comments and information.
>
> -SC
>
>
>> Date: Tue, 3 Sep 2013 12:01:59 -0400
>> From: chris.burroughs@gmail.com
>> To: user@cassandra.apache.org
>> CC: fsareshwala@quantcast.com
>> Subject: Re: row cache
>>
>> On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
>> > Yes, that is correct.
>> >
>> > The SerializingCacheProvider stores row cache contents off heap. I believe you
>> > need JNA enabled for this though. Someone please correct me if I am wrong here.
>> >
>> > The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
>> > itself.
>> >
>>
>> Naming things is hard. Both caches are in memory and are backed by a
>> ConcurrentLinkedHashMap. In the case of the SerializingCacheProvider
>> the *values* are stored in off-heap buffers. Both must store a half
>> dozen or so objects (on heap) per entry
>> (org.apache.cassandra.cache.RowCacheKey,
>> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue,
>> java.util.concurrent.ConcurrentHashMap$HashEntry, etc). It would
>> probably be better to call this a "mixed-heap" rather than an off-heap
>> cache. You may find the number of entries you can hold without GC
>> problems to be surprisingly low (relative to, say, memcached, or physical
>> memory on modern hardware).
>>
>> Invalidating a column with SerializingCacheProvider invalidates the
>> entire row while with ConcurrentLinkedHashCacheProvider it does not.
>> SerializingCacheProvider does not require JNA.
>>
>> Both also use memory estimation of the size (of the values only) to
>> determine the total number of entries retained. Estimating the size of
>> the totally on-heap ConcurrentLinkedHashCacheProvider has historically
>> been dicey since we switched from sizing in entries, and it has been
>> removed in 2.0.0.
>>
>> As said elsewhere in this thread the utility of the row cache varies
>> from "absolutely essential" to "source of numerous problems" depending
>> on the specifics of the data model and request distribution.
>>
>>
>

RE: row cache

Posted by S C <as...@outlook.com>.
Thank you all for your valuable comments and information.

-SC


> Date: Tue, 3 Sep 2013 12:01:59 -0400
> From: chris.burroughs@gmail.com
> To: user@cassandra.apache.org
> CC: fsareshwala@quantcast.com
> Subject: Re: row cache
> 
> On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
> > Yes, that is correct.
> >
> > The SerializingCacheProvider stores row cache contents off heap. I believe you
> > need JNA enabled for this though. Someone please correct me if I am wrong here.
> >
> > The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
> > itself.
> >
> 
> Naming things is hard.  Both caches are in memory and are backed by a 
> ConcurrentLinkedHashMap.  In the case of the SerializingCacheProvider 
> the *values* are stored in off-heap buffers.  Both must store a half 
> dozen or so objects (on heap) per entry 
> (org.apache.cassandra.cache.RowCacheKey, 
> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue, 
> java.util.concurrent.ConcurrentHashMap$HashEntry, etc).  It would 
> probably be better to call this a "mixed-heap" rather than an off-heap 
> cache.  You may find the number of entries you can hold without GC 
> problems to be surprisingly low (relative to, say, memcached, or physical 
> memory on modern hardware).
> 
> Invalidating a column with SerializingCacheProvider invalidates the 
> entire row while with ConcurrentLinkedHashCacheProvider it does not. 
> SerializingCacheProvider does not require JNA.
> 
> Both also use memory estimation of the size (of the values only) to 
> determine the total number of entries retained.  Estimating the size of 
> the totally on-heap ConcurrentLinkedHashCacheProvider has historically 
> been dicey since we switched from sizing in entries, and it has been 
> removed in 2.0.0.
> 
> As said elsewhere in this thread the utility of the row cache varies 
> from "absolutely essential" to "source of numerous problems" depending 
> on the specifics of the data model and request distribution.
> 
> 

Re: row cache

Posted by Chris Burroughs <ch...@gmail.com>.
On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
> Yes, that is correct.
>
> The SerializingCacheProvider stores row cache contents off heap. I believe you
> need JNA enabled for this though. Someone please correct me if I am wrong here.
>
> The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
> itself.
>

Naming things is hard.  Both caches are in memory and are backed by a 
ConcurrentLinkedHashMap.  In the case of the SerializingCacheProvider 
the *values* are stored in off-heap buffers.  Both must store a half 
dozen or so objects (on heap) per entry 
(org.apache.cassandra.cache.RowCacheKey, 
com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue, 
java.util.concurrent.ConcurrentHashMap$HashEntry, etc).  It would 
probably be better to call this a "mixed-heap" rather than an off-heap 
cache.  You may find the number of entries you can hold without GC 
problems to be surprisingly low (relative to, say, memcached, or physical 
memory on modern hardware).

Invalidating a column with SerializingCacheProvider invalidates the 
entire row while with ConcurrentLinkedHashCacheProvider it does not. 
SerializingCacheProvider does not require JNA.

Both also use memory estimation of the size (of the values only) to 
determine the total number of entries retained.  Estimating the size of 
the totally on-heap ConcurrentLinkedHashCacheProvider has historically 
been dicey since we switched from sizing in entries, and it has been 
removed in 2.0.0.

As said elsewhere in this thread the utility of the row cache varies 
from "absolutely essential" to "source of numerous problems" depending 
on the specifics of the data model and request distribution.
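Chris's "mixed-heap" point lends itself to some back-of-the-envelope arithmetic. The sketch below is illustrative only: the ~200-byte per-entry bookkeeping figure is an assumed round number chosen for the sake of the math, not a measured Cassandra value.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Even when value bytes live off heap (direct ByteBuffers here), every cache
// entry still pins several small on-heap objects: the key, the map entry,
// the buffer wrapper, and so on. That on-heap bookkeeping, not the value
// data, can become the limit on entry count.
public class MixedHeapMath {
    static final Map<String, ByteBuffer> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        ByteBuffer value = ByteBuffer.allocateDirect(1024); // value bytes off heap
        cache.put("rowkey-1", value);                       // key + entry stay on heap

        long onHeapPerEntry = 200;   // assumed: key, entry, wrapper objects per entry
        long heapBudget = 1L << 30;  // 1 GiB of heap set aside for this bookkeeping
        long maxEntries = heapBudget / onHeapPerEntry;
        // Roughly five million entries before the bookkeeping alone fills 1 GiB,
        // no matter how much off-heap or physical memory remains free.
        System.out.println(maxEntries);
    }
}
```

Under these assumptions the entry count tops out in the single-digit millions, which is why the thread calls the achievable cache size "surprisingly low" relative to memcached or raw RAM.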



Re: row cache

Posted by Faraaz Sareshwala <fs...@quantcast.com>.
Yes, that is correct.

The SerializingCacheProvider stores row cache contents off heap. I believe you
need JNA enabled for this though. Someone please correct me if I am wrong here.

The ConcurrentLinkedHashCacheProvider stores row cache contents on the java heap
itself.

Each cache provider has different characteristics so it's important to read up
on how each works and even try it with your workload to see which one gives you
better performance, if any at all.
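For anyone wanting to try this against their own workload, these are the relevant cassandra.yaml knobs as they existed in the 1.1/1.2 era (check the defaults shipped with your version; the size below is an example value, not a recommendation):

```yaml
# Row cache is disabled by default (0). Size is an example value.
row_cache_size_in_mb: 128

# How often (in seconds) to save cached row keys to disk; 0 disables saving.
row_cache_save_period: 0

# SerializingCacheProvider (values off heap) is the default;
# ConcurrentLinkedHashCacheProvider keeps whole rows on the JVM heap.
row_cache_provider: SerializingCacheProvider
```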

Faraaz

On Sun, Sep 01, 2013 at 12:06:20AM -0700, S C wrote:
> It is my understanding that the row cache lives in memory (not on disk), and that it
> can live on the heap or in native memory depending on the cache provider. Is that right? 
> 
> -SC
> 
> 
> > Date: Fri, 23 Aug 2013 18:58:07 +0100
> > From: bill@dehora.net
> > To: user@cassandra.apache.org
> > Subject: Re: row cache
> >
> > I can't emphasise enough testing row caching against your workload for
> > sustained periods and comparing the results to just leveraging the
> > filesystem cache and/or SSDs. That said, the default off-heap cache can
> > work for structures that don't mutate frequently, and whose rows are not
> > so wide that the in-and-out-of-heap serialization overhead dominates
> > (I've seen the off-heap cache slow a system down because of
> > serialization costs). The on-heap cache can update in place, which is nice
> > for more frequently changing structures, and for larger structures
> > because it dodges the off-heap cache's serialization overhead. One problem
> > I've experienced with the on-heap cache is the cache working set
> > exceeding the allocated space, resulting in GC pressure from sustained
> > thrash/evictions.
> >
> > Neither cache seems suitable for wide-row + slicing use cases, e.g. time
> > series data or CQL tables whose compound keys create wide rows under the
> > hood.
> >
> > Bill
> >
> >
> > On 2013/08/23 17:30, Robert Coli wrote:
> > > On Thu, Aug 22, 2013 at 7:53 PM, Faraaz Sareshwala
> > > <fsareshwala@quantcast.com <ma...@quantcast.com>> wrote:
> > >
> > > According to the datastax documentation [1], there are two types of
> > > row cache providers:
> > >
> > > ...
> > >
> > > The off-heap row cache provider does indeed invalidate rows. We're
> > > going to look into using the ConcurrentLinkedHashCacheProvider. Time
> > > to read some source code! :)
> > >
> > >
> > > Thanks for the follow up... I'm used to thinking of the
> > > ConcurrentLinkedHashCacheProvider as "the row cache" and forgot that
> > > SerializingCacheProvider might have different invalidation behavior.
> > > Invalidating the whole row on write seems highly likely to reduce the
> > > overall performance of such a row cache. :)
> > >
> > > The criteria for use of row cache mentioned up-thread remain relevant.
> > > In most cases, you probably don't actually want to use the row cache.
> > > Especially if you're using ConcurrentLinkedHashCacheProvider and
> > > creating long-lived, on-heap objects.
> > >
> > > =Rob
> >