Posted to user@cassandra.apache.org by Todd Burruss <bb...@expedia.com> on 2011/11/18 01:53:40 UTC

ParNew and caching

I'm using cassandra 1.0.  I've been doing some testing with cass's cache.  When I turn it on (using the CLI) I see ParNew jump from 3-4ms to 200-300ms.  This really screws with response times, which jump from ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is still surprising to me, especially since 1.0 defaults to the SerializingCacheProvider – off heap.

The interesting tidbit is that I have wide rows: 70k+ columns per row, ~50 bytes per column value.  The cache only needs to be about 400 rows to catch all the data per node, and JMX is reporting 100% cache hits.  Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.

Thoughts?
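
For context, turning the row cache on from the CLI looks roughly like this
on 1.0 (keyspace and column family names are placeholders, and the row count
is just an example):

    use MyKeyspace;
    update column family MyCF with rows_cached=500;

With no row_cache_provider set, 1.0 defaults to the off-heap
SerializingCacheProvider mentioned above.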

Re: ParNew and caching

Posted by Mohit Anchlia <mo...@gmail.com>.
On Fri, Nov 18, 2011 at 9:42 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> On Fri, Nov 18, 2011 at 6:31 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>> On Fri, Nov 18, 2011 at 7:47 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>>> On Fri, Nov 18, 2011 at 4:23 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>>>> On Fri, Nov 18, 2011 at 6:39 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>>>>> On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com> wrote:
>>>>>> I'm using cassandra 1.0.  I've been doing some testing with cass's
>>>>>> cache.  When I turn it on (using the CLI) I see ParNew jump from 3-4ms
>>>>>> to 200-300ms.  This really screws with response times, which jump from
>>>>>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this
>>>>>> is still surprising to me, especially since 1.0 defaults to the
>>>>>> SerializingCacheProvider – off heap.
>>>>>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>>>>>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>>>>>> to catch all the data per node, and JMX is reporting 100% cache hits.
>>>>>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is
>>>>>> 16gb.
>>>>>> Thoughts?
>>>>>
>>>>> Your problem is the mix of wide rows and the serializing cache.
>>>>> What happens with the serializing cache is that the data is stored
>>>>> off the heap. But that means that for each read of a row, we
>>>>> 'deserialize' the row from the off-heap memory into the heap to
>>>>> return it. The thing is, when we do that, we do the full row each
>>>>> time. In other words, for each query we deserialize 70k+ columns
>>>>> even if we return only one. I'm willing to bet this is what is killing
>>>>> your response time. If you want to cache wide rows, I really
>>>>> suggest you use the ConcurrentLinkedHashCacheProvider
>>>>> instead.
>>>>
>>>> What happens when using ConcurrentLinkedHashCache? What is the
>>>> implementation like and why is it better?
>>>
>>> With ConcurrentLinkedHashCache, the cache is in the heap, so there
>>> is no deserialization or copying during gets, and having wide rows is
>>> not a problem. That said, if you're enabling the cache on a column
>>> family with wide rows, you have to keep in mind that we always keep
>>> full rows in the cache.
>>>
>>
>> Wouldn't it just move the problem to GC pauses from not being able to
>> clean up the old generation? I am assuming these rows in the concurrent
>> hash map will get migrated to the old gen.
>
> Kinda, yes, that's why we have a serializing cache :)
>
> I mean, caching rows of 70k+ columns is *not* the typical case we've
> optimized for (https://issues.apache.org/jira/browse/CASSANDRA-1956
> should improve things here), so yes, neither the serializing cache nor
> the linked hash one will be perfect in that case. But the serializing
> cache is just worse in that specific case.

Thanks! This makes sense.

>
> --
> Sylvain
>
>>>>
>>>>>
>>>>> I'll also note that this explains the ParNew times too. Deserializing
>>>>> all those columns from off-heap creates lots of short-lived objects,
>>>>> and since you deserialize 70k+ on each query, that's quite some
>>>>> pressure on the new gen. Note that the serializing cache actually
>>>>> minimizes the use of the old gen, because that is the one that can
>>>>> create huge GC pauses with a big heap, but it actually puts more
>>>>> pressure on the new gen. This is by design, because the new gen
>>>>> is much less of a problem than the old gen.
>>>>
>>>> In this scenario, would it help if the young generation space were increased?
>>>
>>> That's a hard one to answer because GC tuning is a bit of a black
>>> art, where testing and benchmarking are often key. Having a bigger
>>> young generation means young collections kick in less often,
>>> but on the other hand it reduces the size of the old generation.
>>> But again, I don't think the problem is really the GC here, at least not
>>> primarily.
>>>
>>> --
>>> Sylvain
>>>
>>>>
>>>>>
>>>>> --
>>>>> Sylvain
>>>>>
>>>>
>>>
>>
>

Re: ParNew and caching

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Nov 18, 2011 at 6:31 PM, Mohit Anchlia <mo...@gmail.com> wrote:
> On Fri, Nov 18, 2011 at 7:47 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>> On Fri, Nov 18, 2011 at 4:23 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>>> On Fri, Nov 18, 2011 at 6:39 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>>>> On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com> wrote:
>>>>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>>>>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>>>>> 200-300ms.  This really screws with response times, which jump from
>>>>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>>>>> still surprising to me, especially since 1.0 defaults to the
>>>>> SerializingCacheProvider – off heap.
>>>>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>>>>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>>>>> to catch all the data per node, and JMX is reporting 100% cache hits.
>>>>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>>>>> Thoughts?
>>>>
>>>> Your problem is the mix of wide rows and the serializing cache.
>>>> What happens with the serializing cache is that the data is stored
>>>> off the heap. But that means that for each read of a row, we
>>>> 'deserialize' the row from the off-heap memory into the heap to
>>>> return it. The thing is, when we do that, we do the full row each
>>>> time. In other words, for each query we deserialize 70k+ columns
>>>> even if we return only one. I'm willing to bet this is what is killing
>>>> your response time. If you want to cache wide rows, I really
>>>> suggest you use the ConcurrentLinkedHashCacheProvider
>>>> instead.
>>>
>>> What happens when using ConcurrentLinkedHashCache? What is the
>>> implementation like and why is it better?
>>
>> With ConcurrentLinkedHashCache, the cache is in the heap, so there
>> is no deserialization or copying during gets, and having wide rows is
>> not a problem. That said, if you're enabling the cache on a column
>> family with wide rows, you have to keep in mind that we always keep
>> full rows in the cache.
>>
>
> Wouldn't it just move the problem to GC pauses from not being able to
> clean up the old generation? I am assuming these rows in the concurrent
> hash map will get migrated to the old gen.

Kinda, yes, that's why we have a serializing cache :)

I mean, caching rows of 70k+ columns is *not* the typical case we've
optimized for (https://issues.apache.org/jira/browse/CASSANDRA-1956
should improve things here), so yes, neither the serializing cache nor
the linked hash one will be perfect in that case. But the serializing
cache is just worse in that specific case.

--
Sylvain

>>>
>>>>
>>>> I'll also note that this explains the ParNew times too. Deserializing
>>>> all those columns from off-heap creates lots of short-lived objects,
>>>> and since you deserialize 70k+ on each query, that's quite some
>>>> pressure on the new gen. Note that the serializing cache actually
>>>> minimizes the use of the old gen, because that is the one that can
>>>> create huge GC pauses with a big heap, but it actually puts more
>>>> pressure on the new gen. This is by design, because the new gen
>>>> is much less of a problem than the old gen.
>>>
>>> In this scenario, would it help if the young generation space were increased?
>>
>> That's a hard one to answer because GC tuning is a bit of a black
>> art, where testing and benchmarking are often key. Having a bigger
>> young generation means young collections kick in less often,
>> but on the other hand it reduces the size of the old generation.
>> But again, I don't think the problem is really the GC here, at least not
>> primarily.
>>
>> --
>> Sylvain
>>
>>>
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>
>>
>

Re: ParNew and caching

Posted by Mohit Anchlia <mo...@gmail.com>.
On Fri, Nov 18, 2011 at 7:47 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> On Fri, Nov 18, 2011 at 4:23 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>> On Fri, Nov 18, 2011 at 6:39 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>>> On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com> wrote:
>>>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>>>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>>>> 200-300ms.  This really screws with response times, which jump from
>>>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>>>> still surprising to me, especially since 1.0 defaults to the
>>>> SerializingCacheProvider – off heap.
>>>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>>>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>>>> to catch all the data per node, and JMX is reporting 100% cache hits.
>>>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>>>> Thoughts?
>>>
>>> Your problem is the mix of wide rows and the serializing cache.
>>> What happens with the serializing cache is that the data is stored
>>> off the heap. But that means that for each read of a row, we
>>> 'deserialize' the row from the off-heap memory into the heap to
>>> return it. The thing is, when we do that, we do the full row each
>>> time. In other words, for each query we deserialize 70k+ columns
>>> even if we return only one. I'm willing to bet this is what is killing
>>> your response time. If you want to cache wide rows, I really
>>> suggest you use the ConcurrentLinkedHashCacheProvider
>>> instead.
>>
>> What happens when using ConcurrentLinkedHashCache? What is the
>> implementation like and why is it better?
>
> With ConcurrentLinkedHashCache, the cache is in the heap, so there
> is no deserialization or copying during gets, and having wide rows is
> not a problem. That said, if you're enabling the cache on a column
> family with wide rows, you have to keep in mind that we always keep
> full rows in the cache.
>

Wouldn't it just move the problem to GC pauses from not being able to
clean up the old generation? I am assuming these rows in the concurrent
hash map will get migrated to the old gen.
>>
>>>
>>> I'll also note that this explains the ParNew times too. Deserializing
>>> all those columns from off-heap creates lots of short-lived objects,
>>> and since you deserialize 70k+ on each query, that's quite some
>>> pressure on the new gen. Note that the serializing cache actually
>>> minimizes the use of the old gen, because that is the one that can
>>> create huge GC pauses with a big heap, but it actually puts more
>>> pressure on the new gen. This is by design, because the new gen
>>> is much less of a problem than the old gen.
>>
>> In this scenario, would it help if the young generation space were increased?
>
> That's a hard one to answer because GC tuning is a bit of a black
> art, where testing and benchmarking are often key. Having a bigger
> young generation means young collections kick in less often,
> but on the other hand it reduces the size of the old generation.
> But again, I don't think the problem is really the GC here, at least not
> primarily.
>
> --
> Sylvain
>
>>
>>>
>>> --
>>> Sylvain
>>>
>>
>

Re: ParNew and caching

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Nov 18, 2011 at 4:23 PM, Mohit Anchlia <mo...@gmail.com> wrote:
> On Fri, Nov 18, 2011 at 6:39 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>> On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com> wrote:
>>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>>> 200-300ms.  This really screws with response times, which jump from
>>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>>> still surprising to me, especially since 1.0 defaults to the
>>> SerializingCacheProvider – off heap.
>>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>>> to catch all the data per node, and JMX is reporting 100% cache hits.
>>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>>> Thoughts?
>>
>> Your problem is the mix of wide rows and the serializing cache.
>> What happens with the serializing cache is that the data is stored
>> off the heap. But that means that for each read of a row, we
>> 'deserialize' the row from the off-heap memory into the heap to
>> return it. The thing is, when we do that, we do the full row each
>> time. In other words, for each query we deserialize 70k+ columns
>> even if we return only one. I'm willing to bet this is what is killing
>> your response time. If you want to cache wide rows, I really
>> suggest you use the ConcurrentLinkedHashCacheProvider
>> instead.
>
> What happens when using ConcurrentLinkedHashCache? What is the
> implementation like and why is it better?

With ConcurrentLinkedHashCache, the cache is in the heap, so there
is no deserialization or copying during gets, and having wide rows is
not a problem. That said, if you're enabling the cache on a column
family with wide rows, you have to keep in mind that we always keep
full rows in the cache.
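
For a sense of scale, the on-heap footprint of that cache here is roughly
(assuming the ~400 cached rows and ~50-byte values described above):

    400 rows x 70,000 columns x ~50 bytes  ~= 1.4 GB of raw column data

plus per-column Java object overhead on top, all resident in a 6gb heap.
That is why an on-heap cache of wide rows shifts the pressure toward the
old generation instead.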

>
>>
>> I'll also note that this explains the ParNew times too. Deserializing
>> all those columns from off-heap creates lots of short-lived objects,
>> and since you deserialize 70k+ on each query, that's quite some
>> pressure on the new gen. Note that the serializing cache actually
>> minimizes the use of the old gen, because that is the one that can
>> create huge GC pauses with a big heap, but it actually puts more
>> pressure on the new gen. This is by design, because the new gen
>> is much less of a problem than the old gen.
>
> In this scenario, would it help if the young generation space were increased?

That's a hard one to answer because GC tuning is a bit of a black
art, where testing and benchmarking are often key. Having a bigger
young generation means young collections kick in less often,
but on the other hand it reduces the size of the old generation.
But again, I don't think the problem is really the GC here, at least not
primarily.

--
Sylvain

>
>>
>> --
>> Sylvain
>>
>

Re: ParNew and caching

Posted by Mohit Anchlia <mo...@gmail.com>.
On Fri, Nov 18, 2011 at 6:39 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com> wrote:
>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>> 200-300ms.  This really screws with response times, which jump from
>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>> still surprising to me, especially since 1.0 defaults to the
>> SerializingCacheProvider – off heap.
>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>> to catch all the data per node, and JMX is reporting 100% cache hits.
>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>> Thoughts?
>
> Your problem is the mix of wide rows and the serializing cache.
> What happens with the serializing cache is that the data is stored
> off the heap. But that means that for each read of a row, we
> 'deserialize' the row from the off-heap memory into the heap to
> return it. The thing is, when we do that, we do the full row each
> time. In other words, for each query we deserialize 70k+ columns
> even if we return only one. I'm willing to bet this is what is killing
> your response time. If you want to cache wide rows, I really
> suggest you use the ConcurrentLinkedHashCacheProvider
> instead.

What happens when using ConcurrentLinkedHashCache? What is the
implementation like and why is it better?

>
> I'll also note that this explains the ParNew times too. Deserializing
> all those columns from off-heap creates lots of short-lived objects,
> and since you deserialize 70k+ on each query, that's quite some
> pressure on the new gen. Note that the serializing cache actually
> minimizes the use of the old gen, because that is the one that can
> create huge GC pauses with a big heap, but it actually puts more
> pressure on the new gen. This is by design, because the new gen
> is much less of a problem than the old gen.

In this scenario, would it help if the young generation space were increased?
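
For anyone who wants to experiment with that, the young generation size is
set in conf/cassandra-env.sh and passed to the JVM as -Xmn (the values here
are only illustrative and need to be benchmarked per workload):

    MAX_HEAP_SIZE="6G"
    HEAP_NEWSIZE="1600M"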

>
> --
> Sylvain
>

Re: ParNew and caching

Posted by Mohit Anchlia <mo...@gmail.com>.
2011/11/18 Todd Burruss <bb...@expedia.com>:
> After re-reading my post, what I meant to say is that I switched from the
> Serializing cache provider to the ConcurrentLinkedHash cache provider and
> then saw better performance, but still far worse than no caching at all:
>
> - no caching at all : 25-30ms
> - with Serializing provider : 1300+ms
> - with Concurrent provider : 500ms
>
> 100% cache hit rate.  ParNew is the only stat that I see out of line, so
> it seems like there is still a lot of copying.

Difficult to tell with the given info.

Paste your cfhistograms for that time and also a snippet of the GC logs,
including ParNew and the other major phases recorded in the logs.

Are there any significant writes, memtable flushes etc. occurring during
this time? How many reads/sec and writes/sec?

What's the size of the rows and columns that you are trying to retrieve?
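
For reference, those stats come from nodetool against a live node, along
these lines (host, keyspace and column family names are placeholders):

    nodetool -h localhost cfhistograms MyKeyspace MyCF
    nodetool -h localhost tpstats
    nodetool -h localhost netstats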


>
> On 11/18/11 2:40 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:
>
>>On Fri, Nov 18, 2011 at 1:46 PM, Todd Burruss <bb...@expedia.com>
>>wrote:
>>> Ok, I figured something like that.  Switching to
>>> ConcurrentLinkedHashCacheProvider, I see it is a lot better, but still,
>>> instead of the 25-30ms response times I enjoyed with no caching, I'm
>>> seeing 500ms at 100% hit rate on the cache.  No old gen pressure at all,
>>> just ParNew going crazy.
>>>
>>
>>Are you saying that when you had off heap you saw better performance
>>of 25-30 ms? And now it's 500ms to get 50 columns? What kind of load
>>are you generating? What results do you see if you disable the row cache
>>and just leave the key cache on?
>>
>>There are a lot of factors to consider, so having more stats would be
>>helpful.
>>
>>Please paste cfhistograms. Have you tried monitoring tpstats and netstats?
>>
>>What's your CL and RF?
>> More info on my use case is that I am picking 50 columns from the 70k.
>> Since the whole row is in the cache, with no copying from off-heap memory
>> or disk buffers, it seems like it should be faster than non-cache mode.
>>>
>>> More thoughts :)
>>>
>>> On 11/18/11 6:39 AM, "Sylvain Lebresne" <sy...@datastax.com> wrote:
>>>
>>>>On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com>
>>>>wrote:
>>>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>>>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>>>> 200-300ms.  This really screws with response times, which jump from
>>>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>>>> still surprising to me, especially since 1.0 defaults to the
>>>> SerializingCacheProvider – off heap.
>>>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>>>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>>>> to catch all the data per node, and JMX is reporting 100% cache hits.
>>>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>>>> Thoughts?
>>>>
>>>>Your problem is the mix of wide rows and the serializing cache.
>>>>What happens with the serializing cache is that the data is stored
>>>>off the heap. But that means that for each read of a row, we
>>>>'deserialize' the row from the off-heap memory into the heap to
>>>>return it. The thing is, when we do that, we do the full row each
>>>>time. In other words, for each query we deserialize 70k+ columns
>>>>even if we return only one. I'm willing to bet this is what is killing
>>>>your response time. If you want to cache wide rows, I really
>>>>suggest you use the ConcurrentLinkedHashCacheProvider
>>>>instead.
>>>>
>>>>I'll also note that this explains the ParNew times too. Deserializing
>>>>all those columns from off-heap creates lots of short-lived objects,
>>>>and since you deserialize 70k+ on each query, that's quite some
>>>>pressure on the new gen. Note that the serializing cache actually
>>>>minimizes the use of the old gen, because that is the one that can
>>>>create huge GC pauses with a big heap, but it actually puts more
>>>>pressure on the new gen. This is by design, because the new gen
>>>>is much less of a problem than the old gen.
>>>>
>>>>--
>>>>Sylvain
>>>
>>>
>
>

Re: ParNew and caching

Posted by Edward Capriolo <ed...@gmail.com>.
I am not sure if there is a ticket on this, but I have always thought the
row cache should not bother caching an entry bigger than n columns.

There have been murmurs of a slice cache, which might help as well.

On Saturday, December 10, 2011, Peter Schuller <pe...@infidyne.com>
wrote:
>> After re-reading my post, what I meant to say is that I switched from the
>> Serializing cache provider to the ConcurrentLinkedHash cache provider and
>> then saw better performance, but still far worse than no caching at all:
>>
>> - no caching at all : 25-30ms
>> - with Serializing provider : 1300+ms
>> - with Concurrent provider : 500ms
>>
>> 100% cache hit rate.  ParNew is the only stat that I see out of line, so
>> it seems like there is still a lot of copying.
>
> In general, if you want to get to the bottom of this stuff and you
> think GC is involved, always run with -XX:+PrintGC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps so that the GC activity
> can be observed.
>
> 1300+ ms should not be GC unless you are seeing fallbacks to full GCs
> (which would be visible with GC logging), and it should definitely be
> possible to keep full GCs from being extremely common (though eliminating
> them entirely may not be possible).
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>

Re: ParNew and caching

Posted by Peter Schuller <pe...@infidyne.com>.
> After re-reading my post, what I meant to say is that I switched from the
> Serializing cache provider to the ConcurrentLinkedHash cache provider and
> then saw better performance, but still far worse than no caching at all:
>
> - no caching at all : 25-30ms
> - with Serializing provider : 1300+ms
> - with Concurrent provider : 500ms
>
> 100% cache hit rate.  ParNew is the only stat that I see out of line, so
> it seems like there is still a lot of copying.

In general, if you want to get to the bottom of this stuff and you
think GC is involved, always run with -XX:+PrintGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps so that the GC activity
can be observed.
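
A minimal sketch of wiring those flags into conf/cassandra-env.sh (the log
path is a placeholder):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

Grepping that log for "Full GC" then shows whether the fallbacks described
below are happening.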

1300+ ms should not be GC unless you are seeing fallbacks to full GCs
(which would be visible with GC logging), and it should definitely be
possible to keep full GCs from being extremely common (though eliminating
them entirely may not be possible).

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: ParNew and caching

Posted by Todd Burruss <bb...@expedia.com>.
After re-reading my post, what I meant to say is that I switched from the
Serializing cache provider to the ConcurrentLinkedHash cache provider and
then saw better performance, but still far worse than no caching at all:

- no caching at all : 25-30ms
- with Serializing provider : 1300+ms
- with Concurrent provider : 500ms

100% cache hit rate.  ParNew is the only stat that I see out of line, so
it seems like there is still a lot of copying.

On 11/18/11 2:40 PM, "Mohit Anchlia" <mo...@gmail.com> wrote:

>On Fri, Nov 18, 2011 at 1:46 PM, Todd Burruss <bb...@expedia.com>
>wrote:
>> Ok, I figured something like that.  Switching to
>> ConcurrentLinkedHashCacheProvider, I see it is a lot better, but still,
>> instead of the 25-30ms response times I enjoyed with no caching, I'm
>> seeing 500ms at 100% hit rate on the cache.  No old gen pressure at all,
>> just ParNew going crazy.
>>
>
>Are you saying that when you had off heap you saw better performance
>of 25-30 ms? And now it's 500ms to get 50 columns? What kind of load
>are you generating? What results do you see if you disable the row cache
>and just leave the key cache on?
>
>There are a lot of factors to consider, so having more stats would be
>helpful.
>
>Please paste cfhistograms. Have you tried monitoring tpstats and netstats?
>
>What's your CL and RF?
>> More info on my use case is that I am picking 50 columns from the 70k.
>> Since the whole row is in the cache, with no copying from off-heap memory
>> or disk buffers, it seems like it should be faster than non-cache mode.
>>
>> More thoughts :)
>>
>> On 11/18/11 6:39 AM, "Sylvain Lebresne" <sy...@datastax.com> wrote:
>>
>>>On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com>
>>>wrote:
>>>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>>>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>>>> 200-300ms.  This really screws with response times, which jump from
>>>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>>>> still surprising to me, especially since 1.0 defaults to the
>>>> SerializingCacheProvider – off heap.
>>>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>>>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>>>> to catch all the data per node, and JMX is reporting 100% cache hits.
>>>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>>>> Thoughts?
>>>
>>>Your problem is the mix of wide rows and the serializing cache.
>>>What happens with the serializing cache is that the data is stored
>>>off the heap. But that means that for each read of a row, we
>>>'deserialize' the row from the off-heap memory into the heap to
>>>return it. The thing is, when we do that, we do the full row each
>>>time. In other words, for each query we deserialize 70k+ columns
>>>even if we return only one. I'm willing to bet this is what is killing
>>>your response time. If you want to cache wide rows, I really
>>>suggest you use the ConcurrentLinkedHashCacheProvider
>>>instead.
>>>
>>>I'll also note that this explains the ParNew times too. Deserializing
>>>all those columns from off-heap creates lots of short-lived objects,
>>>and since you deserialize 70k+ on each query, that's quite some
>>>pressure on the new gen. Note that the serializing cache actually
>>>minimizes the use of the old gen, because that is the one that can
>>>create huge GC pauses with a big heap, but it actually puts more
>>>pressure on the new gen. This is by design, because the new gen
>>>is much less of a problem than the old gen.
>>>
>>>--
>>>Sylvain
>>
>>


Re: ParNew and caching

Posted by Mohit Anchlia <mo...@gmail.com>.
On Fri, Nov 18, 2011 at 1:46 PM, Todd Burruss <bb...@expedia.com> wrote:
> Ok, I figured something like that.  Switching to
> ConcurrentLinkedHashCacheProvider, I see it is a lot better, but still,
> instead of the 25-30ms response times I enjoyed with no caching, I'm
> seeing 500ms at 100% hit rate on the cache.  No old gen pressure at all,
> just ParNew going crazy.
>

Are you saying that when you had off heap you saw better performance
of 25-30 ms? And now it's 500ms to get 50 columns? What kind of load
are you generating? What results do you see if you disable the row cache
and just leave the key cache on?

There are a lot of factors to consider, so having more stats would be helpful.

Please paste cfhistograms. Have you tried monitoring tpstats and netstats?

What's your CL and RF?
> More info on my use case is that I am picking 50 columns from the 70k.
> Since the whole row is in the cache, with no copying from off-heap memory
> or disk buffers, it seems like it should be faster than non-cache mode.
>
> More thoughts :)
>
> On 11/18/11 6:39 AM, "Sylvain Lebresne" <sy...@datastax.com> wrote:
>
>>On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com>
>>wrote:
>>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>>> 200-300ms.  This really screws with response times, which jump from
>>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>>> still surprising to me, especially since 1.0 defaults to the
>>> SerializingCacheProvider – off heap.
>>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>>> to catch all the data per node, and JMX is reporting 100% cache hits.
>>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>>> Thoughts?
>>
>>Your problem is the mix of wide rows and the serializing cache.
>>What happens with the serializing cache is that the data is stored
>>off the heap. But that means that for each read of a row, we
>>'deserialize' the row from the off-heap memory into the heap to
>>return it. The thing is, when we do that, we do the full row each
>>time. In other words, for each query we deserialize 70k+ columns
>>even if we return only one. I'm willing to bet this is what is killing
>>your response time. If you want to cache wide rows, I really
>>suggest you use the ConcurrentLinkedHashCacheProvider
>>instead.
>>
>>I'll also note that this explains the ParNew times too. Deserializing
>>all those columns from off-heap creates lots of short-lived objects,
>>and since you deserialize 70k+ on each query, that's quite some
>>pressure on the new gen. Note that the serializing cache actually
>>minimizes the use of the old gen, because that is the one that can
>>create huge GC pauses with a big heap, but it actually puts more
>>pressure on the new gen. This is by design, because the new gen
>>is much less of a problem than the old gen.
>>
>>--
>>Sylvain
>
>

Re: ParNew and caching

Posted by Jonathan Ellis <jb...@gmail.com>.
No I/O?  No sstable counts going up in cfhistograms?

Is the heap so full you're experiencing GC pressure that way?

On Fri, Nov 18, 2011 at 3:46 PM, Todd Burruss <bb...@expedia.com> wrote:
> Ok, I figured something like that.  Switching to
> ConcurrentLinkedHashCacheProvider, I see it is a lot better, but still,
> instead of the 25-30ms response times I enjoyed with no caching, I'm
> seeing 500ms at 100% hit rate on the cache.  No old gen pressure at all,
> just ParNew going crazy.
>
> More info on my use case is that I am picking 50 columns from the 70k.
> Since the whole row is in the cache, with no copying from off-heap memory
> or disk buffers, it seems like it should be faster than non-cache mode.
>
> More thoughts :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: ParNew and caching

Posted by Todd Burruss <bb...@expedia.com>.
Ok, I figured something like that.  Switching to
ConcurrentLinkedHashCacheProvider, I see it is a lot better, but still,
instead of the 25-30ms response times I enjoyed with no caching, I'm
seeing 500ms at 100% hit rate on the cache.  No old gen pressure at all,
just ParNew going crazy.

More info on my use case is that I am picking 50 columns from the 70k.
Since the whole row is in the cache, with no copying from off-heap memory
or disk buffers, it seems like it should be faster than non-cache mode.

More thoughts :)

On 11/18/11 6:39 AM, "Sylvain Lebresne" <sy...@datastax.com> wrote:

>On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com>
>wrote:
>> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
>> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
>> 200-300ms.  This really screws with response times, which jump from
>> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
>> still surprising to me, especially since 1.0 defaults to the
>> SerializingCacheProvider – off heap.
>> The interesting tidbit is that I have wide rows: 70k+ columns per row,
>> ~50 bytes per column value.  The cache only needs to be about 400 rows
>> to catch all the data per node, and JMX is reporting 100% cache hits.
>> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
>> Thoughts?
>
>Your problem is the mix of wide rows and the serializing cache.
>What happens with the serializing cache is that the data is stored
>off the heap. But that means that for each read of a row, we
>'deserialize' the row from the off-heap memory into the heap to
>return it. The thing is, when we do that, we do the full row each
>time. In other words, for each query we deserialize 70k+ columns
>even if we return only one. I'm willing to bet this is what is killing
>your response time. If you want to cache wide rows, I really
>suggest you use the ConcurrentLinkedHashCacheProvider
>instead.
>
>I'll also note that this explains the ParNew times too. Deserializing
>all those columns from off-heap creates lots of short-lived objects,
>and since you deserialize 70k+ on each query, that's quite some
>pressure on the new gen. Note that the serializing cache actually
>minimizes the use of the old gen, because that is the one that can
>create huge GC pauses with a big heap, but it actually puts more
>pressure on the new gen. This is by design, because the new gen
>is much less of a problem than the old gen.
>
>--
>Sylvain


Re: ParNew and caching

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Fri, Nov 18, 2011 at 1:53 AM, Todd Burruss <bb...@expedia.com> wrote:
> I'm using cassandra 1.0.  I've been doing some testing with cass's cache.
> When I turn it on (using the CLI) I see ParNew jump from 3-4ms to
> 200-300ms.  This really screws with response times, which jump from
> ~25-30ms to 1300+ms.  I've increased new gen and that helps, but this is
> still surprising to me, especially since 1.0 defaults to the
> SerializingCacheProvider – off heap.
> The interesting tidbit is that I have wide rows: 70k+ columns per row,
> ~50 bytes per column value.  The cache only needs to be about 400 rows
> to catch all the data per node, and JMX is reporting 100% cache hits.
> Nodetool ring reports < 2gb per node, my heap is 6gb and total RAM is 16gb.
> Thoughts?

Your problem is the mix of wide rows and the serializing cache.
What happens with the serializing cache is that the data is stored
off the heap. But that means that for each read of a row, we
'deserialize' the row from the off-heap memory into the heap to
return it. The thing is, when we do that, we do the full row each
time. In other words, for each query we deserialize 70k+ columns
even if we return only one. I'm willing to bet this is what is killing
your response time. If you want to cache wide rows, I really
suggest you use the ConcurrentLinkedHashCacheProvider
instead.
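
A minimal sketch of making that switch from the CLI, assuming a 1.0.x node
and placeholder keyspace/column family names (the exact attribute set can
vary slightly across 1.0 releases):

    use MyKeyspace;
    update column family MyCF
        with row_cache_provider='ConcurrentLinkedHashCacheProvider';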

I'll also note that this explains the ParNew times too. Deserializing
all those columns from off-heap creates lots of short-lived objects,
and since you deserialize 70k+ on each query, that's quite some
pressure on the new gen. Note that the serializing cache actually
minimizes the use of the old gen, because that is the one that can
create huge GC pauses with a big heap, but it actually puts more
pressure on the new gen. This is by design, because the new gen
is much less of a problem than the old gen.
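
As a rough back-of-envelope for that pressure (using the ~50 bytes per
column value quoted above):

    70,000 columns x ~50 bytes  ~= 3.5 MB of raw column data per read

plus per-column object overhead once deserialized on-heap, so even a modest
read rate allocates hundreds of MB per second in the new gen.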

--
Sylvain