
Posted to user@cassandra.apache.org by James Golick <ja...@gmail.com> on 2010/03/31 05:47:48 UTC

Read Performance

We are starting to use cassandra to power our activity feed. The way we
organize our data is simple. "Event"s live in a CF called Events and are
keyed by a UUID. The timelines themselves live in a CF called Timelines,
which is keyed by user id (e.g. "1229") and contains event UUIDs as column
names (sorted by TimeUUIDType).

To load a feed, we get a slice of the timeline CF for that user, then
multiget all of the corresponding events.
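
A rough sketch of the two-step read described above, using plain Python
dictionaries as stand-ins for the Timelines and Events column families. The
names `timelines`, `events`, `add_event` and `load_feed` are hypothetical and
no real Cassandra client is involved; it only illustrates the access pattern.

```python
import uuid

# Hypothetical in-memory stand-ins for the two column families.
# timelines: user id -> list of event UUIDs, newest first
# events:    event UUID -> event row (a dict of columns)
timelines = {}
events = {}


def add_event(user_id, payload):
    """Store an event and put its UUID at the front of the user's timeline."""
    event_id = uuid.uuid1()  # time-based UUID, analogous to TimeUUIDType
    events[event_id] = payload
    timelines.setdefault(user_id, []).insert(0, event_id)
    return event_id


def load_feed(user_id, count=20):
    """Step 1: slice the timeline. Step 2: multiget the referenced events."""
    event_ids = timelines.get(user_id, [])[:count]                    # slice
    rows = {eid: events[eid] for eid in event_ids if eid in events}   # multiget
    return [rows[eid] for eid in event_ids if eid in rows]            # timeline order
```

The second step is one batched lookup keyed by the UUIDs returned from the
first, which is why its latency dominates when the slice itself is fast.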

Loading the slice of the timeline is reasonably fast at 4-6ms. But,
multigetting the events is terribly slow - on the order of 35-100ms.

To alleviate the problem, we write events through to memcached and use a
memcached multiget in front of the cassandra multiget. We have enough cache
space to get upwards of a 99% hit rate, which makes loading the events
extremely fast, but it would be nice to make use of the 24GB of memory in
our cassandra nodes.
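
The write-through plus cache-first multiget arrangement could look roughly
like the following. This is a sketch under assumptions: `cache` and `store`
are plain dictionaries standing in for memcached and Cassandra, and
`cached_multiget`, `write_through` and `backing_multiget` are hypothetical
names, not calls from any real client library.

```python
def write_through(key, row, cache, store):
    """Write to the backing store and the cache in the same operation."""
    store[key] = row
    cache[key] = row


def cached_multiget(keys, cache, backing_multiget):
    """Return {key: row}, asking the backing store only for cache misses."""
    found = {k: cache[k] for k in keys if k in cache}
    missing = [k for k in keys if k not in found]
    if missing:
        fetched = backing_multiget(missing)  # one batched call for all misses
        cache.update(fetched)                # populate the cache for next time
        found.update(fetched)
    return found
```

With a hit rate near 99%, almost no keys reach `backing_multiget`, so the
backing store sees very little read traffic.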

We're on 0.6, and I've enabled the row cache. It seems to have data in it,
but it's still slow.

So, am I doing something wrong, or is this the expected perf?

- James

Re: Read Performance

Posted by Ryan King <ry...@twitter.com>.

On Wed, Mar 31, 2010 at 9:04 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> Can you redirect some of the reads from memcache to cassandra?  Sounds
> like the cache isn't getting warmed up.

Yeah, putting a cache in front of a cache can ruin the locality of the
second cache.

-ryan

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

Yes.

J.

Sent from my iPhone.

On 2010-04-01, at 9:21 PM, Brandon Williams <dr...@gmail.com> wrote:

> On Thu, Apr 1, 2010 at 9:37 PM, James Golick <ja...@gmail.com>  
> wrote:
> Well, folks, I'm feeling a little stupid right now (adding to the  
> injury inflicted by one Mr. Stump :-P).
>
> So, here's the story. The cache hit rate is up around 97% now. The  
> ruby code is down to around 20-25ms to multiget the 20 rows. I did  
> some profiling, though, and realized that a lot of time was being  
> spent in thrift. Turns out, that's where pretty much all the time  
> was going.
>
> I just ran the same test using java (scala) and the load is taking  
> around 2-4ms.
>
> That's with the binary accelerated thrift for ruby?
>
> -Brandon

Re: Read Performance

Posted by Brandon Williams <dr...@gmail.com>.

On Thu, Apr 1, 2010 at 9:37 PM, James Golick <ja...@gmail.com> wrote:

> Well, folks, I'm feeling a little stupid right now (adding to the injury
> inflicted by one Mr. Stump :-P).
>
> So, here's the story. The cache hit rate is up around 97% now. The ruby
> code is down to around 20-25ms to multiget the 20 rows. I did some
> profiling, though, and realized that a lot of time was being spent in
> thrift. Turns out, that's where pretty much all the time was going.
>
> I just ran the same test using java (scala) and the load is taking around
> 2-4ms.
>

That's with the binary accelerated thrift for ruby?

-Brandon

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

Yes.

On Fri, Apr 2, 2010 at 10:35 AM, Ryan King <ry...@twitter.com> wrote:

> On Thu, Apr 1, 2010 at 8:37 PM, James Golick <ja...@gmail.com>
> wrote:
> > Well, folks, I'm feeling a little stupid right now (adding to the injury
> > inflicted by one Mr. Stump :-P).
> > So, here's the story. The cache hit rate is up around 97% now. The ruby
> > code is down to around 20-25ms to multiget the 20 rows. I did some profiling,
> > though, and realized that a lot of time was being spent in thrift. Turns
> > out, that's where pretty much all the time was going.
> > I just ran the same test using java (scala) and the load is taking around
> > 2-4ms.
>
> We've definitely seen ruby add latency to cassandra operations, but it
> hasn't been that bad. Are you using our cassandra gem?
>
> -ryan
>

Re: Read Performance

Posted by Ryan King <ry...@twitter.com>.

On Thu, Apr 1, 2010 at 8:37 PM, James Golick <ja...@gmail.com> wrote:
> Well, folks, I'm feeling a little stupid right now (adding to the injury
> inflicted by one Mr. Stump :-P).
> So, here's the story. The cache hit rate is up around 97% now. The ruby code
> is down to around 20-25ms to multiget the 20 rows. I did some profiling,
> though, and realized that a lot of time was being spent in thrift. Turns
> out, that's where pretty much all the time was going.
> I just ran the same test using java (scala) and the load is taking around
> 2-4ms.

We've definitely seen ruby add latency to cassandra operations, but it
hasn't been that bad. Are you using our cassandra gem?

-ryan

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

Well, folks, I'm feeling a little stupid right now (adding to the injury
inflicted by one Mr. Stump :-P).

So, here's the story. The cache hit rate is up around 97% now. The ruby code
is down to around 20-25ms to multiget the 20 rows. I did some profiling,
though, and realized that a lot of time was being spent in thrift. Turns
out, that's where pretty much all the time was going.

I just ran the same test using java (scala) and the load is taking around
2-4ms.

On Thu, Apr 1, 2010 at 4:37 PM, Peter Chang <pe...@gmail.com> wrote:

> pwned.
>
>
> On Thu, Apr 1, 2010 at 2:09 PM, James Golick <ja...@gmail.com> wrote:
>
>> Damnit!
>>
>>
>> On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck <jd...@gmail.com> wrote:
>>
>>> ....Or rackspace.  ;)
>>>
>>> On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump <jo...@joestump.net> wrote:
>>> > Taking our flamewar offline. :-D
>>> >
>>> > On Thu, Apr 1, 2010 at 1:36 PM, James Golick <ja...@gmail.com>
>>> > wrote:
>>> >> I don't have the additional hardware to try to isolate this issue atm
>>> >
>>> > You'd be able to spin up hardware to isolate that issue on AWS. ;)
>>> >
>>> > --Joe
>>> >
>>>
>>
>>
>

Re: Read Performance

Posted by Peter Chang <pe...@gmail.com>.

pwned.

On Thu, Apr 1, 2010 at 2:09 PM, James Golick <ja...@gmail.com> wrote:

> Damnit!
>
>
> On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck <jd...@gmail.com> wrote:
>
>> ....Or rackspace.  ;)
>>
>> On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump <jo...@joestump.net> wrote:
>> > Taking our flamewar offline. :-D
>> >
>> > On Thu, Apr 1, 2010 at 1:36 PM, James Golick <ja...@gmail.com>
>> > wrote:
>> >> I don't have the additional hardware to try to isolate this issue atm
>> >
>> > You'd be able to spin up hardware to isolate that issue on AWS. ;)
>> >
>> > --Joe
>> >
>>
>
>

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

Damnit!

On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck <jd...@gmail.com> wrote:

> ....Or rackspace.  ;)
>
> On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump <jo...@joestump.net> wrote:
> > Taking our flamewar offline. :-D
> >
> > On Thu, Apr 1, 2010 at 1:36 PM, James Golick <ja...@gmail.com>
> > wrote:
> >> I don't have the additional hardware to try to isolate this issue atm
> >
> > You'd be able to spin up hardware to isolate that issue on AWS. ;)
> >
> > --Joe
> >
>

Re: Read Performance

Posted by Jeremy Dunck <jd...@gmail.com>.

....Or rackspace.  ;)

On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump <jo...@joestump.net> wrote:
> Taking our flamewar offline. :-D
>
> On Thu, Apr 1, 2010 at 1:36 PM, James Golick <ja...@gmail.com> wrote:
>> I don't have the additional hardware to try to isolate this issue atm
>
> You'd be able to spin up hardware to isolate that issue on AWS. ;)
>
> --Joe
>

Re: Read Performance

Posted by Joseph Stump <jo...@joestump.net>.

Taking our flamewar offline. :-D

On Thu, Apr 1, 2010 at 1:36 PM, James Golick <ja...@gmail.com> wrote:
> I don't have the additional hardware to try to isolate this issue atm

You'd be able to spin up hardware to isolate that issue on AWS. ;)

--Joe

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

I don't have the additional hardware to try to isolate this issue atm, so I
decided to push some code that performs 20% of reads directly from
cassandra. The cache hit rate has gone up to about 88% now and it's still
climbing, albeit slowly. There remains plenty of free cache space.

So far, the average time to multi_get those 20 rows is still hovering around
35-45ms.

I'll report back with more info as it comes in.
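
A minimal sketch of sending a fixed share of reads straight to Cassandra,
assuming a random per-request choice. The names `choose_read_path` and
`read_events`, the dictionary `cache`, and the `cassandra_multiget` callable
are hypothetical stand-ins, not the code that was actually deployed.

```python
import random


def choose_read_path(rng, direct_fraction=0.2):
    """Return "cassandra" for roughly direct_fraction of calls, else "memcached"."""
    return "cassandra" if rng.random() < direct_fraction else "memcached"


def read_events(keys, cache, cassandra_multiget, rng, direct_fraction=0.2):
    """Read rows, bypassing the cache for a fraction of requests."""
    if choose_read_path(rng, direct_fraction) == "cassandra":
        return cassandra_multiget(keys)       # direct read, warms the row cache
    hits = {k: cache[k] for k in keys if k in cache}
    misses = [k for k in keys if k not in hits]
    if misses:
        hits.update(cassandra_multiget(misses))
    return hits
```

Passing the random generator in as `rng` keeps the split reproducible in
tests, for example with `random.Random(0)`.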

On Thu, Apr 1, 2010 at 12:06 AM, Cemal Dalar <ce...@gmail.com> wrote:

> Hi James,
>
> I don't know how to get the below statistics data and calculate the access
> times (read/write in ms) in your previous mails. Can you explain a little?
> I'd like to work on it also.
>
> CD
>
>
> On Thu, Apr 1, 2010 at 4:15 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> On Wed, Mar 31, 2010 at 6:21 PM, James Golick <ja...@gmail.com>
>> wrote:
>> > Keyspace: ActivityFeed
>> >         Read Count: 699443
>> >         Read Latency: 16.11017477192566 ms.
>>
>> >                 Column Family: Events
>> >                 Read Count: 232378
>> >                 Read Latency: 0.396 ms.
>> >                 Row cache capacity: 500000
>> >                 Row cache size: 62768
>> >                 Row cache hit rate: 0.007716049382716049
>>
>> This says that
>>
>>  - recent queries to Events are much faster than the lifetime average
>> for your Keyspace
>>  - even though you have almost no row cache hits (~1700 out of 232000
>> reads)
>>
>> Not sure what to make of that, tbh.  If it were me I would try to
>> reproduce on a test machine w/o all that pesky live traffic confusing
>> things.
>>
>> -Jonathan
>>
>
>

Re: Read Performance

Posted by Cemal Dalar <ce...@gmail.com>.

Hi James,

I don't know how to get the below statistics data and calculate the access
times (read/write in ms) in your previous mails. Can you explain a little?
I'd like to work on it also.

CD

On Thu, Apr 1, 2010 at 4:15 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Wed, Mar 31, 2010 at 6:21 PM, James Golick <ja...@gmail.com>
> wrote:
> > Keyspace: ActivityFeed
> >         Read Count: 699443
> >         Read Latency: 16.11017477192566 ms.
>
> >                 Column Family: Events
> >                 Read Count: 232378
> >                 Read Latency: 0.396 ms.
> >                 Row cache capacity: 500000
> >                 Row cache size: 62768
> >                 Row cache hit rate: 0.007716049382716049
>
> This says that
>
>  - recent queries to Events are much faster than the lifetime average
> for your Keyspace
>  - even though you have almost no row cache hits (~1700 out of 232000
> reads)
>
> Not sure what to make of that, tbh.  If it were me I would try to
> reproduce on a test machine w/o all that pesky live traffic confusing
> things.
>
> -Jonathan
>

Re: Read Performance

Posted by Jonathan Ellis <jb...@gmail.com>.

On Wed, Mar 31, 2010 at 6:21 PM, James Golick <ja...@gmail.com> wrote:
> Keyspace: ActivityFeed
>         Read Count: 699443
>         Read Latency: 16.11017477192566 ms.

>                 Column Family: Events
>                 Read Count: 232378
>                 Read Latency: 0.396 ms.
>                 Row cache capacity: 500000
>                 Row cache size: 62768
>                 Row cache hit rate: 0.007716049382716049

This says that

 - recent queries to Events are much faster than the lifetime average
for your Keyspace
 - even though you have almost no row cache hits (~1700 out of 232000 reads)

Not sure what to make of that, tbh.  If it were me I would try to
reproduce on a test machine w/o all that pesky live traffic confusing
things.

-Jonathan
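
The two observations can be checked with a little arithmetic on the quoted
cfstats figures. This is only a rough sketch: it assumes the reported hit
rate can be applied to the lifetime read count, which overstates precision if
the hit rate is measured over a recent window. The variable names are
illustrative.

```python
# Figures copied from the cfstats output quoted above.
keyspace_read_latency_ms = 16.11017477192566
events_read_latency_ms = 0.396
events_read_count = 232378
row_cache_hit_rate = 0.007716049382716049

# Rough number of row cache hits implied by the hit rate.
estimated_hits = events_read_count * row_cache_hit_rate

# How much slower the keyspace-wide average is than the Events CF figure.
latency_ratio = keyspace_read_latency_ms / events_read_latency_ms

print(f"estimated row cache hits: {estimated_hits:.0f}")
print(f"keyspace latency / Events latency: {latency_ratio:.1f}x")
```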

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

Keyspace: ActivityFeed
        Read Count: 699443
        Read Latency: 16.11017477192566 ms.
        Write Count: 69264920
        Write Latency: 0.020393242755495856 ms.
        Pending Tasks: 0
...snip....

                Column Family: Events
                SSTable count: 5
                Space used (live): 680625289
                Space used (total): 680625289
                Memtable Columns Count: 65974
                Memtable Data Size: 6901772
                Memtable Switch Count: 121
                Read Count: 232378
                Read Latency: 0.396 ms.
                Write Count: 919233
                Write Latency: 0.055 ms.
                Pending Tasks: 0
                Key cache capacity: 47
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache capacity: 500000
                Row cache size: 62768
                Row cache hit rate: 0.007716049382716049


On Wed, Mar 31, 2010 at 4:15 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> What does the CFS mbean think read latencies are?  Possibly something
> else is introducing latency after the read.
>
> On Wed, Mar 31, 2010 at 5:37 PM, James Golick <ja...@gmail.com>
> wrote:
> > Standard CF. 10 columns per row. Between about 800 bytes and 2k total per
> > row.
> > On Wed, Mar 31, 2010 at 3:06 PM, Chris Goffinet <go...@digg.com>
> > wrote:
> >>
> >> How many columns in each row?
> >> -Chris
> >> On Mar 31, 2010, at 2:54 PM, James Golick wrote:
> >>
> >> I just tried running the same multi_get against cassandra 1000 times,
> >> assuming that that'd force it in to cache.
> >> I'm definitely seeing a 5-10ms improvement, but it's still looking like
> >> 20-30ms on average. Would you expect it to be faster than that?
> >> - James
> >>
> >> On Wed, Mar 31, 2010 at 11:44 AM, Jonathan Ellis <jb...@gmail.com>
> >> wrote:
> >>>
> >>> But then you'd still be caching the same things memcached is, so
> >>> unless you have a lot more ram you'll presumably miss the same rows
> >>> too.
> >>>
> >>> The only 2-layer approach that makes sense to me would be to have
> >>> cassandra keys cache at 100% behind memcached for the actual rows,
> >>> which will actually reduce the penalty for a memcache miss by
> >>> half-ish.
> >>>
> >>> On Wed, Mar 31, 2010 at 1:32 PM, David Strauss <david@fourkitchens.com>
> >>> wrote:
> >>> > Or, if faking memcached misses is too high a price to pay, queue some
> >>> > proportion of the reads to replay asynchronously against Cassandra.
> >>> >
> >>> > On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
> >>> >> Can you redirect some of the reads from memcache to cassandra?  Sounds
> >>> >> like the cache isn't getting warmed up.
> >>> >>
> >>> >> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <jamesgolick@gmail.com>
> >>> >> wrote:
> >>> >> > I'm testing on the live cluster, but most of the production reads
> >>> >> > are being served by the cache. It's definitely the right CF.
> >>> >> >
> >>> >> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jbellis@gmail.com>
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick
> >>> >> >> <ja...@gmail.com>
> >>> >> >> wrote:
> >>> >> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6,
> >>> >> >> > and NaN.
> >>> >> >> > Seems like that stat is a little broken.
> >>> >> >>
> >>> >> >> Sounds like you aren't getting enough requests for the
> >>> >> >> getRecentHitRate to make sense.  use getHits / getRequests.
> >>> >> >>
> >>> >> >> But if you aren't getting enough requests for getRecentHitRate, are
> >>> >> >> you sure you're tuning the cache on the right CF for your 35ms
> >>> >> >> test?
> >>> >> >>
> >>> >> >> -Jonathan
> >>> >> >
> >>> >> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>
> >>
> >
> >
>

Re: Read Performance

Posted by Jonathan Ellis <jb...@gmail.com>.

What does the CFS mbean think read latencies are?  Possibly something
else is introducing latency after the read.

On Wed, Mar 31, 2010 at 5:37 PM, James Golick <ja...@gmail.com> wrote:
> Standard CF. 10 columns per row. Between about 800 bytes and 2k total per
> row.
> On Wed, Mar 31, 2010 at 3:06 PM, Chris Goffinet <go...@digg.com> wrote:
>>
>> How many columns in each row?
>> -Chris
>> On Mar 31, 2010, at 2:54 PM, James Golick wrote:
>>
>> I just tried running the same multi_get against cassandra 1000 times,
>> assuming that that'd force it in to cache.
>> I'm definitely seeing a 5-10ms improvement, but it's still looking like
>> 20-30ms on average. Would you expect it to be faster than that?
>> - James
>>
>> On Wed, Mar 31, 2010 at 11:44 AM, Jonathan Ellis <jb...@gmail.com>
>> wrote:
>>>
>>> But then you'd still be caching the same things memcached is, so
>>> unless you have a lot more ram you'll presumably miss the same rows
>>> too.
>>>
>>> The only 2-layer approach that makes sense to me would be to have
>>> cassandra keys cache at 100% behind memcached for the actual rows,
>>> which will actually reduce the penalty for a memcache miss by
>>> half-ish.
>>>
>>> On Wed, Mar 31, 2010 at 1:32 PM, David Strauss <da...@fourkitchens.com>
>>> wrote:
>>> > Or, if faking memcached misses is too high a price to pay, queue some
>>> > proportion of the reads to replay asynchronously against Cassandra.
>>> >
>>> > On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
>>> >> Can you redirect some of the reads from memcache to cassandra?  Sounds
>>> >> like the cache isn't getting warmed up.
>>> >>
>>> >> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com>
>>> >> wrote:
>>> >> > I'm testing on the live cluster, but most of the production reads
>>> >> > are being
>>> >> > served by the cache. It's definitely the right CF.
>>> >> >
>>> >> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick
>>> >> >> <ja...@gmail.com>
>>> >> >> wrote:
>>> >> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6,
>>> >> >> > and
>>> >> >> > NaN.
>>> >> >> > Seems like that stat is a little broken.
>>> >> >>
>>> >> >> Sounds like you aren't getting enough requests for the
>>> >> >> getRecentHitRate to make sense.  use getHits / getRequests.
>>> >> >>
>>> >> >> But if you aren't getting enough requests for getRecentHitRate, are
>>> >> >> you sure you're tuning the cache on the right CF for your 35ms
>>> >> >> test?
>>> >> >> Are you testing live?  If not, what's your methodology here?
>>> >> >>
>>> >> >> -Jonathan
>>> >> >
>>> >> >
>>> >
>>> >
>>> >
>>> >
>>
>>
>
>

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

Standard CF. 10 columns per row. Between about 800 bytes and 2k total per
row.

On Wed, Mar 31, 2010 at 3:06 PM, Chris Goffinet <go...@digg.com> wrote:

> How many columns in each row?
>
> -Chris
>
> On Mar 31, 2010, at 2:54 PM, James Golick wrote:
>
> I just tried running the same multi_get against cassandra 1000 times,
> assuming that that'd force it in to cache.
>
> I'm definitely seeing a 5-10ms improvement, but it's still looking like
> 20-30ms on average. Would you expect it to be faster than that?
>
> - James
>
> On Wed, Mar 31, 2010 at 11:44 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> But then you'd still be caching the same things memcached is, so
>> unless you have a lot more ram you'll presumably miss the same rows
>> too.
>>
>> The only 2-layer approach that makes sense to me would be to have
>> cassandra keys cache at 100% behind memcached for the actual rows,
>> which will actually reduce the penalty for a memcache miss by
>> half-ish.
>>
>> On Wed, Mar 31, 2010 at 1:32 PM, David Strauss <da...@fourkitchens.com>
>> wrote:
>> > Or, if faking memcached misses is too high a price to pay, queue some
>> > proportion of the reads to replay asynchronously against Cassandra.
>> >
>> > On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
>> >> Can you redirect some of the reads from memcache to cassandra?  Sounds
>> >> like the cache isn't getting warmed up.
>> >>
>> >> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com>
>> >> wrote:
>> >> > I'm testing on the live cluster, but most of the production reads are
>> >> > being served by the cache. It's definitely the right CF.
>> >> >
>> >> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick <jamesgolick@gmail.com>
>> >> >> wrote:
>> >> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6,
>> >> >> > and NaN.
>> >> >> > Seems like that stat is a little broken.
>> >> >>
>> >> >> Sounds like you aren't getting enough requests for the
>> >> >> getRecentHitRate to make sense.  use getHits / getRequests.
>> >> >>
>> >> >> But if you aren't getting enough requests for getRecentHitRate, are
>> >> >> you sure you're tuning the cache on the right CF for your 35ms test?
>> >> >> Are you testing live?  If not, what's your methodology here?
>> >> >>
>> >> >> -Jonathan
>> >> >
>> >> >
>> >
>> >
>> >
>> >
>>
>
>
>

Re: Read Performance

Posted by Chris Goffinet <go...@digg.com>.

How many columns in each row?

-Chris

On Mar 31, 2010, at 2:54 PM, James Golick wrote:

> I just tried running the same multi_get against cassandra 1000 times, assuming that that'd force it in to cache.
> 
> I'm definitely seeing a 5-10ms improvement, but it's still looking like 20-30ms on average. Would you expect it to be faster than that?
> 
> - James
> 
> On Wed, Mar 31, 2010 at 11:44 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> But then you'd still be caching the same things memcached is, so
> unless you have a lot more ram you'll presumably miss the same rows
> too.
> 
> The only 2-layer approach that makes sense to me would be to have
> cassandra keys cache at 100% behind memcached for the actual rows,
> which will actually reduce the penalty for a memcache miss by
> half-ish.
> 
> On Wed, Mar 31, 2010 at 1:32 PM, David Strauss <da...@fourkitchens.com> wrote:
> > Or, if faking memcached misses is too high a price to pay, queue some
> > proportion of the reads to replay asynchronously against Cassandra.
> >
> > On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
> >> Can you redirect some of the reads from memcache to cassandra?  Sounds
> >> like the cache isn't getting warmed up.
> >>
> >> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com> wrote:
> >> > I'm testing on the live cluster, but most of the production reads are being
> >> > served by the cache. It's definitely the right CF.
> >> >
> >> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> >> >>
> >> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick <ja...@gmail.com>
> >> >> wrote:
> >> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and
> >> >> > NaN.
> >> >> > Seems like that stat is a little broken.
> >> >>
> >> >> Sounds like you aren't getting enough requests for the
> >> >> getRecentHitRate to make sense.  use getHits / getRequests.
> >> >>
> >> >> But if you aren't getting enough requests for getRecentHitRate, are
> >> >> you sure you're tuning the cache on the right CF for your 35ms test?
> >> >> Are you testing live?  If not, what's your methodology here?
> >> >>
> >> >> -Jonathan
> >> >
> >> >
> >
> >
> >
> >
>

Re: Read Performance

Posted by Jonathan Ellis <jb...@gmail.com>.

Yes, I would.  How many columns are you reading per row?  How large
are they?  Are they supercolumns?

On Wed, Mar 31, 2010 at 4:54 PM, James Golick <ja...@gmail.com> wrote:
> I just tried running the same multi_get against cassandra 1000 times,
> assuming that that'd force it in to cache.
> I'm definitely seeing a 5-10ms improvement, but it's still looking like
> 20-30ms on average. Would you expect it to be faster than that?
> - James
>
> On Wed, Mar 31, 2010 at 11:44 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> But then you'd still be caching the same things memcached is, so
>> unless you have a lot more ram you'll presumably miss the same rows
>> too.
>>
>> The only 2-layer approach that makes sense to me would be to have
>> cassandra keys cache at 100% behind memcached for the actual rows,
>> which will actually reduce the penalty for a memcache miss by
>> half-ish.
>>
>> On Wed, Mar 31, 2010 at 1:32 PM, David Strauss <da...@fourkitchens.com>
>> wrote:
>> > Or, if faking memcached misses is too high a price to pay, queue some
>> > proportion of the reads to replay asynchronously against Cassandra.
>> >
>> > On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
>> >> Can you redirect some of the reads from memcache to cassandra?  Sounds
>> >> like the cache isn't getting warmed up.
>> >>
>> >> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com>
>> >> wrote:
>> >> > I'm testing on the live cluster, but most of the production reads are
>> >> > being
>> >> > served by the cache. It's definitely the right CF.
>> >> >
>> >> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick
>> >> >> <ja...@gmail.com>
>> >> >> wrote:
>> >> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6,
>> >> >> > and
>> >> >> > NaN.
>> >> >> > Seems like that stat is a little broken.
>> >> >>
>> >> >> Sounds like you aren't getting enough requests for the
>> >> >> getRecentHitRate to make sense.  use getHits / getRequests.
>> >> >>
>> >> >> But if you aren't getting enough requests for getRecentHitRate, are
>> >> >> you sure you're tuning the cache on the right CF for your 35ms test?
>> >> >> Are you testing live?  If not, what's your methodology here?
>> >> >>
>> >> >> -Jonathan
>> >> >
>> >> >
>> >
>> >
>> >
>> >
>
>

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

I just tried running the same multi_get against cassandra 1000 times,
assuming that that'd force it into cache.

I'm definitely seeing a 5-10ms improvement, but it's still looking like
20-30ms on average. Would you expect it to be faster than that?
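
A repeated-call measurement like this one can be sketched as below. Timing
from the client covers everything between issuing the request and having the
decoded result in hand, not only the server's read. The helper names
`time_calls` and `summarize` are hypothetical, and `fn` stands in for whatever
performs the multi_get.

```python
import statistics
import time


def time_calls(fn, runs=1000):
    """Call fn() `runs` times and return per-call latencies in milliseconds."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return mean, median and max so outliers are visible next to the average."""
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }
```

Comparing the client-side numbers with the server-reported read latency
shows how much of the total is spent outside the read itself.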

- James

On Wed, Mar 31, 2010 at 11:44 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> But then you'd still be caching the same things memcached is, so
> unless you have a lot more ram you'll presumably miss the same rows
> too.
>
> The only 2-layer approach that makes sense to me would be to have
> cassandra keys cache at 100% behind memcached for the actual rows,
> which will actually reduce the penalty for a memcache miss by
> half-ish.
>
> On Wed, Mar 31, 2010 at 1:32 PM, David Strauss <da...@fourkitchens.com>
> wrote:
> > Or, if faking memcached misses is too high a price to pay, queue some
> > proportion of the reads to replay asynchronously against Cassandra.
> >
> > On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
> >> Can you redirect some of the reads from memcache to cassandra?  Sounds
> >> like the cache isn't getting warmed up.
> >>
> >> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com>
> >> wrote:
> >> > I'm testing on the live cluster, but most of the production reads are
> >> > being served by the cache. It's definitely the right CF.
> >> >
> >> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com>
> >> > wrote:
> >> >>
> >> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick <jamesgolick@gmail.com>
> >> >> wrote:
> >> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6,
> >> >> > and NaN.
> >> >> > Seems like that stat is a little broken.
> >> >>
> >> >> Sounds like you aren't getting enough requests for the
> >> >> getRecentHitRate to make sense.  use getHits / getRequests.
> >> >>
> >> >> But if you aren't getting enough requests for getRecentHitRate, are
> >> >> you sure you're tuning the cache on the right CF for your 35ms test?
> >> >> Are you testing live?  If not, what's your methodology here?
> >> >>
> >> >> -Jonathan
> >> >
> >> >
> >
> >
> >
> >
>

Re: Read Performance

Posted by Jonathan Ellis <jb...@gmail.com>.

But then you'd still be caching the same things memcached is, so
unless you have a lot more ram you'll presumably miss the same rows
too.

The only 2-layer approach that makes sense to me would be to have
cassandra keys cache at 100% behind memcached for the actual rows,
which will actually reduce the penalty for a memcache miss by
half-ish.

On Wed, Mar 31, 2010 at 1:32 PM, David Strauss <da...@fourkitchens.com> wrote:
> Or, if faking memcached misses is too high a price to pay, queue some
> proportion of the reads to replay asynchronously against Cassandra.
>
> On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
>> Can you redirect some of the reads from memcache to cassandra?  Sounds
>> like the cache isn't getting warmed up.
>>
>> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com> wrote:
>> > I'm testing on the live cluster, but most of the production reads are being
>> > served by the cache. It's definitely the right CF.
>> >
>> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>> >>
>> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick <ja...@gmail.com>
>> >> wrote:
>> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and
>> >> > NaN.
>> >> > Seems like that stat is a little broken.
>> >>
>> >> Sounds like you aren't getting enough requests for the
>> >> getRecentHitRate to make sense.  use getHits / getRequests.
>> >>
>> >> But if you aren't getting enough requests for getRecentHitRate, are
>> >> you sure you're tuning the cache on the right CF for your 35ms test?
>> >> Are you testing live?  If not, what's your methodology here?
>> >>
>> >> -Jonathan
>> >
>> >
>
>
>
>

Re: Read Performance

Posted by David Strauss <da...@fourkitchens.com>.

Or, if faking memcached misses is too high a price to pay, queue some
proportion of the reads to replay asynchronously against Cassandra.
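
One way to picture this suggestion: the request path keeps reading from
memcached, and a background worker replays a sample of the same keys against
Cassandra purely to warm its cache. This is a sketch under assumptions; the
function names and the `backing_multiget` callable are hypothetical, and a
single in-process queue stands in for whatever queueing a real deployment
would use.

```python
import queue
import random
import threading


def start_replay_worker(replay_queue, backing_multiget):
    """Start a daemon thread that replays queued key batches; None stops it."""
    def worker():
        while True:
            keys = replay_queue.get()
            try:
                if keys is None:
                    return
                backing_multiget(keys)  # result discarded, the read warms the cache
            finally:
                replay_queue.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def maybe_enqueue_replay(keys, replay_queue, rng, fraction=0.2):
    """Queue a copy of `keys` for replay on roughly `fraction` of calls."""
    if rng.random() < fraction:
        replay_queue.put(list(keys))
        return True
    return False
```

Because the replay happens off the request path, user-facing latency is
unaffected by how slow the replayed reads are.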

On Wed, 2010-03-31 at 11:04 -0500, Jonathan Ellis wrote:
> Can you redirect some of the reads from memcache to cassandra?  Sounds
> like the cache isn't getting warmed up.
> 
> On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com> wrote:
> > I'm testing on the live cluster, but most of the production reads are being
> > served by the cache. It's definitely the right CF.
> >
> > On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> >>
> >> On Wed, Mar 31, 2010 at 12:01 AM, James Golick <ja...@gmail.com>
> >> wrote:
> >> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and
> >> > NaN.
> >> > Seems like that stat is a little broken.
> >>
> >> Sounds like you aren't getting enough requests for the
> >> getRecentHitRate to make sense.  use getHits / getRequests.
> >>
> >> But if you aren't getting enough requests for getRecentHitRate, are
> >> you sure you're tuning the cache on the right CF for your 35ms test?
> >> Are you testing live?  If not, what's your methodology here?
> >>
> >> -Jonathan
> >
> >

Re: Read Performance

Posted by Jonathan Ellis <jb...@gmail.com>.

Can you redirect some of the reads from memcache to cassandra?  Sounds
like the cache isn't getting warmed up.

On Wed, Mar 31, 2010 at 11:01 AM, James Golick <ja...@gmail.com> wrote:
> I'm testing on the live cluster, but most of the production reads are being
> served by the cache. It's definitely the right CF.
>
> On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> On Wed, Mar 31, 2010 at 12:01 AM, James Golick <ja...@gmail.com>
>> wrote:
>> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and
>> > NaN.
>> > Seems like that stat is a little broken.
>>
>> Sounds like you aren't getting enough requests for the
>> getRecentHitRate to make sense.  use getHits / getRequests.
>>
>> But if you aren't getting enough requests for getRecentHitRate, are
>> you sure you're tuning the cache on the right CF for your 35ms test?
>> Are you testing live?  If not, what's your methodology here?
>>
>> -Jonathan
>
>

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

I'm testing on the live cluster, but most of the production reads are being
served by the cache. It's definitely the right CF.

On Wed, Mar 31, 2010 at 8:30 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Wed, Mar 31, 2010 at 12:01 AM, James Golick <ja...@gmail.com>
> wrote:
> > Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and NaN.
> > Seems like that stat is a little broken.
>
> Sounds like you aren't getting enough requests for the
> getRecentHitRate to make sense.  use getHits / getRequests.
>
> But if you aren't getting enough requests for getRecentHitRate, are
> you sure you're tuning the cache on the right CF for your 35ms test?
> Are you testing live?  If not, what's your methodology here?
>
> -Jonathan
>

Re: Read Performance

Posted by Jonathan Ellis <jb...@gmail.com>.

On Wed, Mar 31, 2010 at 12:01 AM, James Golick <ja...@gmail.com> wrote:
> Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and NaN.
> Seems like that stat is a little broken.

Sounds like you aren't getting enough requests for the
getRecentHitRate to make sense.  use getHits / getRequests.

But if you aren't getting enough requests for getRecentHitRate, are
you sure you're tuning the cache on the right CF for your 35ms test?
Are you testing live?  If not, what's your methodology here?

-Jonathan
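
Editorial sketch, not part of the original message: the ratio suggested above, computed from the cumulative getHits and getRequests counters. It assumes the two values were read from JMX by hand; the function names and the sample numbers are made up. A rate computed over a window with no requests is 0/0, which would explain the NaN readings, and a window with only a handful of requests can swing between extremes.

```python
import math


def hit_rate(hits, requests):
    """Hit rate from cumulative counters; NaN when there were no requests."""
    if requests == 0:
        return math.nan
    return hits / requests


def windowed_hit_rate(prev, curr):
    """Hit rate between two (hits, requests) samples taken at different times."""
    d_hits = curr[0] - prev[0]
    d_requests = curr[1] - prev[1]
    return hit_rate(d_hits, d_requests)


# Made-up samples: the lifetime ratio is stable, while a short window with
# two requests or none gives an extreme or undefined value.
lifetime = hit_rate(970, 1000)
tiny_window = windowed_hit_rate((970, 1000), (972, 1002))
empty_window = windowed_hit_rate((972, 1002), (972, 1002))
```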

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

Okay, so now my row cache hit rate jumps between 1.0, 99.5, 95.6, and NaN.
Seems like that stat is a little broken.

Still seeing around 35ms to multiget 20 rows.

- James

On Tue, Mar 30, 2010 at 9:22 PM, Ryan King <ry...@twitter.com> wrote:

> On Tue, Mar 30, 2010 at 9:11 PM, James Golick <ja...@gmail.com>
> wrote:
> > No change observed. The hit rate fluctuates between 0.0, 0.3, and NaN every
> > time I run cfstats.
> > I just increased it by 10x. Hopefully that'll help.
>
> You should turn the caches up until you either run out of heap, or the
> hitrate stops going up.
>
> -ryan
>

Re: Read Performance

Posted by Ryan King <ry...@twitter.com>.

On Tue, Mar 30, 2010 at 9:11 PM, James Golick <ja...@gmail.com> wrote:
> No change observed. The hit rate fluctuates between 0.0, 0.3, and NaN every
> time I run cfstats.
> I just increased it by 10x. Hopefully that'll help.

You should turn the caches up until you either run out of heap, or the
hitrate stops going up.

-ryan

Re: Read Performance

Posted by James Golick <ja...@gmail.com>.

No change observed. The hit rate fluctuates between 0.0, 0.3, and NaN every
time I run cfstats.

I just increased it by 10x. Hopefully that'll help.

On Tue, Mar 30, 2010 at 8:59 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> What is your row cache hit rate?
>
> By "still slow" do you mean "no change observed" or "faster but not
> fast enough?"
>
> On Tue, Mar 30, 2010 at 10:47 PM, James Golick <ja...@gmail.com>
> wrote:
> > We are starting to use cassandra to power our activity feed. The way we
> > organize our data is simple. "Event"s live in a CF called Events and are
> > keyed by a UUID. The timelines themselves live in a CF called Timelines,
> > which is keyed by user id (e.g. "1229") and contains event UUIDs as column
> > names (sorted by TimeUUIDType).
> > To load a feed, we get a slice of the timeline CF for that user, then
> > multiget all of the corresponding events.
> > Loading the slice of the timeline is reasonably fast at 4-6ms. But,
> > multigetting the events is terribly slow - on the order of 35-100ms.
> > To alleviate the problem, we write events through to memcached and use a
> > memcached multiget in front of the cassandra multiget. We have enough cache
> > space to get upwards of a 99% hit rate, which makes loading the events
> > extremely fast, but it would be nice to make use of the 24GB of memory in
> > our cassandra nodes.
> > We're on 0.6, and I've enabled the row cache. It seems to have data in it,
> > but it's still slow.
> > So, am I doing something wrong, or is this the expected perf?
> > - James
>

Re: Read Performance

Posted by Jonathan Ellis <jb...@gmail.com>.

What is your row cache hit rate?

By "still slow" do you mean "no change observed" or "faster but not
fast enough?"

On Tue, Mar 30, 2010 at 10:47 PM, James Golick <ja...@gmail.com> wrote:
> We are starting to use cassandra to power our activity feed. The way we
> organize our data is simple. "Event"s live in a CF called Events and are
> keyed by a UUID. The timelines themselves live in a CF called Timelines,
> which is keyed by user id (e.g. "1229") and contains event UUIDs as column
> names (sorted by TimeUUIDType).
> To load a feed, we get a slice of the timeline CF for that user, then
> multiget all of the corresponding events.
> Loading the slice of the timeline is reasonably fast at 4-6ms. But,
> multigetting the events is terribly slow - on the order of 35-100ms.
> To alleviate the problem, we write events through to memcached and use a
> memcached multiget in front of the cassandra multiget. We have enough cache
> space to get upwards of a 99% hit rate, which makes loading the events
> extremely fast, but it would be nice to make use of the 24GB of memory in
> our cassandra nodes.
> We're on 0.6, and I've enabled the row cache. It seems to have data in it,
> but it's still slow.
> So, am I doing something wrong, or is this the expected perf?
> - James
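
Editorial sketch of the read path described in the quoted message, not the poster's actual code. Plain dicts stand in for the Timelines CF, the Events CF, and memcached, and `load_feed` is a hypothetical name. The thread says events are written through to memcached at write time; filling the cache on a read miss, as done here, is an added assumption.

```python
def load_feed(user_id, timelines, events_store, events_cache, count=20):
    """Return (events, misses) for the newest `count` entries of a user's feed.

    timelines:    dict of user id -> list of event ids, newest first
    events_store: dict of event id -> event (stand-in for the Events CF)
    events_cache: dict of event id -> event (stand-in for memcached)
    """
    # Step 1: slice the user's timeline row to get the event ids.
    event_ids = timelines.get(user_id, [])[:count]

    # Step 2: multiget from the cache first.
    found = {eid: events_cache[eid] for eid in event_ids if eid in events_cache}
    missing = [eid for eid in event_ids if eid not in found]

    # Step 3: fall back to the store for misses and fill the cache.
    for eid in missing:
        if eid in events_store:
            found[eid] = events_store[eid]
            events_cache[eid] = found[eid]

    # Preserve timeline order in the result.
    return [found[eid] for eid in event_ids if eid in found], len(missing)
```

With a cache hit rate near 99%, step 3 runs for very few ids, so the store behind it sees little read traffic of its own.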