Posted to user@cassandra.apache.org by buddhasystem <po...@bnl.gov> on 2011/02/24 01:04:42 UTC

How come key cache increases speed by x4?

Well, I know the cache is there for a reason; I just can't explain the factor
of 4 when I run my queries on a hot vs. cold cache. My queries are actually a
chain: first one on an inverted index, which produces a tuple of keys, and
then the "main" query that uses those keys. The inverted index query should
be downright trivial.

I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing
something? Why such a large factor?

TIA

Maxim


Re: How come key cache increases speed by x4?

Posted by Robert Coli <rc...@digg.com>.
On Wed, Feb 23, 2011 at 4:04 PM, buddhasystem <po...@bnl.gov> wrote:

> Well, I know the cache is there for a reason; I just can't explain the factor
> of 4 when I run my queries on a hot vs. cold cache. My queries are actually a
> chain: first one on an inverted index, which produces a tuple of keys, and
> then the "main" query that uses those keys. The inverted index query should
> be downright trivial.
>
> I see the turnaround time per row go down to 1 ms from 4 ms. Am I missing
> something? Why such a large factor?

(The following is simplified for discussion purposes and is not
necessarily an exhaustive description of the read path.)

Path in the cold key cache case:

a) check all bloom filters (one per sstable in the CF), which are in memory
b) read the index file (not in memory) and traverse the index for every
sstable that returns a positive in a)
c) read the actual data file once for each of those sstables

Path in the hot key cache case:

a) read the list of filenames and offsets from the key cache
b) read the actual data file

You will notice that the former involves a lot more seeking than the
latter, especially if you have "many" sstables. This seeking is almost
certainly the cause of your observed difference. If you graph I/O
throughput in the two cases, you will almost certainly see yourself
doing more (slow) I/O in the cold cache case. For this reason, memory
spent on key cache is usually relatively well spent.
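
To make the difference in seek counts concrete, here is a rough toy model of
the two paths described above. It is only a sketch, not Cassandra code: the
SSTable class, read_cold, read_hot and the sample data are all hypothetical,
each file read is counted as a single seek, and the index lookup is assumed
to expose a bloom filter false positive so that the data file read is skipped
in that case.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Toy stand-in for one sstable (hypothetical, not Cassandra's class)."""
    name: str
    rows: dict  # key -> row data, standing in for the data file
    bloom_false_positives: set = field(default_factory=set)

    def bloom_might_contain(self, key):
        # In-memory check, so no seek. May report a false positive.
        return key in self.rows or key in self.bloom_false_positives


def read_cold(sstables, key, key_cache):
    """Cold path: bloom filters, then index file, then data file."""
    seeks = 0
    found = []
    for table in sstables:
        if not table.bloom_might_contain(key):  # step a), in memory
            continue
        seeks += 1  # step b), index file read (assumed to cost one seek)
        if key in table.rows:
            seeks += 1  # step c), data file read
            found.append(table.rows[key])
            # Remember where the key lives so the next read is "hot".
            key_cache.setdefault(key, []).append(table.name)
    return found, seeks


def read_hot(sstables_by_name, key, key_cache):
    """Hot path: key cache lookup, then data file."""
    seeks = 0
    found = []
    for name in key_cache[key]:  # step a), in memory
        seeks += 1  # step b), data file read
        found.append(sstables_by_name[name].rows[key])
    return found, seeks


# Three sstables; "k1" lives in two of them, and the bloom filter of the
# third reports a false positive for it.
tables = [
    SSTable("sst-1", {"k1": "v1-old"}),
    SSTable("sst-2", {"k2": "v2"}, bloom_false_positives={"k1"}),
    SSTable("sst-3", {"k1": "v1-new"}),
]
cache = {}
cold_rows, cold_seeks = read_cold(tables, "k1", cache)
hot_rows, hot_seeks = read_hot({t.name: t for t in tables}, "k1", cache)
print(cold_seeks, hot_seeks)  # prints: 5 2
```

In this toy setup the same rows come back either way, but the cold read pays
for three index reads on top of the two data reads. The real ratio depends on
how many sstables you have, how many reads an index traversal takes, and how
much of the index happens to be in the OS page cache.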

=Rob