Posted to user@cassandra.apache.org by Chris Burroughs <ch...@gmail.com> on 2011/01/11 15:54:29 UTC

Confused about CASSANDRA-1417; saving row cache

https://issues.apache.org/jira/browse/CASSANDRA-1417
http://www.riptano.com/blog/whats-new-cassandra-066

My naive reading of CASSANDRA-1417 was that it could be used to save the
row cache to disk.  Empirically it appears to only save the row keys,
and then reads each row.

In my case I set the row cache to save to disk.  This resulted in a 25
MB file. On restart the process sat at this line for about 1 hour while
reading at 25 MB a second: INFO [main] 2011-01-11 07:32:41,705
ColumnFamilyStore.java (line 252) loading row cache for FOO of BAR

Is this the intentional implementation?  Is there any reason not to
just save the entire row to disk to allow for faster startup?

Re: Confused about CASSANDRA-1417; saving row cache

Posted by Peter Schuller <pe...@infidyne.com>.
> This makes total sense and is obvious in hindsight.  But wouldn't such a
> hypothetical "stale" row cache be corrected by read repair (in other
> words useless for write heavy workloads, not a problem for read heavy)?

It's not quite that simple. For example, suppose you write to the
cluster at QUORUM consistency and then later read back at QUORUM. If
a node that was part of the set of nodes that ACK:ed the write has
since lost the data due to a stale row cache, that node can now
participate in the read set for your QUORUM read even though it has
forgotten the write it officially ACK:ed. Cassandra would then be
violating the consistency guarantees it claimed to provide.

(That is ignoring any potential issues directly resulting from a node
having an internally inconsistent state w.r.t. what's actually stored
on the node.)
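
To make the scenario above concrete, here is a minimal sketch in
Python. It is a toy model, not Cassandra code: the three dicts standing
in for replicas, the (timestamp, value) tuples, and quorum_read are all
hypothetical names assumed for illustration.

```python
# Toy model: three replicas, each a dict of key -> (timestamp, value).
# All names here are hypothetical; this is not how Cassandra is implemented.

def quorum_read(replicas, key):
    """Return the value with the highest timestamp among the replicas asked."""
    answers = [r.get(key, (0, None)) for r in replicas]
    return max(answers, key=lambda tv: tv[0])[1]

a, b, c = {}, {}, {}
for r in (a, b, c):
    r["k"] = (1, "old")

# Write "new" at QUORUM: A and B acknowledge, C has not received it yet.
a["k"] = (2, "new")
b["k"] = (2, "new")

# A restarts and serves a stale row-cache entry: it has forgotten the write.
a["k"] = (1, "old")

# A later QUORUM read that happens to contact A and C sees only the old value.
stale = quorum_read([a, c], "k")
# A read that happens to include B still sees the acknowledged write.
fresh = quorum_read([a, b], "k")
```

In this model the read that contacts A and C returns "old" even though
the write was acknowledged by a quorum, which is the violation
described above.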

-- 
/ Peter Schuller

Re: Confused about CASSANDRA-1417; saving row cache

Posted by Chris Burroughs <ch...@gmail.com>.
On 01/11/2011 10:11 AM, Edward Capriolo wrote:
> I think because the RowCache is only saved periodically it could be
> out of sync. I.e., if it was saved at 12:00 and the row changed at 12:01,
> the row cache would consistently return the wrong results since it never
> looks at the disk again. I guess saving the row cache only makes sense
> for a smaller row cache at this point.

This makes total sense and is obvious in hindsight.  But wouldn't such a
hypothetical "stale" row cache be corrected by read repair (in other
words useless for write heavy workloads, not a problem for read heavy)?

Re: Confused about CASSANDRA-1417; saving row cache

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Jan 11, 2011 at 9:54 AM, Chris Burroughs
<ch...@gmail.com> wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-1417
> http://www.riptano.com/blog/whats-new-cassandra-066
>
> My naive reading of CASSANDRA-1417 was that it could be used to save the
> row cache to disk.  Empirically it appears to only save the row keys,
> and then reads each row.
>
> In my case I set the row cache to save to disk.  This resulted in a 25
> MB file. On restart the process sat at this line for about 1 hour while
> reading at 25 MB a second: INFO [main] 2011-01-11 07:32:41,705
> ColumnFamilyStore.java (line 252) loading row cache for FOO of BAR
>
> Is this the intentional implementation?  Is there any reason not to
> just save the entire row to disk to allow for faster startup?
>

I think because the RowCache is only saved periodically it could be
out of sync. I.e., if it was saved at 12:00 and the row changed at 12:01,
the row cache would consistently return the wrong results since it never
looks at the disk again. I guess saving the row cache only makes sense
for a smaller row cache at this point.
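
A minimal sketch of that difference, in Python. This is an illustration
only: the dicts named disk and cache and the two snapshot variables are
hypothetical stand-ins, not Cassandra structures.

```python
# Hypothetical stand-ins: "disk" is the authoritative store,
# "cache" is the in-memory row cache.
disk = {"row1": "v1"}
cache = dict(disk)

# 12:00 - periodic save. Compare saving whole rows with saving keys only.
saved_rows = dict(cache)   # snapshot of full rows
saved_keys = list(cache)   # snapshot of keys only

# 12:01 - the row changes after the snapshot was taken.
disk["row1"] = "v2"
cache["row1"] = "v2"

# Restart: the in-memory cache is gone and is rebuilt from the snapshot.
restored_from_rows = dict(saved_rows)                   # trusts the old values
restored_from_keys = {k: disk[k] for k in saved_keys}   # re-reads each row
```

Restoring from saved rows brings back "v1", while restoring from saved
keys re-reads "v2" from disk, at the cost of one read per key.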

Re: Confused about CASSANDRA-1417; saving row cache

Posted by Chris Burroughs <ch...@gmail.com>.
On 2011-01-11 15:41, Chris Burroughs wrote:
> On 01/11/2011 02:56 PM, Peter Schuller wrote:
>>> But now I need two knobs:  "Max size of row cache" (best steady
>>> state hit rate) and "number of row cache items to read in on startup"
>>> (so that the ROW-READ-STAGE does not need to drop packets and the node
>>> can be restarted in a reasonable amount of time).
>>
>> Good idea IMO. File a jira ticket?
>>
> 
> https://issues.apache.org/jira/browse/CASSANDRA-1966 created.
> 

Reading only a fraction of the rows is unlikely to be useful since
ConcurrentLinkedHashMap does not provide any hooks for sorting the
entries usefully.

For anyone else who is having similar problems with cold startup and
small rows: I will probably try a KeyCache sized << RowCache and
save only that KeyCache to disk.

Re: Confused about CASSANDRA-1417; saving row cache

Posted by Chris Burroughs <ch...@gmail.com>.
On 01/11/2011 02:56 PM, Peter Schuller wrote:
>> But now I need two knobs:  "Max size of row cache" (best steady
>> state hit rate) and "number of row cache items to read in on startup"
>> (so that the ROW-READ-STAGE does not need to drop packets and the node
>> can be restarted in a reasonable amount of time).
> 
> Good idea IMO. File a jira ticket?
> 

https://issues.apache.org/jira/browse/CASSANDRA-1966 created.


Re: Confused about CASSANDRA-1417; saving row cache

Posted by Peter Schuller <pe...@infidyne.com>.
> But now I need two knobs:  "Max size of row cache" (best steady
> state hit rate) and "number of row cache items to read in on startup"
> (so that the ROW-READ-STAGE does not need to drop packets and the node
> can be restarted in a reasonable amount of time).

Good idea IMO. File a jira ticket?

-- 
/ Peter Schuller

Re: Confused about CASSANDRA-1417; saving row cache

Posted by Chris Burroughs <ch...@gmail.com>.
On 01/11/2011 12:23 PM, Peter Schuller wrote:
>> Is this the intentional implementation?  Is there any reason not to
>> just save the entire row to disk to allow for faster startup?
> 
> Intentional (in the sense of "not a mistake"), but see:
> 
>    https://issues.apache.org/jira/browse/CASSANDRA-1625
> 
> The reason your start-up took a lot of time is that reading in the
> values associated with the keys is entirely seek-bound (except in
> certain edge cases). Eliminating the need for seek-bound I/O to
> populate the row cache was the purpose of filing 1625.
> 

My reading of CASSANDRA-1625 is that the current proposal is to make the
"row cache" a CF and order CFs by hotness.   This sounds totally rad,
but not a near term change.


> In practice, you do have to consider the expected start-up time when
> sizing your row cache.

But now I need two knobs:  "Max size of row cache" (best steady
state hit rate) and "number of row cache items to read in on startup"
(so that the ROW-READ-STAGE does not need to drop packets and the node
can be restarted in a reasonable amount of time).

Choosing between a long period of dropped packets while the row cache
populates or 1 hour restart time per node is not fun.
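
As a rough sketch of what the second knob could look like, in Python.
Everything here is an assumption for illustration: keys_to_preload and
max_preload are hypothetical names, not existing Cassandra settings,
and the sketch assumes the saved keys arrive in some useful order.

```python
from itertools import islice

def keys_to_preload(saved_keys, max_preload):
    """Return at most max_preload keys from the saved key sequence.

    max_preload is a hypothetical startup cap, separate from the
    steady-state maximum size of the row cache.
    """
    return list(islice(saved_keys, max_preload))

saved = ["k1", "k2", "k3", "k4"]   # hypothetical saved row-cache keys
preload = keys_to_preload(saved, 2)
```

Only the first two keys would be read at startup; the rest of the
cache would fill in through normal reads.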

Re: Confused about CASSANDRA-1417; saving row cache

Posted by Peter Schuller <pe...@infidyne.com>.
> https://issues.apache.org/jira/browse/CASSANDRA-1417

[snip, row cache saving only keys]

> Is this the intentional implementation?  Is there any reason not to
> just save the entire row to disk to allow for faster startup?

Intentional (in the sense of "not a mistake"), but see:

   https://issues.apache.org/jira/browse/CASSANDRA-1625

The reason your start-up took a lot of time is that reading in the
values associated with the keys is entirely seek-bound (except in
certain edge cases). Eliminating the need for seek-bound I/O to
populate the row cache was the purpose of filing 1625.

In practice, you do have to consider the expected start-up time when
sizing your row cache.
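
A back-of-the-envelope sketch of that sizing consideration, in Python.
The function name and the example inputs (500,000 keys, one seek per
row, 8 ms per seek) are assumptions made up for illustration, not
measurements; only the 25 MB/s for about 1 hour comes from the
original report.

```python
def estimate_warmup_seconds(num_keys, seeks_per_row, seek_ms):
    """Rough seek-bound estimate: every cached key costs some random seeks."""
    return num_keys * seeks_per_row * seek_ms / 1000.0

# Hypothetical inputs: 500,000 saved keys, 1 seek per row, 8 ms per seek.
warmup = estimate_warmup_seconds(500_000, 1, 8.0)

# From the original report: about 1 hour of reading at 25 MB/s,
# expressed as total megabytes read during start-up.
mb_read = 25 * 60 * 60
```

Under these assumed inputs the warm-up is about 4000 seconds, and the
reported restart corresponds to roughly 90,000 MB read in order to
load a 25 MB file of keys.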

-- 
/ Peter Schuller