Posted to user@cassandra.apache.org by Arya Asemanfar <ar...@gmail.com> on 2010/08/11 05:43:58 UTC

Soliciting thoughts on possible read optimization

I mentioned this today to a couple folks at Cassandra Summit, and thought
I'd solicit some more thoughts here.

Currently, the read stage includes checking the row cache. So if your
concurrent-read limit is N and you have N reads in flight against disk, the
next read will block until a disk read finishes, even if its row is sitting
in the cache. Would it make sense to isolate disk reads from cache reads,
either by using the read stage only on cache misses, or by splitting it into
two stages, CacheRead and DiskRead? Of course, mmap'd reads would have to go
through DiskRead regardless, since we wouldn't know whether the data is
resident in memory until we asked the OS.
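
Concretely, here's a rough sketch of the split I have in mind, assuming a
hypothetical pair of stage executors and an in-memory row cache map; all the
names (cacheReadStage, diskReadStage, rowCache, readFromDisk) and the pool
sizes are made up for illustration, not the real classes:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.function.Consumer;

    // Illustrative sketch only: stage names, cache type, and read methods
    // are hypothetical stand-ins, not Cassandra's actual classes.
    public class TwoStageReadSketch {
        // CacheRead can afford a wide pool because hits are cheap;
        // DiskRead stays tightly bounded because disk I/O is the scarce resource.
        private final ExecutorService cacheReadStage = Executors.newFixedThreadPool(128);
        private final ExecutorService diskReadStage  = Executors.newFixedThreadPool(32);

        private final Map<String, byte[]> rowCache = new ConcurrentHashMap<>();

        public void read(String key, Consumer<byte[]> respond) {
            cacheReadStage.execute(() -> {
                byte[] row = rowCache.get(key);
                if (row != null) {
                    respond.accept(row); // cache hit: never queues behind disk reads
                } else {
                    // cache miss: only now do we consume a bounded DiskRead slot
                    diskReadStage.execute(() -> respond.accept(readFromDisk(key)));
                }
            });
        }

        private byte[] readFromDisk(String key) {
            return new byte[0]; // stand-in for the real SSTable read path
        }
    }

The point is just that a cache hit never waits on one of the bounded
DiskRead slots; only a miss does.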

My thought is that stages should be bounded by the resources they consume
rather than by request semantics, but that may be wrong. Logically, I don't
think it would make sense to bound the read stage in a hypothetical system
with no I/O; the cap was most likely introduced because of the disk and the
I/O contention it creates.

As a possible bonus with this change, you could make other optimizations,
like batching disk reads for rows whose keys were found in the key cache
(does this even make sense? I'm not too sure how that would work).
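
If it does make sense, here's roughly how I picture it, again with made-up
names (a keyCache mapping keys to SSTable offsets, a readAtOffset stand-in):
take the keys whose positions the key cache already knows, sort them by file
offset, and read them in one mostly-sequential pass instead of scattered
seeks:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of batching disk reads for keys whose SSTable
    // positions are already known from the key cache. All names here are
    // illustrative, not Cassandra's real API.
    public class BatchedKeyCacheReads {
        private final Map<String, Long> keyCache = new HashMap<>(); // key -> SSTable offset

        public Map<String, byte[]> readBatch(List<String> keys) {
            // keep only the keys whose offsets the key cache already knows
            List<Map.Entry<String, Long>> hits = new ArrayList<>();
            for (String k : keys) {
                Long offset = keyCache.get(k);
                if (offset != null) hits.add(Map.entry(k, offset));
            }
            // sort by file offset so the reads sweep the SSTable in order
            hits.sort(Map.Entry.comparingByValue());

            Map<String, byte[]> rows = new HashMap<>();
            for (Map.Entry<String, Long> e : hits) {
                rows.put(e.getKey(), readAtOffset(e.getValue()));
            }
            return rows;
        }

        private byte[] readAtOffset(long offset) {
            return new byte[0]; // stand-in for a positioned SSTable read
        }
    }

Whether that actually wins anything probably depends on how spread out the
offsets are.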

Let me know what you guys think.

Thanks,
Arya

Re: Soliciting thoughts on possible read optimization

Posted by Jonathan Ellis <jb...@gmail.com>.
https://issues.apache.org/jira/browse/CASSANDRA-1379

On Tue, Aug 10, 2010 at 8:43 PM, Arya Asemanfar <ar...@gmail.com> wrote:
> [snip]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Soliciting thoughts on possible read optimization

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Aug 11, 2010 at 11:37 AM, Ryan King <ry...@twitter.com> wrote:
> On Tue, Aug 10, 2010 at 8:43 PM, Arya Asemanfar <ar...@gmail.com> wrote:
>> [snip]
>
> I think this is a reasonable analysis. The idea of stages in the
> original SEDA research is to put bounds around scarce resources, and I
> wouldn't call reading from the row cache a scarce resource. I'd expect
> this change to yield significant performance improvements for workloads
> that are heavily row-cacheable.
>
> -ryan
>

I think that makes sense. If I understand correctly, the only reads that
could be served purely from the row cache would be CL.ONE reads, so
QUORUM or ALL reads would skip this stage.

Re: Soliciting thoughts on possible read optimization

Posted by Ryan King <ry...@twitter.com>.
On Tue, Aug 10, 2010 at 8:43 PM, Arya Asemanfar <ar...@gmail.com> wrote:
> [snip]

I think this is a reasonable analysis. The idea of stages in the
original SEDA research is to put bounds around scarce resources, and I
wouldn't call reading from the row cache a scarce resource. I'd expect
this change to yield significant performance improvements for workloads
that are heavily row-cacheable.

-ryan