Posted to user@cassandra.apache.org by Eax Melanhovich <ma...@eax.me> on 2015/05/26 21:00:45 UTC

A few stupid questions...

Hello.

I'm reading "Beginning Apache Cassandra Development" and there are a few
things I can't figure out.

First. Let's say I have a table (field1, field2, field3, field4), where
(field1, field2) is the primary key and field1 is the partition key. There
is a secondary index on the field3 column. Am I right that in this case a
query like:

select ... from my_table where field1 = 123 and field3 > '...';

... would be quite efficient, i.e. the request would be sent to only one
node, not the whole cluster?
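For concreteness, the table described above could be declared roughly as follows. This is only a sketch: the column types (int for field1, since it is compared with 123; text for field3, since it is compared with a string; int for field2 and text for field4, chosen arbitrarily) and the index name my_table_field3_idx are assumptions, not part of the original question.

```cql
-- Hypothetical declaration of the table from the question.
-- field1 is the partition key, field2 the clustering column.
CREATE TABLE my_table (
    field1 int,
    field2 int,
    field3 text,
    field4 text,
    PRIMARY KEY (field1, field2)
);

-- Secondary index on the non-key column field3 (index name is hypothetical).
CREATE INDEX my_table_field3_idx ON my_table (field3);
```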

Second. Let's say there is some data that almost never changes but is
read all the time, e.g. information about smileys in a social network, or
current sessions. In this case, would Cassandra cache the "hot" data in the
memtable? Or should such data be stored somewhere else, e.g. Redis or
Couchbase?

-- 
Best regards,
Eax Melanhovich
http://eax.me/

Re: A few stupid questions...

Posted by Eax Melanhovich <ma...@eax.me>.
Thank you!

On Tue, 26 May 2015 15:45:01 -0500
Tyler Hobbs <ty...@datastax.com> wrote:

> [...]

-- 
Best regards,
Eax Melanhovich
http://eax.me/

Re: A few stupid questions...

Posted by Tyler Hobbs <ty...@datastax.com>.
On Tue, May 26, 2015 at 2:00 PM, Eax Melanhovich <ma...@eax.me> wrote:

>
> First. Let's say I have a table (field1, field2, field3, field4), where
> (field1, field2) is the primary key and field1 is the partition key. There
> is a secondary index on the field3 column. Am I right that in this case a
> query like:
>
> select ... from my_table where field1 = 123 and field3 > '...';
>
> ... would be quite efficient, i.e. the request would be sent to only one
> node, not the whole cluster?
>

You are correct that it would only query one node (or one set of replicas,
if RF > 1 and CL > 1) due to the partition key being restricted.  However,
using '>' for the operator on the indexed column forces Cassandra to scan
the partition instead of using the index, because secondary indexes only
support '=' operations.  If you care about performance, you're probably
better off creating a dedicated table to serve this type of query, as
described here:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
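A minimal sketch of such a dedicated query table, assuming field1 is an int and field3 is text (the original question does not give the types). field3 moves into the clustering key, so "field3 > ..." becomes a range slice inside a single partition instead of an index lookup. field2 stays in the clustering key so that each (field1, field2) row of the original table still maps to exactly one row here. The table name my_table_by_field3 and all literal values are hypothetical.

```cql
-- Hypothetical query table: same data, clustered by field3 within each partition.
CREATE TABLE my_table_by_field3 (
    field1 int,
    field3 text,
    field2 int,
    field4 text,
    PRIMARY KEY ((field1), field3, field2)
);

-- Single-partition range read on the first clustering column; no secondary index.
SELECT field2, field3, field4
FROM my_table_by_field3
WHERE field1 = 123 AND field3 > 'm';

-- The application writes each row to both tables, here in a logged batch.
BEGIN BATCH
  INSERT INTO my_table (field1, field2, field3, field4) VALUES (123, 1, 'n', 'x');
  INSERT INTO my_table_by_field3 (field1, field3, field2, field4) VALUES (123, 'n', 1, 'x');
APPLY BATCH;
```

The trade-off is extra writes in exchange for cheap reads. Note also that if field3 of an existing row changes, the old row in my_table_by_field3 has to be deleted explicitly, because its clustering key no longer matches.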


>
> Second. Let's say there is some data that almost never changes but is
> read all the time, e.g. information about smileys in a social network, or
> current sessions. In this case, would Cassandra cache the "hot" data in the
> memtable? Or should such data be stored somewhere else, e.g. Redis or
> Couchbase?


Memtables are only used for buffering writes, not for caching read data.
Cassandra does have several layers of caching though.  Frequently read data
will end up in the key cache and the OS page cache, making reads quite
efficient.  Optionally, you can also enable the row cache.  Since you're
almost never modifying the data, the row cache is actually a decent fit,
although I recommend testing it heavily with your use case for stability.
The best way to find out if your performance is good enough is to benchmark
with your own use case.
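A sketch of enabling the row cache for the table from the question, assuming Cassandra 2.1 or later, where table caching is configured as a map with 'keys' and 'rows_per_partition' entries. The row cache is also sized globally by row_cache_size_in_mb in cassandra.yaml, whose default of 0 disables it, so that setting has to be raised on each node as well; the right size depends on your workload.

```cql
-- Cache all partition keys, plus every row of each partition that gets read.
-- 'rows_per_partition' also accepts a number, which caches only that many
-- rows from the start of each partition.
ALTER TABLE my_table
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
```

While benchmarking, `nodetool info` on each node reports row cache entries, hits, and the recent hit rate, which helps judge whether the cache is paying off.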


-- 
Tyler Hobbs
DataStax <http://datastax.com/>