Posted to user@cassandra.apache.org by Ramesh Natarajan <ra...@gmail.com> on 2011/10/06 20:50:17 UTC

read on multiple SS tables

Let's assume I perform frequent inserts and updates on a column family.
Over a period of time, multiple SSTables will hold this row/column
data.
I have 2 questions about how reads work in Cassandra w.r.t. multiple SSTables.

- If you perform a query for a specific row key and a column name, does
it read the most recent SSTable first and, if it finds a hit, stop
there? Or does it need to read through all the SSTables (to find the
most recent one) regardless of whether it found a hit on the most
recent SSTable?

- If I perform a slice query on a column range, does Cassandra iterate
over all the SSTables?

We have two options for laying out the data:

1st option:

Key1 |  COL1 | COL2 | COL3 .....  <multiple columns >

We would need to perform a slice query to get COL1-COL3 using Key1.

2nd option:

Key1 |  <COL as one column and have application place values of
COL1-COLN in this one column>

This key would be updated several times, and the app would manage
adding multiple values to the single column under this key. Our max
column value size will be less than 64 MB. When we need to search for a
value, we would read the one column and the application would look up
the appropriate value in the list of values.

So I am wondering which option would be most efficient from a read point of view.

thanks
Ramesh

Re: read on multiple SS tables

Posted by Brandon Williams <dr...@gmail.com>.

On Thu, Oct 6, 2011 at 3:56 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> - If you perform a query for a specific row key and a column name, does
>> it read the most recent SSTable first and, if it finds a hit, stop
>> there? Or does it need to read through all the SSTables (to find the
>> most recent one) regardless of whether it found a hit on the most
>> recent SSTable?
>
> Reads all SSTables, as the only way to know which column instance has the
> highest timestamp is to read them all.

That is the case until https://issues.apache.org/jira/browse/CASSANDRA-2498,
which makes this much faster.
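
As a rough illustration of how such a read could finish early, here is a
minimal Python sketch. It assumes each SSTable records the maximum column
timestamp it contains, so a read can visit SSTables from newest to oldest
and stop once the column already found is newer than anything the remaining
SSTables could hold. The SSTable class, its fields, and read_column are
hypothetical stand-ins for illustration, not Cassandra's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Hypothetical in-memory stand-in for an on-disk SSTable."""
    rows: dict  # row key -> {column name: (value, timestamp)}
    max_timestamp: int = field(init=False)

    def __post_init__(self):
        # Assumption: the largest column timestamp is tracked per SSTable.
        self.max_timestamp = max(
            (ts for row in self.rows.values() for (_, ts) in row.values()),
            default=-1,
        )


def read_column(sstables, row_key, column_name):
    """Return ((value, timestamp), number of SSTables actually visited)."""
    newest = None
    visited = 0
    by_recency = sorted(sstables, key=lambda s: s.max_timestamp, reverse=True)
    for sstable in by_recency:
        # Every remaining SSTable holds only older data, so stop here.
        if newest is not None and newest[1] > sstable.max_timestamp:
            break
        visited += 1
        column = sstable.rows.get(row_key, {}).get(column_name)
        if column is not None and (newest is None or column[1] > newest[1]):
            newest = column
    return newest, visited


sstables = [
    SSTable({"key1": {"COL1": ("v1", 100)}}),
    SSTable({"key1": {"COL1": ("v2", 200)}}),
    SSTable({"key1": {"COL1": ("v3", 300)}, "key2": {"COL1": ("x", 250)}}),
]
newest, visited = read_column(sstables, "key1", "COL1")
```

In this toy data set the read touches one SSTable out of three, because the
version it finds there (timestamp 300) is newer than the maximum timestamp
of either remaining SSTable.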

-Brandon

Re: read on multiple SS tables

Posted by aaron morton <aa...@thelastpickle.com>.

> - If you perform a query for a specific row key and a column name, does
> it read the most recent SSTable first and, if it finds a hit, stop
> there? Or does it need to read through all the SSTables (to find the
> most recent one) regardless of whether it found a hit on the most
> recent SSTable?
It reads all SSTables, as the only way to know which column instance has the highest timestamp is to read them all.
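
A minimal Python sketch of that reconciliation, using plain dicts as
hypothetical stand-ins for SSTables (the Column type and read_column are
illustrative names, not Cassandra APIs). The newest version is deliberately
placed in the middle of the list to show that the position of an SSTable
alone does not identify the winner:

```python
from typing import NamedTuple, Optional


class Column(NamedTuple):
    name: str
    value: bytes
    timestamp: int  # write timestamp attached to the column


def read_column(sstables, row_key, column_name) -> Optional[Column]:
    """Check every SSTable and keep the version with the highest timestamp."""
    newest = None
    for sstable in sstables:
        column = sstable.get(row_key, {}).get(column_name)
        if column is not None and (
            newest is None or column.timestamp > newest.timestamp
        ):
            newest = column
    return newest


# Hypothetical SSTables: row key -> {column name -> Column}
sstables = [
    {"key1": {"COL1": Column("COL1", b"old", 100)}},
    {"key1": {"COL1": Column("COL1", b"new", 300)}},
    {"key1": {"COL1": Column("COL1", b"mid", 200)}},
]
result = read_column(sstables, "key1", "COL1")
```

Here result is the column with timestamp 300, found only after all three
stand-in SSTables were consulted.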

> - If I perform a slice query on a column range, does Cassandra iterate
> over all the SSTables?
It reads all SSTables that contain any data for the row.
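
The same idea for a slice can be sketched in Python as follows. This is a
simplified model, assuming dicts as stand-ins for SSTables and plain string
ordering for column names; read_slice is a hypothetical helper, not a
Cassandra API. SSTables without the row are skipped, and overlapping columns
are resolved by timestamp:

```python
def read_slice(sstables, row_key, start, end):
    """Merge columns in [start, end] from every SSTable that holds the row."""
    merged = {}
    for sstable in sstables:
        row = sstable.get(row_key)
        if row is None:
            continue  # this SSTable has no data for the row, so skip it
        for name, (value, timestamp) in row.items():
            if start <= name <= end:
                current = merged.get(name)
                if current is None or timestamp > current[1]:
                    merged[name] = (value, timestamp)
    # Return the winning value for each column, in column-name order.
    return [(name, merged[name][0]) for name in sorted(merged)]


# Hypothetical SSTables: row key -> {column name: (value, timestamp)}
sstables = [
    {"key1": {"COL1": ("a", 1), "COL2": ("b", 1)}},
    {"key2": {"COL1": ("other row", 3)}},
    {"key1": {"COL2": ("b2", 5), "COL3": ("c", 2), "COL4": ("d", 2)}},
]
result = read_slice(sstables, "key1", "COL1", "COL3")
```

With this data the slice is assembled from the first and third SSTables; the
second one holds nothing for key1 and contributes nothing.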

(Background: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/)

> So I am wondering which option would be most efficient from a read point of view.
I would go with the first option; 64 MB columns will be a pain.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
