Posted to user@cassandra.apache.org by Kamal Bahadur <ma...@gmail.com> on 2012/01/18 18:53:22 UTC

Max records per node for a given secondary index value

Hi All,

It is great to know that a Cassandra column family can accommodate 2 billion
columns per row! I was reading about how Cassandra stores the secondary
index info internally. I now understand that the index-related data is
stored in a hidden CF and that each node is responsible for storing only the
keys of data that reside on that node.

I have been using a secondary index for a low-cardinality column called
"product". There can only be 3 possible values for this column. I have a
four-node cluster and process about 5000 records per second with an RF of 2.

My question is: what happens after the number of columns in the hidden
index CF exceeds 2 billion? How does Cassandra handle this situation? I
guess one way to handle this is to add more nodes to the cluster. I am
interested in knowing whether any other solutions exist.

Thanks,
Kamal

Re: Max records per node for a given secondary index value

Posted by aaron morton <aa...@thelastpickle.com>.
Each node stores the rows in its token range, and those in the token ranges it is a replica for. So it will store roughly rf / num_nodes of the rows (about half of them in a four-node cluster with RF 2).

If you are approaching a situation where the node may store 2 billion rows, and so may have 2 billion entries in the secondary index row, you would need to add more nodes to reduce the number of rows the node stores. 
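
As a rough illustration of how quickly that limit could be reached, here is a back-of-the-envelope sketch in Python. The write rate, node count, RF and number of index values come from the original post. The rest is assumed for illustration: that a node holds about rf / num_nodes of the rows, that every record is a new row, that records are spread evenly over the index values, and that nothing is deleted. The function name is made up.

```python
# Back-of-the-envelope estimate, not a measurement.
# Assumptions: a node holds about rf / num_nodes of the rows, every write
# is a new row, values are evenly distributed, nothing is ever deleted.
MAX_COLUMNS_PER_ROW = 2_000_000_000

def seconds_until_index_row_full(writes_per_sec, num_nodes, rf, distinct_values):
    # Fraction of all rows stored on one node (own range plus replicas).
    node_fraction = rf / num_nodes
    # New rows per second on one node, then per index row on that node.
    per_node = writes_per_sec * node_fraction
    per_index_row = per_node / distinct_values
    return MAX_COLUMNS_PER_ROW / per_index_row

seconds = seconds_until_index_row_full(5000, 4, 2, 3)
days = seconds / 86400
print(f"{days:.1f} days")
```

If all records shared a single index value instead, the same function with distinct_values=1 gives 800,000 seconds, a little over 9 days.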

IMHO it sounds like there are some efficiencies to be found in your data model. If you have write-once records, it may be more efficient to create a CF to support your common queries. Also, the utility of 2 billion things in an index is probably questionable; it may be useful to partition by date.
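
One way to picture partitioning by date is a query CF whose row key combines the product with a day, so each row only ever holds one day of records. The sketch below only builds such keys. The "<product>:<YYYYMMDD>" layout and the function names are hypothetical, not anything Cassandra prescribes, and the product name "widget" is a placeholder.

```python
from datetime import date, timedelta

def bucket_key(product, day):
    # Hypothetical row key layout: "<product>:<YYYYMMDD>".
    return f"{product}:{day:%Y%m%d}"

def bucket_keys(product, first_day, last_day):
    # All row keys to read for a query covering first_day..last_day inclusive.
    span = (last_day - first_day).days
    return [bucket_key(product, first_day + timedelta(days=i))
            for i in range(span + 1)]

print(bucket_key("widget", date(2012, 1, 18)))
print(bucket_keys("widget", date(2012, 1, 18), date(2012, 1, 20)))
```

At 5000 records per second, one day is 5000 x 86,400 = 432 million records in total, so a daily row would stay well under 2 billion columns even if every record had the same product.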

Hope that helps.
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/01/2012, at 3:11 PM, Mohit Anchlia wrote:

> You need to shard your rows
> 
> On Wed, Jan 18, 2012 at 5:46 PM, Kamal Bahadur <ma...@gmail.com> wrote:
>> Anyone?

Re: Max records per node for a given secondary index value

Posted by Mohit Anchlia <mo...@gmail.com>.
You need to shard your rows

On Wed, Jan 18, 2012 at 5:46 PM, Kamal Bahadur <ma...@gmail.com> wrote:
> Anyone?

Re: Max records per node for a given secondary index value

Posted by Kamal Bahadur <ma...@gmail.com>.
Anyone?
