Posted to user@cassandra.apache.org by David Ward <da...@shareablee.com> on 2013/08/05 06:57:43 UTC

Better to have lower or greater cardinality for partition key in CQL3?

Hello,
     I was curious what people have found to be better for
structuring/modeling data in C*. With my data I have two primary
keys: one 64-bit int that's 0 - 50 million (it's unlikely to ever go
higher than 70 million) and another 64-bit int that's probably close
to hitting a trillion in the next year or so. Looking at how the data
is going to behave, for the first few months each row/record will be
updated, but after that it's practically written in stone. Still, I was
leaning toward leveled compaction, since each row gets updated anywhere
from once an hour to at least once a day for the first 7 days.

So from anyone's experience, is it better to use a low cardinality
partition key or a high cardinality one? Additionally, the data organized
by the low cardinality set is probably 1-6GB (and growing), but the data
under the high cardinality key would only be 1-6MB, written 2-3x a year.
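
For illustration, the two layouts I'm weighing look roughly like this
(table and column names are just placeholders, not my real schema):

    -- Option A: low cardinality partition key, high cardinality clustering column.
    -- Wide partitions; all rows for a given low_id live together on the same replicas.
    CREATE TABLE data_by_low (
        low_id  bigint,   -- the 0-50 million key
        high_id bigint,   -- the ~trillion-scale key
        value   blob,
        PRIMARY KEY (low_id, high_id)
    );

    -- Option B: high cardinality partition key.
    -- Lots of small partitions spread evenly; lookups by low_id would need another table.
    CREATE TABLE data_by_high (
        high_id bigint,
        low_id  bigint,
        value   blob,
        PRIMARY KEY (high_id)
    );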


Thanks,
   Dave


new high cardinality keys in 1 year ~15,768,00,000
new low cardinality keys in 1 year = 10,000-30,000

low cardinality key set size ~1-6GB
high cardinality key set size 1-5MB

Re: Better to have lower or greater cardinality for partition key in CQL3?

Posted by Aaron Morton <aa...@thelastpickle.com>.
> So from anyone's experience, is it better to use a low cardinality
> partition key or a high cardinality one?
IMHO go with whatever best supports the read paths. They all get hashed and spread around the cluster either way.
If you have lots of rows per node (e.g. north of 1 billion) there are extra considerations that come into play. Cassandra 1.2 helps a lot by moving the bloom filters and compression metadata off heap. Basically you may need to pay more attention to memory usage at that scale.

This is one place where LCS can help. It allows you to use a higher bloom filter FP chance, which results in a lower memory overhead for a given number of rows. Remember that LCS uses roughly twice the IO though, so make sure you can handle the throughput.
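
If you do go with LCS, the table-level knobs look something like this
(the table name is just an example, and 0.1 is only a starting point to tune):

    -- Leveled compaction plus a relaxed bloom filter FP chance, trading a few
    -- extra disk reads for less bloom filter memory per row.
    ALTER TABLE my_keyspace.my_table
        WITH compaction = { 'class' : 'LeveledCompactionStrategy' }
        AND bloom_filter_fp_chance = 0.1;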

Otherwise your update workload sounds like a perfect match for Size Tiered compaction.
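
For completeness, staying on the default strategy can be spelled out explicitly
(again, the table name is just an example):

    -- Size tiered compaction is the default; no bloom filter change needed here.
    ALTER TABLE my_keyspace.my_table
        WITH compaction = { 'class' : 'SizeTieredCompactionStrategy' };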

Hope that helps.

-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
