Posted to user@cassandra.apache.org by zangds <za...@ihep.ac.cn> on 2010/11/10 14:35:40 UTC

about key sorting and token partitioning

Hi,
I am using Cassandra to store a message stream, and want to use timestamps (like yyyymmddhhMIss or something similar) as the keys.
If I use RandomPartitioner, I will lose the key order when using get_range_slices().
If I use OrderPreservingPartitioner, how should I configure Cassandra to balance the load between the nodes?

Thanks!

2010-11-10
zangds

Re: about key sorting and token partitioning

Posted by Peter Schuller <pe...@infidyne.com>.

> I am using Cassandra to store a message stream, and want to use timestamps
> (like yyyymmddhhMIss or something similar) as the keys.
> If I use RandomPartitioner, I will lose the key order when using
> get_range_slices().
> If I use OrderPreservingPartitioner, how should I configure Cassandra to
> balance the load between the nodes?

AFAIK there's no silver bullet to making the order preserving
partitioner easy to use w.r.t. node balancing in the situation you're
describing.

One thing to consider is to use the random partitioner (for its
simplicity in managing the cluster) and use a granular subset of the
timestamp as the row key. For example, you could have the row key be
yyyymmddhh to get an entire hour per row.

A reasonable granularity would depend on your use case, but the idea
is to be able to take advantage of the simplicity of using the random
partitioner, while having reasonable efficiency on range slices by
making each row contain a pretty large range such that any additional
overhead in jumping across nodes is negligible in comparison to the
other work done.
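
To make the bucketing idea concrete, here is a minimal Python sketch of
the key arithmetic only. It does not talk to Cassandra, and the function
names (row_key, column_name, row_keys_for_range) are hypothetical helpers
made up for illustration, not part of any client API. It also assumes
that each message is stored as a column named by its full timestamp
inside the hourly row, which is one possible layout rather than the only
one.

```python
from datetime import datetime, timedelta


def row_key(ts):
    """Hour-granularity row key, e.g. '2010111014' (yyyymmddhh)."""
    return ts.strftime("%Y%m%d%H")


def column_name(ts):
    """Full-resolution timestamp (yyyymmddhhMIss), assumed here to be
    the column name of a message inside its hourly row."""
    return ts.strftime("%Y%m%d%H%M%S")


def row_keys_for_range(start, end):
    """Return every hourly row key covering [start, end], oldest first."""
    # Truncate the start to the top of its hour so the first bucket is included.
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append(row_key(current))
        current += timedelta(hours=1)
    return keys


if __name__ == "__main__":
    ts = datetime(2010, 11, 10, 14, 35, 40)
    print(row_key(ts), column_name(ts))
    print(row_keys_for_range(datetime(2010, 11, 10, 22, 30),
                             datetime(2010, 11, 11, 1, 5)))
```

With a layout like this, a time-range read would fetch the listed rows by
key and slice the columns within each one, so the ordering comes from the
column names rather than from the partitioner.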

-- 
/ Peter Schuller