Posted to user@cassandra.apache.org by David Boxenhorn <da...@lookin2.com> on 2010/07/07 11:47:02 UTC

OPP + Hash on client side

Is there any strategy for using OPP with a hash algorithm on the client side
to get both uniform distribution of data in the cluster *and* the ability to
do range queries?

I'm thinking of something like this:

cassKey = (key % 97) + "@" + key;

cassRange = 0 + "@" + range; 1 + "@" + range; ... 96 + "@" + range;

Would something like that work?
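
For illustration, the scheme above could be sketched roughly as follows. This is only a sketch under assumptions that are not in the original message: keys are strings, CRC32 turns a key into an integer before the modulus is taken, and the names bucket_of, cass_key and cass_ranges are hypothetical.

```python
import zlib

NUM_BUCKETS = 97  # the modulus used in the message above


def bucket_of(key):
    # Assumption: keys are strings, so hash them to an integer first.
    # CRC32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def cass_key(key):
    # Row key as stored in Cassandra: "<bucket>@<original key>".
    return "%d@%s" % (bucket_of(key), key)


def cass_ranges(start, end):
    # One (start, end) pair per bucket; together they cover the logical range.
    return [("%d@%s" % (b, start), "%d@%s" % (b, end))
            for b in range(NUM_BUCKETS)]
```

Every logical range query then becomes NUM_BUCKETS physical range queries, so the bucket count trades evenness of distribution against per-query fan-out.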

Re: OPP + Hash on client side

Posted by David Boxenhorn <da...@lookin2.com>.
Aaron, thank you for the link.

What is discussed there is not exactly what I am thinking of. They propose
distributing the keys with <MD5(ROWKEY)>.<ROWKEY> - which will distribute
the values in a way that cannot easily be reversed. What I am proposing is
to distribute the keys evenly among N buckets, where N is much larger than
your number of nodes, and then construct my range queries as the union of N
range queries that I actually perform on Cassandra.
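
A rough sketch of that union, for illustration only. OrderedStore is a hypothetical in-memory stand-in for an order-preserving keyspace, not a Cassandra client, and CRC32 is an assumed way to pick the bucket for string keys.

```python
import bisect
import heapq
import zlib

N = 97  # number of buckets, much larger than the number of nodes


def cass_key(key):
    # "<bucket>@<original key>"; CRC32 is an assumed hash for string keys.
    return "%d@%s" % (zlib.crc32(key.encode("utf-8")) % N, key)


class OrderedStore:
    """Hypothetical stand-in for a keyspace whose row keys are kept sorted."""

    def __init__(self):
        self._keys = []

    def insert(self, key):
        bisect.insort(self._keys, cass_key(key))

    def get_range(self, start, end):
        # Inclusive range scan over the sorted row keys.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return self._keys[lo:hi]


def range_query(store, start, end):
    # Run one range query per bucket, strip the prefix, then merge.
    per_bucket = []
    for b in range(N):
        rows = store.get_range("%d@%s" % (b, start), "%d@%s" % (b, end))
        per_bucket.append([r.split("@", 1)[1] for r in rows])
    # Each bucket's result is already sorted, so a k-way merge restores order.
    return list(heapq.merge(*per_bucket))
```

With the keys apple, banana, cherry, date and fig inserted, range_query(store, "b", "d") returns banana and cherry in order, because "date" sorts after the inclusive end bound "d".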

"You can do range queries with the Random Partitioner in 0.6.*"

I went through this before; it's not true. What you can do is loop over your
entire set of keys in random order. There is no way to get an actual range
other than the whole range.
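
A small sketch of why that is, assuming only that the Random Partitioner orders rows by an MD5-derived token. The md5_token helper is hypothetical and simplifies the real token computation.

```python
import hashlib


def md5_token(key):
    # Simplified stand-in for a Random Partitioner token: MD5 digest as an integer.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# Twenty keys that are contiguous in their natural (lexical) order.
keys = ["user%03d" % i for i in range(20)]

# The order in which a token-ordered scan would visit them.
token_order = sorted(keys, key=md5_token)
```

Comparing token_order with keys shows the same rows in a shuffled order, so a start and end key do not bound a meaningful slice; only a scan of the whole token ring is guaranteed to see every key in a logical range.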


On Wed, Jul 7, 2010 at 1:15 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> That pattern is discussed here
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> It's also used in http://github.com/tjake/Lucandra
>
> You can do range queries with the Random Partitioner in 0.6.*, the order of
> the return is undefined and it's a bit slower.
>
> I think it's normally used when you want ordered range queries in some CF's
> and random distribution in others.
>
> Aaron

Re: OPP + Hash on client side

Posted by Aaron Morton <aa...@thelastpickle.com>.
That pattern is discussed here http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

It's also used in http://github.com/tjake/Lucandra

You can do range queries with the Random Partitioner in 0.6.*, the order of the return is undefined and it's a bit slower. 

I think it's normally used when you want ordered range queries in some CF's and random distribution in others. 

Aaron
