You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@cassandra.apache.org by "Hien. To Trong" <hi...@vng.com.vn> on 2010/08/22 19:46:29 UTC

Order preserving partitioning strategy

Hi,
I am developing a system with some features like cassandra.
I want to add order preserving partitioning strategy, but I don't know how to implement it.

In cassandra paper - Cassandra - A Decentralized Structured Storage System
"Cassandra partitions data across the cluster using consistent hashing but uses an order pre-
serving hash function (OPHF) to do so"

I wonder:

1. Cassandra still use a hash function (the other strategy is random partitioner) for OPP? 
If so, what is the algorithm of OPHF? is it a type of minimal perfect hash function (MPHF)?

I already read some papers about algorithms for MPHF which preserve the order of hash value. However, 
the size of key space equals and hash value space are equal and much more smaller than the size of key space 
(may be userid or usertaskid) in our application. How can I deal with that or I went on the wrong track?

2. My system is simple. I have some servers and I use Berkeley DB to store Key/Value (our data model is simple). Is OPP strategy useful 
when I don't have data model like cassandra? (column family for example).

Thanks so much.

Re: Order preserving partitioning strategy

Posted by Benjamin Black <b...@b3k.us>.

https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java

On Sun, Aug 22, 2010 at 10:46 AM, Hien. To Trong <hi...@vng.com.vn> wrote:
> Hi,
> I am developing a system with some features like cassandra.
> I want to add order preserving partitioning strategy, but I don't know how to implement it.
>
> In cassandra paper - Cassandra - A Decentralized Structured Storage System
> "Cassandra partitions data across the cluster using consistent hashing but uses an order pre-
> serving hash function (OPHF) to do so"
>
> I wonder:
>
> 1. Cassandra still use a hash function (the other strategy is random partitioner) for OPP?
> If so, what is the algorithm of OPHF? is it a type of minimal perfect hash function (MPHF)?
>
> I already read some papers about algorithms for MPHF which preserve the order of hash value. However,
> the size of key space equals and hash value space are equal and much more smaller than the size of key space
> (may be userid or usertaskid) in our application. How can I deal with that or I went on the wrong track?
>
> 2. My system is simple. I have some servers and I use Berkeley DB to store Key/Value (our data model is simple). Is OPP strategy useful
> when I don't have data model like cassandra? (column family for example).
>
> Thanks so much.