Posted to user@cassandra.apache.org by AJ <aj...@dude.podzone.net> on 2011/06/13 23:06:16 UTC

Is this the proper use of OPP?

I'm just becoming aware of the restrictions of using the order-preserving
partitioner (OPP) compared to the random partitioner.  Please let me know
if I understand this correctly.

First off, if you use OPP only to get faster range queries, it will
probably be very hard to predict whether you will end up with hotspots,
and therefore where and how the data will be clustered on particular
nodes.  This is because the keys of the various CFs may or may not have
any correlation with one another.  In effect, you have a big mess of keys
of various ranges and formats, all partitioned according to one global
set of tokens that applies to ALL CFs of ALL keyspaces.
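Here is a minimal Python sketch of that hotspot problem. It assumes a hypothetical 4-node ring with hand-picked string tokens ("g", "n", "t", "z") and simplified OPP placement, where each node owns the keys that sort after the previous node's token, up to and including its own. Keys from three unrelated CFs share the one ring. The tokens happen to suit the name-like keys, but the date-like and hex keys all pile onto a single node.

```python
import bisect
from collections import Counter

# Hypothetical 4-node ring under an order-preserving partitioner.
# Each node owns keys sorting after the previous node's token, up to and
# including its own token; keys past the last token wrap to the first node.
NODE_TOKENS = ["g", "n", "t", "z"]  # assumed tokens, sorted
NODE_NAMES = ["node1", "node2", "node3", "node4"]


def owner_opp(key: str) -> str:
    """Return the node owning `key` when keys are placed by raw sort order."""
    i = bisect.bisect_left(NODE_TOKENS, key)
    return NODE_NAMES[i % len(NODE_NAMES)]


# Keys from three unrelated CFs, all placed on the same ring.
users = ["alice", "hank", "olga", "uma"]                 # one per node
events = [f"2011-06-{day:02d}" for day in range(1, 31)]  # date-like keys
sessions = [f"{n:08x}" for n in range(0, 2**32, 2**28)]  # 16 hex ids

load = Counter(owner_opp(k) for k in users + events + sessions)
print(load)  # node1 holds 47 of the 50 keys
```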

[main reason for post below...]
OTOH, if you want to use OPP to deliberately cluster certain data together
on specific nodes, such as for geographic partitioning, then you have to
choose a prefix for the keys of ALL CFs in ALL keyspaces!  This is
because they are all partitioned based on the tokens assigned to the
nodes.  IOW, if I had two datacenters, one in the US and another in
Europe, then for every row in every KS and CF I would need to prepend a
prefix such as "US:" or "EU:" to the key.  The problem is that I may not
want ALL of my CFs to be partitioned this way, only specific ones.  Also,
it may be very difficult, if not impossible, for all keys of all
keyspaces and CFs to take this form.  I'm not sure Cass is designed for
this.
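The prefix scheme can be sketched the same way. The node names and tokens below are hypothetical, and the sketch shows only the primary owner of each key (replication is ignored). The tokens are picked so that each datacenter owns exactly one prefix. Any key without one of the planned prefixes lands wherever its raw characters happen to sort.

```python
import bisect

# Hypothetical OPP ring spanning two datacenters. Tokens are assumptions
# chosen so that each datacenter owns exactly one key prefix.
# ';' is the character right after ':' in ASCII, so every "EU:..." key sorts
# below "EU;" and every "US:..." key sorts below "US;".
RING = [
    ("EU:m", "eu-node1"),
    ("EU;", "eu-node2"),
    ("US:m", "us-node1"),
    ("US;", "us-node2"),
]
TOKENS = [token for token, _ in RING]


def owner(key: str) -> str:
    """Node owning `key`: the first token >= key, wrapping to the first node."""
    i = bisect.bisect_left(TOKENS, key)
    return RING[i % len(RING)][1]


for key in ["EU:alice", "EU:zed", "US:bob", "US:zed", "alice", "FR:pierre"]:
    print(f"{key:10} -> {owner(key)}")
```

In this sketch "alice" has no prefix; it could be a key from a CF you never meant to partition geographically. It wraps around to eu-node1. "FR:pierre" lands on us-node1. Both illustrate the problem described above.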

However, with the random partitioner there is no such problem.  You can
use keys of any type you want (UTF8, Long, etc.), since they are all
hashed before deciding which node gets the key/row.
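For contrast, here is a sketch of hash-based placement. It assumes, as I understand Cassandra's RandomPartitioner, that a key's token is the MD5 digest of the raw key bytes, read as a signed integer and made non-negative, which gives tokens in [0, 2**127]. The 4-node layout with evenly spaced tokens is hypothetical. Only the bytes are hashed, so UTF-8 strings and 8-byte longs spread out in the same way.

```python
import bisect
import hashlib
import struct
from collections import Counter

NUM_NODES = 4
# Hypothetical cluster: evenly spaced tokens over the [0, 2**127] token space.
NODE_TOKENS = [i * 2**127 // NUM_NODES for i in range(1, NUM_NODES + 1)]


def rp_token(key: bytes) -> int:
    """Assumed RandomPartitioner-style token: |MD5(key) as a signed integer|."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner_rp(key: bytes) -> int:
    """Index of the node owning `key`: the first node token >= the key's token."""
    i = bisect.bisect_left(NODE_TOKENS, rp_token(key))
    return i % NUM_NODES


# Mixed key types: UTF-8 strings, date-like strings, and 8-byte big-endian
# longs (the byte form of a Long key). Only the bytes matter to the hash.
keys = [f"user{n}".encode("utf-8") for n in range(1000)]
keys += [f"2011-06-{day:02d}".encode("utf-8") for day in range(1, 31)]
keys += [struct.pack(">q", n) for n in range(1000)]

load = Counter(owner_rp(k) for k in keys)
print(sorted(load.items()))  # roughly a quarter of the keys per node
```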

Do I understand things correctly, or am I missing something?  Is Cass
designed to use OPP this way, or am I hacking it?  If I'm hacking it, is
there an acceptable way to do geographic partitioning?

Also, what is OPP really good for?

Thanks!

Re: Is this the proper use of OPP?

Posted by AJ <aj...@dude.podzone.net>.
Thanks.  I found that article later.  I was definitely off-base with
respect to OPP.  Random partitioning is pretty much the way to go, and
DataStax has a good article on geographic distribution:
http://www.datastax.com/docs/0.8/operations/datacenter
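Here is a rough sketch of how geographic distribution can work on top of random partitioning. It is not Cassandra's actual code. It uses per-datacenter replica counts in the spirit of NetworkTopologyStrategy: for each datacenter, the first nodes of that datacenter met while walking the ring clockwise from the key's token receive the replicas. Rack awareness is omitted, and the six-node, two-datacenter ring is hypothetical. Replication settings are configured per keyspace, so one keyspace can live in both datacenters while another lives in only one.

```python
import bisect
import hashlib

# Hypothetical ring: six nodes in two datacenters, tokens sorted and
# interleaved so each datacenter's nodes are spread around the whole ring.
NODES = [  # (token, node, datacenter)
    (1 * 2**127 // 6, "us1", "US"),
    (2 * 2**127 // 6, "eu1", "EU"),
    (3 * 2**127 // 6, "us2", "US"),
    (4 * 2**127 // 6, "eu2", "EU"),
    (5 * 2**127 // 6, "us3", "US"),
    (6 * 2**127 // 6, "eu3", "EU"),
]
TOKENS = [token for token, _, _ in NODES]


def rp_token(key: bytes) -> int:
    """Assumed RandomPartitioner-style token: |MD5(key) as a signed integer|."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def replicas(key: bytes, per_dc: dict[str, int]) -> list[str]:
    """Walk the ring clockwise from the key's token, taking the first
    per_dc[dc] nodes met in each datacenter (rack awareness omitted)."""
    start = bisect.bisect_left(TOKENS, rp_token(key)) % len(NODES)
    remaining = dict(per_dc)  # datacenters not listed get no replicas
    chosen = []
    for step in range(len(NODES)):
        _, node, dc = NODES[(start + step) % len(NODES)]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen


# Keyspace replicated in both datacenters, two copies in each.
print(replicas(b"alice", {"US": 2, "EU": 2}))
# Keyspace kept only in Europe.
print(replicas(b"alice", {"EU": 2}))
```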

Sorry for the long, pointless post earlier.  But, FWIW, I don't see much
use for OPP other than the corner case of a cluster consisting of 1 KS
and 1 CF, such as an index.  I will have to read Dominic's post on
running multiple Cass clusters on the same nodes.
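Here is a sketch of what OPP is good for: ordered range scans over a single CF. The tokens are hypothetical. Take an index CF whose row keys are the indexed terms. Under OPP, a range query over those keys touches one node. Under random partitioning, rows are stored in token (hash) order, so the same keys are scattered across every node.

```python
import bisect
import hashlib
from itertools import product
from string import ascii_lowercase

NUM_NODES = 4
OPP_TOKENS = ["g", "n", "t", "z"]  # assumed string tokens
RP_TOKENS = [i * 2**127 // NUM_NODES for i in range(1, NUM_NODES + 1)]


def opp_owner(key: str) -> int:
    """Node index under order-preserving placement (raw key order)."""
    return bisect.bisect_left(OPP_TOKENS, key) % NUM_NODES


def rp_owner(key: str) -> int:
    """Node index under assumed RandomPartitioner-style MD5 placement."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    token = abs(int.from_bytes(digest, "big", signed=True))
    return bisect.bisect_left(RP_TOKENS, token) % NUM_NODES


# Hypothetical index CF: one row per indexed term.
terms = ["ha" + a + b for a, b in product(ascii_lowercase, repeat=2)]
# Range query: every term from "hab" (inclusive) to "hat" (exclusive).
hits = [t for t in terms if "hab" <= t < "hat"]

print("OPP nodes touched:", sorted({opp_owner(t) for t in hits}))  # [1]
print("RP nodes touched: ", sorted({rp_owner(t) for t in hits}))
```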

On 6/14/2011 4:46 AM, Eric tamme wrote:
> I would point you to this article; it does a good job of describing OPP
> and pretty much answers the specific questions you asked.
>
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> -Eric


Re: Is this the proper use of OPP?

Posted by Eric tamme <et...@gmail.com>.
I would point you to this article; it does a good job of describing OPP
and pretty much answers the specific questions you asked.

http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

-Eric

