
Posted to user@cassandra.apache.org by Takenori Sato <ts...@cloudian.com> on 2013/08/22 10:19:39 UTC

Random Distribution, yet Order Preserving Partitioner

Hi,

I am trying to implement a custom partitioner that distributes keys evenly,
yet preserves their order.

The partitioner returns a token as a BigInteger, as RandomPartitioner does,
while its decorated key is string-based, as OrderPreservingPartitioner's is.
* For now, since IPartitioner<T> does not support different types for the
token and the key, the BigInteger is simply converted to a string.
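Converting the BigInteger token to a string has a pitfall worth checking. If that string is later compared lexicographically (an assumption about the prototype; the post does not say), numeric order is lost unless the digits are zero-padded to a fixed width. The minimal sketch below shows both conversions; `TokenStringOrder`, `naive` and `padded` are hypothetical names, and the 2^127 upper bound reflects our understanding of RandomPartitioner's token range.

```java
import java.math.BigInteger;

// Hypothetical helper: compares two ways of turning a BigInteger token into a String.
class TokenStringOrder {
    // Decimal width of 2^127, the upper end of a RandomPartitioner-style token range.
    static final int WIDTH = BigInteger.ONE.shiftLeft(127).toString().length();

    // Naive conversion: plain decimal string, so "10" sorts before "9".
    static String naive(BigInteger token) {
        return token.toString();
    }

    // Order-preserving conversion for non-negative tokens: left-pad to a fixed width.
    static String padded(BigInteger token) {
        if (token.signum() < 0) throw new IllegalArgumentException("token must be >= 0");
        String digits = token.toString();
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < WIDTH; i++) sb.append('0');
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        BigInteger nine = BigInteger.valueOf(9), ten = BigInteger.TEN;
        System.out.println(naive(nine).compareTo(naive(ten)) > 0);   // true: order broken
        System.out.println(padded(nine).compareTo(padded(ten)) < 0); // true: order kept
    }
}
```

Fixed-width padding is the usual way to make a numeric value sort correctly as text; whether it matters here depends on where the prototype compares the string form.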

Then I played around with cassandra-cli. As expected, in my 3-node test
cluster get/set worked, but list (get_range_slices) didn't.
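The symptom above (point reads work, range listing does not) can be reproduced outside Cassandra with a small sketch. It assumes a RandomPartitioner-style token, an MD5 digest read as a BigInteger and made non-negative, which is our reading of that class rather than the poster's code; `HashedRing` and its methods are hypothetical. A get or set only needs the deterministic key-to-token mapping, whereas a range scan walks the ring in token order, so in general its keys neither come back in key order nor match the lexicographic key range.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper modelled on RandomPartitioner-style hashing.
class HashedRing {
    // MD5 digest of the key, read as a signed BigInteger and made non-negative.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should always be available in the JDK", e);
        }
    }

    // Order in which a scan that walks the ring by token would see the keys.
    static List<String> ringOrder(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.comparing(HashedRing::token));
        return out;
    }

    // Ring range (left, right]; wraps past the end of the ring when left >= right.
    static boolean inRange(BigInteger left, BigInteger right, BigInteger t) {
        if (left.compareTo(right) < 0) {
            return t.compareTo(left) > 0 && t.compareTo(right) <= 0;
        }
        return t.compareTo(left) > 0 || t.compareTo(right) <= 0;
    }

    // Keys returned if a scan treats (token(start), token(end)] as the key range.
    static List<String> scanByToken(List<String> keys, String start, String end) {
        List<String> out = new ArrayList<>();
        for (String k : ringOrder(keys)) {
            if (inRange(token(start), token(end), token(k))) out.add(k);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d", "e");
        System.out.println("key order:   " + keys);
        System.out.println("ring order:  " + ringOrder(keys));
        // Lexicographically (a, d] is [b, c, d]; the token-range scan generally differs.
        System.out.println("scan (a, d]: " + scanByToken(keys, "a", "d"));
    }
}
```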

This came out of an effort to overcome the scalability limits of wide rows,
so I want to make it work!

I am aware that some effort is required to make get_range_slices work.
But are there any other critical problems? For example, it seems there is
an assumption that token and key are the same. If this holds throughout the
whole C* code base, this partitioner is not practical.

Or have you tried something similar?

I would appreciate your feedback!

Thanks,
Takenori

Re: Random Distribution, yet Order Preserving Partitioner

Posted by Takenori Sato <ts...@cloudian.com>.

Hi Manoj,

Thanks for your advice.

More or less, we basically do the same. As you pointed out, we now face
many cases that cannot be solved by data modeling, and which are reaching
100 million columns.

We could split them into multiple metadata rows, but that brings more
complexity and is therefore error-prone. If possible, we want to avoid that.
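To make the complexity of splitting concrete, here is a hypothetical in-memory sketch of one way to spread an ordered key index over several bounded metadata rows: a piece is split when it exceeds a fixed size, and each piece is identified by the smallest key it may hold, much like a B-tree node. This is not from the thread; `SplitIndex` and `maxPerRow` are illustrative names, and in a real cluster the directory of pieces would itself need to be stored and kept consistent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: an ordered key index split across bounded "metadata rows".
class SplitIndex {
    private final int maxPerRow;
    // Piece start key -> sorted keys stored in that piece ("" is the first piece).
    private final TreeMap<String, TreeSet<String>> pieces = new TreeMap<>();

    SplitIndex(int maxPerRow) {
        if (maxPerRow < 1) throw new IllegalArgumentException("maxPerRow must be >= 1");
        this.maxPerRow = maxPerRow;
        pieces.put("", new TreeSet<>());
    }

    // Write path: find the owning piece and split it in half when it grows too large.
    void add(String key) {
        TreeSet<String> piece = pieces.floorEntry(key).getValue();
        piece.add(key);
        if (piece.size() > maxPerRow) {
            List<String> all = new ArrayList<>(piece);
            String mid = all.get(all.size() / 2); // never the smallest key, so no collision
            TreeSet<String> upper = new TreeSet<>(piece.tailSet(mid, true));
            piece.removeAll(upper);
            pieces.put(mid, upper); // the upper half becomes a new metadata row
        }
    }

    // Read path: ordered keys in [start, end), merged from every overlapping piece.
    List<String> range(String start, String end) {
        List<String> out = new ArrayList<>();
        if (start.compareTo(end) >= 0) return out;
        String first = pieces.floorKey(start);
        for (TreeSet<String> piece : pieces.subMap(first, true, end, false).values()) {
            out.addAll(piece.subSet(start, true, end, false));
        }
        return out;
    }

    int pieceCount() {
        return pieces.size();
    }
}
```

Even this simplified version needs a split rule, a lookup structure for pieces and a multi-piece merge on reads, which is the extra moving parts the reply is wary of.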

- Takenori

Re: Random Distribution, yet Order Preserving Partitioner

Posted by Manoj Mainali <ma...@gmail.com>.

Hi Takenori,

I can't tell for sure without knowing what kind of data you have and how
much of it. You can use the random partitioner together with a metadata row
that stores the row keys, for example:

{metadata_row}: key1 | key2 | key3
key1:column1 | column2

When you read, you can always query directly by key if you already know
it. For range queries, you first query the metadata_row to get the keys you
want, in order. Then you can do a multiget to fetch your actual data.

The downside is that you have to do two read queries, and depending on how
much data you have, you will end up with a wide metadata row.
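The two-step read described above can be modelled in memory to make the access pattern concrete. In this sketch (hypothetical class and method names, not a Cassandra or client API) a sorted set stands in for the metadata row and a hash map for the randomly partitioned data rows. The range read first slices the metadata row and then fetches the matching rows, which is where the second query comes from.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// In-memory model of the metadata-row pattern (hypothetical names).
class MetadataRowIndex {
    // The metadata row: its column names are the data row keys, kept sorted.
    private final TreeSet<String> metadataRow = new TreeSet<>();
    // Data rows: spread by the random partitioner in a cluster, unordered here.
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
        metadataRow.add(rowKey); // every write also touches the metadata row
    }

    // Point read: the key is already known, so the index is not consulted.
    Map<String, String> get(String rowKey) {
        return rows.get(rowKey);
    }

    // Range read over [start, end]: (1) slice the metadata row, (2) multiget the rows.
    LinkedHashMap<String, Map<String, String>> range(String start, String end) {
        LinkedHashMap<String, Map<String, String>> out = new LinkedHashMap<>();
        if (start.compareTo(end) > 0) return out;
        for (String k : metadataRow.subSet(start, true, end, true)) {
            out.put(k, rows.get(k)); // a real multiget would batch these lookups
        }
        return out;
    }
}
```

Because every key lands in the one metadata row, that row grows with the data set, which is the wide-row downside noted above.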

Manoj


Re: Random Distribution, yet Order Preserving Partitioner

Posted by Nikolay Mihaylov <nm...@nmmm.nu>.

> It can handle some millions of columns, but not more like 10M. I mean, a
> request for such a row concentrates on a particular node, so the
> performance degrades.
>
>> I also had idea for semi-ordered partitioner - instead of single MD5, to
>> have two MD5's.

It works for us with wide rows of about 40-50M columns, but with lots of
problems.

My research with get_count() shows the first minor problems at 14-15K
columns in a row, and then it just gets worse.




Re: Random Distribution, yet Order Preserving Partitioner

Posted by Takenori Sato <ts...@cloudian.com>.

Hi Nick,

> token and key are not same. it was like this long time ago (single MD5
> assumed single key)

True. That reminds me that I should test with the latest 1.2 instead of our
current 1.0!

> if you want ordered, you probably can arrange your data in a way so you
> can get it in ordered fashion.

Yeah, we have done that for a long time. That's called a wide row, right?
Or a compound primary key.

It can handle a few million columns, but not something like 10M. A request
for such a row concentrates on a particular node, so performance degrades.
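Why a single huge row concentrates load: placement is decided by the token of the row (partition) key alone, so every column of a wide row lands on the same node however many columns it has. The sketch below assumes a RandomPartitioner-like MD5 token and three nodes with evenly spaced tokens, each node owning the range from the previous node's token (exclusive) to its own (inclusive); it ignores replication factor and virtual nodes. `WideRowPlacement` is a hypothetical name.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: n nodes, one replica each, RandomPartitioner-like tokens.
class WideRowPlacement {
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127);

    static BigInteger token(String rowKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(d).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evenly spaced node tokens i * 2^127 / n, in ascending order.
    static List<BigInteger> nodeTokens(int n) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tokens.add(RING.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
        }
        return tokens;
    }

    // A node owns (previous node's token, its own token]; larger tokens wrap to node 0.
    static int owner(BigInteger t, List<BigInteger> nodes) {
        for (int i = 0; i < nodes.size(); i++) {
            if (t.compareTo(nodes.get(i)) <= 0) return i;
        }
        return 0;
    }

    // Nodes holding the given cells; each cell is {rowKey, columnName}.
    static Set<Integer> nodesHolding(List<String[]> cells, int n) {
        List<BigInteger> nodes = nodeTokens(n);
        Set<Integer> used = new TreeSet<>();
        for (String[] cell : cells) {
            // Only the row key is hashed; the column name (cell[1]) never affects placement.
            used.add(owner(token(cell[0]), nodes));
        }
        return used;
    }
}
```

With 1,000 columns under one row key, `nodesHolding` reports a single node; with 1,000 distinct row keys the cells spread across the ring.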

> I also had idea for semi-ordered partitioner - instead of single MD5, to
> have two MD5's.

Sounds interesting, but we need a fully ordered result.

Anyway, I will try with the latest version.

Thanks,
Takenori


Re: Random Distribution, yet Order Preserving Partitioner

Posted by Nikolay Mihaylov <nm...@nmmm.nu>.

My five cents:
token and key are not the same. They were, a long time ago (a single MD5
assumed a single key).

If you want ordering, you can probably arrange your data so that you can
get it back in order. For example, long ago I had a single column family
with a single key and about 2-3M columns. I do not suggest doing it this
way, because it is the wrong way, but it makes the idea easy to understand.

I also had an idea for a semi-ordered partitioner: instead of a single MD5,
use two MD5s. Then you can get semi-ordered ranges, e.g. you get all cities
in Canada together, then all cities in the US, and so on.
However, this way things may get pretty unbalanced.
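The semi-ordered idea is only sketched in one line, so the following is one possible reading rather than the poster's design: build a 256-bit token whose high half is MD5 of a group prefix (for example the country in `Canada/Toronto`) and whose low half is MD5 of the full key. All keys of a group then fall into one contiguous token range, so a single range scan returns the whole group, but within the group the order is still hash order. Each group also occupies a 2^128-wide slice of a 2^256 ring, so in practice a large group sits on one node, which is the imbalance mentioned above. `TwoHashToken` and the `/` separator are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical reading of "two MD5s": high 128 bits = MD5(group), low 128 bits = MD5(key).
class TwoHashToken {
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Group prefix of a key such as "Canada/Toronto" (the whole key if there is no '/').
    static String group(String key) {
        int slash = key.indexOf('/');
        return slash < 0 ? key : key.substring(0, slash);
    }

    // Unsigned 256-bit token: MD5(group) followed by MD5(key).
    static BigInteger token(String key) {
        byte[] buf = new byte[32];
        System.arraycopy(md5(group(key)), 0, buf, 0, 16);
        System.arraycopy(md5(key), 0, buf, 16, 16);
        return new BigInteger(1, buf);
    }

    // Inclusive token bounds covering every key of one group: a single range scan.
    static BigInteger[] groupBounds(String group) {
        BigInteger lo = new BigInteger(1, md5(group)).shiftLeft(128);
        BigInteger hi = lo.add(BigInteger.ONE.shiftLeft(128)).subtract(BigInteger.ONE);
        return new BigInteger[] {lo, hi};
    }

    // Order in which a token-ordered scan would return the keys.
    static List<String> ringOrder(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.comparing(TwoHashToken::token));
        return out;
    }
}
```

Replacing the low half with an order-preserving encoding of the rest of the key would sort keys within a group, but the per-group imbalance would remain.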

Nick
