Posted to user@cassandra.apache.org by Clement Honore <ho...@gmail.com> on 2012/10/01 10:45:58 UTC

Re: Help for creating a custom partitioner

Hi,

thanks for your answer.

We plan to use manual indexing too (with native C* indexing for other
cases).
So, for one index, we will get plenty of FKs, and a MultiGet call to fetch
all the associated entities would then, with RP, be spread across the whole
cluster.
As we don't know the cluster size yet, and as it's expected to grow at an
unknown rate, we are already thinking about alternatives for scalability.

But, to tell the truth, we have not done any performance tests so far.
Still, as the choice of a partitioner is the first C* cornerstone, we are
already thinking about a new partitioner.
We are planning "random vs custom partitioner" tests => hence my questions
about creating another partitioner first.

AFAIS, your partitioner (the higher bits of the hash from hashing the
category, and the lower bits of the hash from hashing the document id) will
put all the docs of a category on (on average) one node. Quite interesting,
thanks!
I could add such a partitioner to my test suite.
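
Just to check I understand it correctly, here is a rough sketch of the
token computation I would plug into such a custom partitioner (purely
illustrative: it assumes an MD5-based token like RandomPartitioner's and a
hypothetical "<category>|<docId>" row key layout with a '|' separator):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class CategoryDocTokenSketch {

    // Hypothetical row key layout: "<category>|<docId>".
    private static final char SEPARATOR = '|';

    // MD5 of a string as a positive BigInteger (same idea as RandomPartitioner).
    private static BigInteger md5(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(md.digest(s.getBytes(StandardCharsets.UTF_8))).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }

    // Your suggestion: high 64 bits come from the category hash, low 64 bits
    // from the doc id hash. Docs of one category stay close on the ring while
    // (almost) never sharing the exact same token.
    public static BigInteger combinedToken(String rowKey) {
        int sep = rowKey.indexOf(SEPARATOR);
        String category = rowKey.substring(0, sep);
        String docId = rowKey.substring(sep + 1);
        BigInteger high = md5(category).shiftRight(64).shiftLeft(64); // keep upper bits only
        BigInteger low = md5(docId).mod(BigInteger.ONE.shiftLeft(64)); // keep lower 64 bits only
        return high.or(low);
    }
}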

But why not just hash the "category" part of the row key?
With such a partitioner, as said before, many rows on *one* node are going
to have the same hash value.
- if it hurts Cassandra behavior/performance => I am curious to know why.
In that case, I still see your partitioner, so far, as the best answer to
my wishes!
- if it does NOT hurt Cassandra behavior/performance => then it sounds like
an optimal partitioner for our needs.
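
For comparison, the category-only variant I am asking about would just be
the following (same hypothetical "<category>|<docId>" key layout; only the
category contributes to the token, so every doc of a category collides on
one token):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class CategoryOnlyTokenSketch {
    // Every row of a category maps to the exact same token.
    public static BigInteger categoryOnlyToken(String rowKey) throws Exception {
        String category = rowKey.substring(0, rowKey.indexOf('|'));
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(category.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(digest).abs();
    }
}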

Any idea about Cassandra's behavior with such a (category-only) hash
partitioner?

Regards,
Clément

2012/9/28 Tim Wintle <ti...@gmail.com>

> On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> > Hi,
> >
> > I have hierarchical data.
> >
> > I'm storing them in CF with rowkey somewhat like (category, doc id), and
> > plenty of columns for a doc definition.
> >
> > I have hierarchical data traversal too.
> >
> > The user just chooses one category, and then, interacts with docs
> > belonging only to this category.
> >
> > 1) If I use RandomPartitioner, all docs could be spread within all nodes
> > in the cluster => bad performance.
> >
> > 2) Using RandomPartitioner, an alternative design could be
> > rowkey=category and column name=(doc id, prop name)
> >
> > I don't want it because I need fixed column names for indexing purposes,
> > and the "category" is quite a lonnnng string.
> >
> > 3) Then, I want to define a new partitioner for my rowkey (category, doc
> > id), doing MD5 only for the "category" part.
> >
> > The question is: with such a partitioner, many rows on *one* node are
> > going to have the same MD5 value, as a result of this new partitioner.
>
> If you do decide that having rows on the same node is what you want,
> then you could take the higher bits of the hash from hashing the
> category, and the lower bits of the hash from hashing the document id.
>
> That would mean documents in a category would be close to each other in
> the ring - while being unlikely to share the same hash.
>
>
> However, if you're doing this then all reads/writes to the category are
> going to be to a single machine. That's not going to spread the load
> across the cluster very well as I assume a few categories are going to
> be far more popular than others.
>
> Have you tested that you actually get bad performance from
> RandomPartitioner?
>
> Tim
>
>

Re: Help for creating a custom partitioner

Posted by Tim Wintle <ti...@gmail.com>.
On Mon, 2012-10-01 at 10:45 +0200, Clement Honore wrote:
> We plan to use manual indexing too (with native C* indexing for other
> cases).
> So, for one index, we will get plenty of FKs, and a MultiGet call to fetch
> all the associated entities would then, with RP, be spread across the whole
> cluster.
> As we don't know the cluster size yet, and as it's expected to grow at an
> unknown rate, we are already thinking about alternatives for scalability.
> 
> But, to tell the truth, we have not done any performance tests so far.
> Still, as the choice of a partitioner is the first C* cornerstone, we are
> already thinking about a new partitioner.
> We are planning "random vs custom partitioner" tests => hence my questions
> about creating another partitioner first.
> 
> AFAIS, your partitioner (the higher bits of the hash from hashing the
> category, and the lower bits of the hash from hashing the document id) will
> put all the docs of a category on (on average) one node. Quite interesting,
> thanks!
> I could add such a partitioner to my test suite.
> 
> But why not just hash the "category" part of the row key?
> With such a partitioner, as said before, many rows on *one* node are going
> to have the same hash value.
> - if it hurts Cassandra behavior/performance => I am curious to know why.
> In that case, I still see your partitioner, so far, as the best answer to
> my wishes!
> - if it does NOT hurt Cassandra behavior/performance => then it sounds like
> an optimal partitioner for our needs.
> 
> Any idea about Cassandra's behavior with such a (category-only) hash
> partitioner?

I honestly don't know the code well enough - but I have always assumed
(perhaps incorrectly) that the whole SSTable / Memtable system is sorted
on the hash value rather than the key, so that range queries are
efficient. If that is the case, and all items on a node share the same
hash, you would get awful performance for (at least) reading specific
rows from disk. I could be wrong in my assumptions.
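
As a toy illustration of that assumption (not actual Cassandra code): if
the on-disk index is ordered by token, rows sharing one token degenerate
into a linear scan once the token has been located:

import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a token-ordered index, roughly how I am assuming above
// that an SSTable index is laid out. Not Cassandra code.
public final class TokenOrderedIndexToy {
    public static void main(String[] args) {
        NavigableMap<Long, List<String>> index = new TreeMap<Long, List<String>>();

        // Category-only hashing: every doc of "catA" lands on the same token.
        List<String> sameTokenRows = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
            sameTokenRows.add("catA|doc" + i);
        }
        index.put(42L, sameTokenRows);

        // Finding the token is cheap (ordered lookup)...
        List<String> bucket = index.get(42L);
        // ...but finding one specific row then means scanning the whole bucket.
        boolean found = bucket.contains("catA|doc999999");
        System.out.println("found=" + found + " after scanning up to " + bucket.size() + " rows");
    }
}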

Certainly, having lots of hash collisions is unusual behaviour - I don't
imagine the timing behaviour has been tested closely against that
situation.

If you haven't yet tested it, then I'm not sure why you assume that
accesses hitting a single machine would be faster than accesses to
documents spread around the ring - ethernet is fast, and if you're going
to have to do disk seeks to get any of this data then you can run the
seeks in parallel across a large number of spindles by spreading the load
around the cluster.

It also adds extra load onto the machines handling popular categories -
assuming the number of categories is significantly smaller than the number
of documents, that could make a major difference to latency.

Tim


> 
> Regards,
> Clément
> 
> 2012/9/28 Tim Wintle <ti...@gmail.com>
> 
> > On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> > > Hi,
> > >
> > > I have hierarchical data.
> > >
> > > I'm storing them in CF with rowkey somewhat like (category, doc id), and
> > > plenty of columns for a doc definition.
> > >
> > > I have hierarchical data traversal too.
> > >
> > > The user just chooses one category, and then, interacts with docs
> > > belonging only to this category.
> > >
> > > 1) If I use RandomPartitioner, all docs could be spread within all nodes
> > > in the cluster => bad performance.
> > >
> > > 2) Using RandomPartitioner, an alternative design could be
> > > rowkey=category and column name=(doc id, prop name)
> > >
> > > I don't want it because I need fixed column names for indexing purposes,
> > > and the "category" is quite a lonnnng string.
> > >
> > > 3) Then, I want to define a new partitioner for my rowkey (category, doc
> > > id), doing MD5 only for the "category" part.
> > >
> > > The question is: with such a partitioner, many rows on *one* node are
> > > going to have the same MD5 value, as a result of this new partitioner.
> >
> > If you do decide that having rows on the same node is what you want,
> > then you could take the higher bits of the hash from hashing the
> > category, and the lower bits of the hash from hashing the document id.
> >
> > That would mean documents in a category would be close to each other in
> > the ring - while being unlikely to share the same hash.
> >
> >
> > However, if you're doing this then all reads/writes to the category are
> > going to be to a single machine. That's not going to spread the load
> > across the cluster very well as I assume a few categories are going to
> > be far more popular than others.
> >
> > Have you tested that you actually get bad performance from
> > RandomPartitioner?
> >
> > Tim
> >
> >



Re: Help for creating a custom partitioner

Posted by "Hiller, Dean" <De...@nrel.gov>.
I would be surprised if the random partitioner hurt your performance.  In our performance tests on a 6-node cluster with PlayOrm Scalable SQL, even join queries ended up faster, as reading all the rows from parallel disks was way faster than reading from a single machine (remember, one disk bottleneck can really hurt, which is why the random partitioner works out so well).
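
Roughly what that fan-out looks like from the client side (illustrative
only: fetchRow below is just a stand-in for whatever single-row read your
client library exposes, it is not a real API):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ParallelMultiGetSketch {

    // Stand-in for a real client read (Hector, Astyanax, raw Thrift, ...).
    static Map<String, String> fetchRow(String rowKey) {
        throw new UnsupportedOperationException("plug your client call in here");
    }

    // Fan the multiget out so every node's disk is seeking in parallel,
    // instead of one node doing all the seeks serially.
    public static Map<String, Map<String, String>> multiGet(List<String> rowKeys)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        try {
            Map<String, Future<Map<String, String>>> pending =
                    new LinkedHashMap<String, Future<Map<String, String>>>();
            for (final String key : rowKeys) {
                pending.put(key, pool.submit(new Callable<Map<String, String>>() {
                    public Map<String, String> call() {
                        return fetchRow(key);
                    }
                }));
            }
            Map<String, Map<String, String>> result =
                    new LinkedHashMap<String, Map<String, String>>();
            for (Map.Entry<String, Future<Map<String, String>>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}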

Later,
Dean

From: Clement Honore <ho...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Monday, October 1, 2012 2:45 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Help for creating a custom partitioner

Hi,

thanks for your answer.

We plan to use manual indexing too (with native C* indexing for other cases).
So, for one index, we will get plenty of FKs, and a MultiGet call to fetch all the associated entities would then, with RP, be spread across the whole cluster.
As we don't know the cluster size yet, and as it's expected to grow at an unknown rate, we are already thinking about alternatives for scalability.

But, to tell the truth, we have not done any performance tests so far.
Still, as the choice of a partitioner is the first C* cornerstone, we are already thinking about a new partitioner.
We are planning "random vs custom partitioner" tests => hence my questions about creating another partitioner first.

AFAIS, your partitioner (the higher bits of the hash from hashing the category, and the lower bits of the hash from hashing the document id) will put all the docs of a category on (on average) one node. Quite interesting, thanks!
I could add such a partitioner to my test suite.

But why not just hash the "category" part of the row key?
With such a partitioner, as said before, many rows on *one* node are going to have the same hash value.
- if it hurts Cassandra behavior/performance => I am curious to know why. In that case, I still see your partitioner, so far, as the best answer to my wishes!
- if it does NOT hurt Cassandra behavior/performance => then it sounds like an optimal partitioner for our needs.

Any idea about Cassandra's behavior with such a (category-only) hash partitioner?

Regards,
Clément

2012/9/28 Tim Wintle <ti...@gmail.com>
On Fri, 2012-09-28 at 18:20 +0200, Clement Honore wrote:
> Hi,
>
> I have hierarchical data.
>
> I'm storing them in CF with rowkey somewhat like (category, doc id), and
> plenty of columns for a doc definition.
>
> I have hierarchical data traversal too.
>
> The user just chooses one category, and then, interacts with docs belonging
> only to this category.
>
> 1) If I use RandomPartitioner, all docs could be spread within all nodes in
> the cluster => bad performance.
>
> 2) Using RandomPartitioner, an alternative design could be rowkey=category
> and column name=(doc id, prop name)
>
> I don't want it because I need fixed column names for indexing purposes,
> and the "category" is quite a lonnnng string.
>
> 3) Then, I want to define a new partitioner for my rowkey (category, doc
> id), doing MD5 only for the "category" part.
>
> The question is: with such a partitioner, many rows on *one* node are going
> to have the same MD5 value, as a result of this new partitioner.

If you do decide that having rows on the same node is what you want,
then you could take the higher bits of the hash from hashing the
category, and the lower bits of the hash from hashing the document id.

That would mean documents in a category would be close to each other in
the ring - while being unlikely to share the same hash.


However, if you're doing this then all reads/writes to the category are
going to be to a single machine. That's not going to spread the load
across the cluster very well as I assume a few categories are going to
be far more popular than others.

Have you tested that you actually get bad performance from
RandomPartitioner?

Tim