Posted to users@kafka.apache.org by Stan Rosenberg <st...@gmail.com> on 2013/01/10 20:59:53 UTC

partitioning

Hi,

I apologize if this question has been addressed before.  We are
currently evaluating Kafka for our high-volume data ingestion
infrastructure.
I would like to understand why consistent hashing was not implemented,
given its inherent ability to dynamically balance the load across
brokers.
The current scheme, if I understand correctly, is to compute a hash of
the message key (default or user-provided) modulo the number of brokers.
This is bound to yield poor (broker) load distribution in the face of failure.

Thanks in advance,

stan

Re: partitioning

Posted by Maxime Brugidou <ma...@gmail.com>.
I'm not sure which design doc you are looking at (v1, probably? v3 is here:
https://cwiki.apache.org/KAFKA/kafka-detailed-replication-design-v3.html ),
but if I understand correctly, consistent hashing for partitioning is more
about remapping as few keys as possible when adding/deleting partitions,
which you can already implement with a custom partitioner by computing
partition_id = abs(num_partitions * hash(key)/hash_space). But the added
value is limited by the fact that if you add/delete partitions, you have
already broken your existing key-to-partition mapping, which makes it kind
of useless anyway?
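For concreteness, the custom partitioner sketched above could look something like this (purely illustrative; the choice of CRC-32 and the 32-bit `HASH_SPACE` constant are assumptions for the sketch, not anything Kafka ships):

```python
import zlib

# Assumed 32-bit hash range for this sketch.
HASH_SPACE = 2 ** 32

def scaled_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition by scaling its hash over the full hash
    space (the abs(num_partitions * hash(key)/hash_space) formula above),
    rather than taking hash(key) % num_partitions."""
    h = zlib.crc32(key)  # deterministic, always in [0, HASH_SPACE)
    return num_partitions * h // HASH_SPACE
```

Because the result is a range over the hash space rather than a modulus, a key's position on the ring is fixed and only bucket boundaries move when the partition count changes.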

However, if you are talking about partition assignment to brokers, that's
another matter, and I believe the current state of things is just a simple
round robin to assign partitions on topic creation (to be confirmed?). It
could be interesting to have partitions assigned to brokers with some
consistent hashing, so that adding a broker requires moving as few
partitions as possible. That process is done manually as of now using a
ReassignPartition command, and it could be automated with a tool (provided
that you over-partition, as Jun recommends, so that you have some
granularity and the load gets spread evenly over the brokers).
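A minimal sketch of that partition-to-broker idea (hypothetical only; the broker names, vnode count, and MD5 ring hash are arbitrary choices for illustration, not anything Kafka implements):

```python
import bisect
import hashlib

def _ring_pos(s: str) -> int:
    # Hash a label to a position on the ring (MD5 is an arbitrary choice).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class BrokerRing:
    def __init__(self, brokers, vnodes=64):
        # Each broker gets `vnodes` points on the ring for a smoother spread.
        self._points = sorted(
            (_ring_pos("%s#%d" % (b, i)), b)
            for b in brokers
            for i in range(vnodes)
        )
        self._keys = [pos for pos, _ in self._points]

    def broker_for(self, topic: str, partition: int) -> str:
        # A partition is owned by the first broker point at or after its
        # position, wrapping around the ring.
        pos = _ring_pos("%s-%d" % (topic, partition))
        i = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._points[i][1]
```

With this scheme, adding a broker only claims the ring segments just before its new points, so most partitions keep their existing owner.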


On Mon, Jan 14, 2013 at 5:04 PM, Stan Rosenberg <st...@gmail.com> wrote:

> On Fri, Jan 11, 2013 at 12:37 AM, Jun Rao <ju...@gmail.com> wrote:
>
> > Our current partitioning strategy is to mod the key by the # of
> > partitions, not the # of brokers. For better balancing of partitions over
> > brokers, one simple strategy is to over partition, i.e., have a few times
> > more partitions than brokers. That way, if one adds more brokers over
> > time, you can just move some existing partitions to the new broker.
> >
> > Consistent hashing requires changing the # of partitions dynamically.
> > However, for some applications, they may prefer not to change partitions.
> >
> > What's your use case for consistent hashing?
> >
>
> My use case is essentially the same as above, i.e., dynamic load balancing.
> I now understand why the current partitioning strategy is used as opposed
> to consistent hashing; partition "stickiness" is definitely desirable for
> the sake of moving computation to data. However, the dynamic rebalancing
> described in "Kafka Replication Design", sect. 1.2 looks very similar to
> what's typically achieved by using consistent hashing.
> Is this rebalancing implemented in 0.8, or am I reading the now obsolete
> documentation? :)  (If so, could you please point me to the code?)
>
> Thanks,
>
> stan
>

Re: partitioning

Posted by Stan Rosenberg <st...@gmail.com>.
On Fri, Jan 11, 2013 at 12:37 AM, Jun Rao <ju...@gmail.com> wrote:

> Our current partitioning strategy is to mod the key by the # of partitions,
> not the # of brokers. For better balancing of partitions over brokers, one
> simple strategy is to over partition, i.e., have a few times more
> partitions than brokers. That way, if one adds more brokers over time, you
> can just move some existing partitions to the new broker.
>
> Consistent hashing requires changing the # of partitions dynamically.
> However, for some applications, they may prefer not to change partitions.
>

> What's your use case for consistent hashing?
>

My use case is essentially the same as above, i.e., dynamic load balancing.
I now understand why the current partitioning strategy is used as opposed
to consistent hashing; partition "stickiness" is definitely desirable for
the sake of moving computation to data. However, the dynamic rebalancing
described in "Kafka Replication Design", sect. 1.2 looks very similar to
what's typically achieved by using consistent hashing.
Is this rebalancing implemented in 0.8, or am I reading the now obsolete
documentation? :)  (If so, could you please point me to the code?)

Thanks,

stan

Re: partitioning

Posted by Jun Rao <ju...@gmail.com>.
Our current partitioning strategy is to mod the key by the # of partitions,
not the # of brokers. For better balancing of partitions over brokers, one
simple strategy is to over partition, i.e., have a few times more partitions
than brokers. That way, if one adds more brokers over time, you can just
move some existing partitions to the new broker.

Consistent hashing requires changing the # of partitions dynamically.
However, for some applications, they may prefer not to change partitions.
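The strategy described above amounts to something like the following sketch (illustrative only; CRC-32 stands in for whatever hash the default partitioner actually uses, which this thread does not specify):

```python
import zlib

def default_partition(key: bytes, num_partitions: int) -> int:
    # The partition count is fixed, so a given key always lands on the
    # same partition; rebalancing means moving whole partitions between
    # brokers, never remapping keys.
    return zlib.crc32(key) % num_partitions
```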

What's your use case for consistent hashing?

Thanks,

Jun

On Thu, Jan 10, 2013 at 11:59 AM, Stan Rosenberg <st...@gmail.com> wrote:

> Hi,
>
> I apologize if this question has been addressed before.  We are
> currently evaluating Kafka for our high-volume data ingestion
> infrastructure.
> I would like to understand why consistent hashing was not implemented,
> given its inherent ability to dynamically balance the load across
> brokers.
> The current scheme, if I understand correctly, is to compute a hash of
> the message key (default or user-provided) modulo the number of brokers.
> This is bound to yield poor (broker) load distribution in the face of
> failure.
>
> Thanks in advance,
>
> stan
>