Posted to users@kafka.apache.org by Matt Andruff <ma...@gmail.com> on 2017/08/16 13:40:59 UTC

New Partition Strategy for Even Disk Usage

Good Day,

I'm looking for someone to poke holes in my theory.

I want to balance my disk usage across brokers.  I want to maintain order
per partition.  Yes, there are tools, but they require manual intervention.
What if I created a custom partition strategy?  The strategy would take the
existing partitioning strategy but add the ability to rotate the writing of
partitions by 1.

If             then           then           etc...

A -> A         A -> B         A -> C
B -> B         B -> C         B -> A
C -> C         C -> A         C -> B

The idea is to simply rotate the partition after some measure is
reached.  (Time sounds like the most likely way to do it, but doing it the
'correct way' to avoid race conditions would have to be part of the
strategy.)  This should help ensure that the ordering of the partitions is
maintained, but it probably requires extra logic on the consumer side to undo
the partitioning strategy and take advantage of the ordering.  This should
give more balanced disk usage at the per-topic level.
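
For concreteness, the rotation in the table above can be written as modular
arithmetic.  The following is only a sketch in Python, not Kafka's Partitioner
interface: base_partition stands in for "the existing partitioning strategy"
(it uses CRC32, which is an assumption and not the hash Kafka uses), and the
names rotate, unrotate and epoch_for are hypothetical.  The epoch is the number
of rotations so far; deriving it from wall-clock time, as epoch_for does, is
exactly the part that is prone to race conditions between producers.

```python
import zlib


def base_partition(key: bytes, num_partitions: int) -> int:
    """Stand-in for the existing strategy (assumption: CRC32, not Kafka's hash)."""
    return zlib.crc32(key) % num_partitions


def rotate(base: int, epoch: int, num_partitions: int) -> int:
    """Producer side: shift the base partition by one slot per epoch."""
    return (base + epoch) % num_partitions


def unrotate(physical: int, epoch: int, num_partitions: int) -> int:
    """Consumer side: recover the base (logical) partition from the physical one."""
    return (physical - epoch) % num_partitions


def epoch_for(timestamp_s: float, period_s: float) -> int:
    """Time-based epoch (hypothetical); producers with skewed clocks will disagree."""
    return int(timestamp_s // period_s)


if __name__ == "__main__":
    names = "ABC"
    for epoch in range(3):
        row = [f"{names[b]} -> {names[rotate(b, epoch, 3)]}" for b in range(3)]
        print(f"epoch {epoch}: " + "   ".join(row))
```

Note that the consumer can only call unrotate if it knows which epoch each
message was written in, so the epoch has to travel with the data somehow (for
example in the message itself or as a marker on the partition).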

Has anyone tried this?  Is there a pitfall I should consider?

RE: New Partition Strategy for Even Disk Usage

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
What sort of skew do you expect?  For example, do you expect one key to have 1000x as many messages as others?

The consumer API allows you to pick a partition.  So if you know that you have N partition groups, you could set up N consumers, each pulling from one partition in the group.  You could put a special message on the topic to tell the consumer to move to the next partition.  The hardest part will be having your producers all switch over at the same time and having one of them put a marker message on the topic.  Using time will have all sorts of race conditions.
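
To make the marker idea concrete, here is a small in-memory sketch in Python.  It does not use the Kafka client at all: each partition is a plain list, MARKER is a hypothetical special message, and the sketch assumes that a marker is written to every partition at each rotation and that all producers switch at exactly that point (the hard part described above).  Under those assumptions, the consumer for logical stream L reads segment 0 of partition L, then segment 1 of partition L+1, and so on.

```python
MARKER = "__ROTATE__"  # hypothetical marker message, not a Kafka feature


def split_segments(log):
    """Split one partition's log into per-epoch segments at each marker."""
    segments = [[]]
    for msg in log:
        if msg == MARKER:
            segments.append([])
        else:
            segments[-1].append(msg)
    return segments


def read_logical_stream(partitions, logical):
    """Reassemble one logical stream by following the rotation across partitions."""
    n = len(partitions)
    segmented = [split_segments(p) for p in partitions]
    out = []
    epoch = 0
    while True:
        segs = segmented[(logical + epoch) % n]
        if epoch >= len(segs):
            break  # this partition has not reached this epoch yet
        out.extend(segs[epoch])
        epoch += 1
    return out


if __name__ == "__main__":
    partitions = [
        ["a1", "a2", MARKER, "c3", MARKER, "b4"],
        ["b1", MARKER, "a3", "a4", MARKER, "c4"],
        ["c1", "c2", MARKER, "b2", "b3", MARKER, "a5"],
    ]
    for logical in range(3):
        print(logical, read_logical_stream(partitions, logical))
```

If even one producer writes to the old partition after the marker, its message lands in the wrong segment and is attributed to a different logical stream, which is why the switch-over has to be coordinated rather than time-based.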
