Posted to users@kafka.apache.org by jaaz jozz <ja...@gmail.com> on 2019/01/27 12:20:04 UTC

How to balance messages in kafka topics with newly added partitions?

Hello,

I have a Kafka cluster with a certain topic that had too few partitions, so a
large backlog of messages accumulated. After I added partitions, only new
messages were balanced across all the partitions.

What is the preferred way to balance the "old" backlog of messages inside
the original partitions across all the partitions, including the new ones?

I thought of reading the whole backlog and writing it again to this topic,
then updating the offsets accordingly, but that would duplicate messages
for any new consumer group that starts consuming from the beginning of
this topic.

How can I solve this?

Thanks.

Re: How to balance messages in kafka topics with newly added partitions?

Posted by Hans Jespersen <ha...@confluent.io>.
Yes, but I find this even easier to do with KSQL:

CREATE STREAM OUTPUTTOPIC AS SELECT * FROM INPUTTOPIC;

There are similar examples on the KSQL recipe page that also filter messages or change the message format while copying:
https://www.confluent.io/stream-processing-cookbook/

There is even an example of repartitioning a topic using the PARTITIONS parameter:
CREATE STREAM clickstream_new WITH (PARTITIONS=5) AS SELECT * FROM clickstream_raw;
-hans

> On Jan 27, 2019, at 9:24 AM, Ryanne Dolan <ry...@gmail.com> wrote:
> 
> You can use MirrorMaker to copy data between topics.
> 
> Ryanne

Re: How to balance messages in kafka topics with newly added partitions?

Posted by Ryanne Dolan <ry...@gmail.com>.
You can use MirrorMaker to copy data between topics.

Ryanne

On Sun, Jan 27, 2019, 7:12 AM jaaz jozz <jazzlofi2@gmail.com> wrote:

> Thanks, Sönke.
> Is there any available Kafka tool to move messages between topics?

Re: How to balance messages in kafka topics with newly added partitions?

Posted by jaaz jozz <ja...@gmail.com>.
Thanks, Sönke.
Is there any available Kafka tool to move messages between topics?

On Sun, Jan 27, 2019 at 2:40 PM Sönke Liebau
<so...@opencore.com.invalid> wrote:

> Hi Jazz,
>
> I'm afraid the only way to rebalance old messages is indeed to
> rewrite them to the topic, thus creating duplicates.
> Once Kafka has written a message to a partition, that
> assignment is final; there is no way to move it to another
> partition.
>
> Changing a topic's partition count later can be a huge
> headache if you depend on partitioning (for example, on all records
> with the same key landing in the same partition). For this reason the
> general recommendation is to over-partition your topics a little when
> creating them, so that you can add consumers as data volume
> increases.
>
> In your case, the best solution might be to delete the topic (after
> copying its data elsewhere) and recreate it with more partitions. You
> can then rewrite all your data, which will result in clean partitioning.
>
> Hope this helps a little; feel free to get back to us if you have more
> questions!
>
> Best regards,
> Sönke

Re: How to balance messages in kafka topics with newly added partitions?

Posted by Sönke Liebau <so...@opencore.com.INVALID>.
Hi Jazz,

I'm afraid the only way to rebalance old messages is indeed to
rewrite them to the topic, thus creating duplicates.
Once Kafka has written a message to a partition, that
assignment is final; there is no way to move it to another
partition.
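To make the duplication point concrete, here is a minimal in-memory sketch in Python. It is not Kafka client code: it assumes a hypothetical model in which a topic is a dict of append-only partition lists, and it uses CRC32 as a stand-in for the murmur2 hash in Kafka's default partitioner. Adding partitions leaves existing records where they are, so rewriting the backlog into the same topic means a new consumer group reading from the beginning receives every old message twice.

```python
import zlib
from typing import Dict, List, Tuple

# Hypothetical in-memory model: partition index -> append-only list of (key, value).
Topic = Dict[int, List[Tuple[str, str]]]


def stand_in_partition(key: str, num_partitions: int) -> int:
    # CRC32 stands in for the murmur2 hash used by Kafka's default partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(topic: Topic, key: str, value: str) -> None:
    # Records are only ever appended; nothing already written is moved or deleted.
    topic[stand_in_partition(key, len(topic))].append((key, value))


def read_from_beginning(topic: Topic) -> List[Tuple[str, str]]:
    # Everything a brand-new consumer group starting at the earliest offsets
    # would receive (Kafka defines no order across partitions, so the
    # partitions are simply concatenated here).
    return [rec for p in sorted(topic) for rec in topic[p]]


# A 2-partition topic that has built up a backlog of 6 keyed records.
topic: Topic = {0: [], 1: []}
for i in range(6):
    append(topic, f"key-{i}", f"v{i}")
backlog = read_from_beginning(topic)

# Adding partitions does not touch the records already written.
topic.update({2: [], 3: []})

# Rewriting the backlog spreads the new copies over all 4 partitions...
for key, value in backlog:
    append(topic, key, value)

# ...but the originals are still there, so a new group sees everything twice.
seen = read_from_beginning(topic)
print(len(backlog), len(seen))  # 6 12
```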

Changing a topic's partition count later can be a huge
headache if you depend on partitioning (for example, on all records
with the same key landing in the same partition). For this reason the
general recommendation is to over-partition your topics a little when
creating them, so that you can add consumers as data volume
increases.
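One reason the partition count matters: for keyed records, Kafka's default partitioner picks a partition by hashing the key modulo the number of partitions. The rough sketch below uses CRC32 as a stand-in for Kafka's murmur2 hash and hypothetical key names, and counts how many keys map to a different partition when a topic grows from 3 to 6 partitions. New records for a moved key land in a different partition than its older records, so ordering for that key across the change is no longer guaranteed.

```python
import zlib


def stand_in_partition(key: str, num_partitions: int) -> int:
    # Hash-then-modulo, like Kafka's default partitioner for keyed records,
    # with CRC32 standing in for Kafka's murmur2 hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys for illustration.
keys = [f"customer-{i}" for i in range(10_000)]

before = {k: stand_in_partition(k, 3) for k in keys}  # topic created with 3 partitions
after = {k: stand_in_partition(k, 6) for k in keys}   # same topic after growing to 6

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys now map to a different partition")
```

Because (h mod 6) mod 3 equals h mod 3, exactly the keys whose new partition is 3, 4 or 5 move, which with a well-mixed hash is roughly half of them.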

In your case, the best solution might be to delete the topic (after
copying its data elsewhere) and recreate it with more partitions. You
can then rewrite all your data, which will result in clean partitioning.
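A sketch of that recreate-and-rewrite approach, modeled in memory under the same assumptions (hypothetical dict-of-lists topics, CRC32 standing in for murmur2): read every partition of the old topic in offset order and re-produce each record, keeping its key, into a new topic with more partitions. Because all records for a given key sit in one source partition and are copied in order, each key's records stay in order in their new partition.

```python
import zlib
from typing import Dict, List, Tuple

# A record is (key, per-key sequence number); a topic maps partition -> records.
Record = Tuple[str, int]
Topic = Dict[int, List[Record]]


def stand_in_partition(key: str, num_partitions: int) -> int:
    # CRC32 stands in for the murmur2 hash used by Kafka's default partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce_all(records: List[Record], num_partitions: int) -> Topic:
    # Produce records in order; each lands in the partition its key hashes to.
    topic: Topic = {p: [] for p in range(num_partitions)}
    for key, seq in records:
        topic[stand_in_partition(key, num_partitions)].append((key, seq))
    return topic


def copy_to_new_topic(source: Topic, num_partitions: int) -> Topic:
    # Read each source partition in offset order and re-produce every record,
    # keeping its key, into a freshly created topic with more partitions.
    target: Topic = {p: [] for p in range(num_partitions)}
    for p in sorted(source):
        for key, seq in source[p]:
            target[stand_in_partition(key, num_partitions)].append((key, seq))
    return target


# Hypothetical backlog: 200 keys with 5 records each, on a 2-partition topic.
records = [(f"key-{k}", seq) for seq in range(5) for k in range(200)]
old_topic = produce_all(records, 2)
new_topic = copy_to_new_topic(old_topic, 8)

print({p: len(recs) for p, recs in old_topic.items()})
print({p: len(recs) for p, recs in new_topic.items()})
```

In practice the copy would be done by a consumer and producer (or a copy tool) against a real cluster; this only models where the records end up.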

Hope this helps a little; feel free to get back to us if you have more
questions!

Best regards,
Sönke

On Sun, Jan 27, 2019 at 1:21 PM jaaz jozz <ja...@gmail.com> wrote:
>
> Hello,
>
> I have a Kafka cluster with a certain topic that had too few partitions, so a
> large backlog of messages accumulated. After I added partitions, only new
> messages were balanced across all the partitions.
>
> What is the preferred way to balance the "old" backlog of messages inside
> the original partitions across all the partitions, including the new ones?
>
> I thought of reading the whole backlog and writing it again to this topic,
> then updating the offsets accordingly, but that would duplicate messages
> for any new consumer group that starts consuming from the beginning of
> this topic.
>
> How can I solve this?
>
> Thanks.



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany