Posted to users@kafka.apache.org by Lehar Jain <le...@media.net.INVALID> on 2022/10/25 10:55:35 UTC

Balancing traffic between multiple directories

Hey,

We run Kafka brokers with multiple log directories. I wanted to know how
Kafka balances traffic between the various directories. Can we define our
own strategy for distributing partitions across directories? We are
currently seeing an imbalance in directory sizes: some directories have a
lot of free space, whereas others are filling up quickly.


Regards

Re: Balancing traffic between multiple directories

Posted by Alex Craig <al...@gmail.com>.
I don't think the Confluent self-balancing feature works if you have your
broker data in multiple directories anyway - it's expecting a single dir
per broker and will try and keep the data balanced between brokers.  Also
just as an aside, I'm not sure there's much value in using multiple
directories.  I assume you have these mapped to individual disks?  I'd be
curious to hear if you actually get any performance benefit out of that,
especially when weighed against the increased likelihood of disk failure.
I realize that doesn't help your current problem, more of a
question/discussion point I guess.  I think your only option for moving the
data around is the kafka-reassign-partitions.sh script.

Alex C

On Thu, Oct 27, 2022 at 9:30 AM Andrew Grant <ag...@confluent.io.invalid>
wrote:

> There's Cruise Control, https://github.com/linkedin/cruise-control, which
> is open-source and could help with automated balancing.

Re: Balancing traffic between multiple directories

Posted by Andrew Grant <ag...@confluent.io.INVALID>.
There's Cruise Control, https://github.com/linkedin/cruise-control, which
is open-source and could help with automated balancing.

On Thu, Oct 27, 2022 at 10:26 AM <ga...@hotmail.co.uk> wrote:

> Auto rebalancing is a very important feature for running Kafka in a
> production environment. Given that Confluent already has this feature, is
> there any chance that the open-source version could get it as well?
> Or is the idea that the open-source version shouldn't be used in a
> high-load production environment?

Re: Balancing traffic between multiple directories

Posted by ga...@hotmail.co.uk.
Auto rebalancing is a very important feature for running Kafka in a production environment. Given that Confluent already has this feature, is there any chance that the open-source version could get it as well?
Or is the idea that the open-source version shouldn't be used in a high-load production environment?

________________________________
From: sunil chaudhari <su...@gmail.com>
Sent: October 27, 2022 3:11
To: users@kafka.apache.org <us...@kafka.apache.org>
Subject: Re: Balancing traffic between multiple directories

Hi Lehar,
You are right. There is no better way in open source Kafka.
However, Confluent has a feature called Auto Rebalancing.
Can you check whether there is a free version with this feature?

It starts balancing brokers automatically when it sees an uneven
distribution of partitions.

Regards,
Sunil.

Re: Balancing traffic between multiple directories

Posted by sunil chaudhari <su...@gmail.com>.
Hi Lehar,
You are right. There is no better way in open source Kafka.
However, Confluent has a feature called Auto Rebalancing.
Can you check whether there is a free version with this feature?

It starts balancing brokers automatically when it sees an uneven
distribution of partitions.

Regards,
Sunil.
On Wed, 26 Oct 2022 at 12:03 PM, Lehar Jain <le...@media.net.invalid>
wrote:

> Hey Andrew,
>
> Thanks for the reply. We are currently using the method you described; I
> wanted to check whether there is a better way.
>
> It seems there isn't one at the moment, so we will keep using this approach.

Re: Balancing traffic between multiple directories

Posted by Lehar Jain <le...@media.net.INVALID>.
Hey Andrew,

Thanks for the reply. We are currently using the method you described; I
wanted to check whether there is a better way.

It seems there isn't one at the moment, so we will keep using this approach.

On Tue, Oct 25, 2022 at 7:23 PM Andrew Grant <ag...@confluent.io.invalid>
wrote:

> There’s the log_dirs field you can use in the JSON file to move partitions
> between directories.

Re: Balancing traffic between multiple directories

Posted by Andrew Grant <ag...@confluent.io.INVALID>.
Hey Lehar,


I don’t think there’s a way to control this during topic creation. I just
took a look through
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminUtils.scala
and it does appear partition assignment does not account for each broker’s
different log directories. I also took a look at the kafka-topics.sh script
and it has a --replica-assignment argument but that looks to only allow
specifying brokers. During topic creation, once a replica has been chosen I
think we then choose the directory with the fewest number of partitions -
see
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LogManager.scala#L1192
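
As a rough illustration of that selection rule, here is a small Python sketch. It is not Kafka's actual Scala code; the function name pick_log_dir and the sample paths are made up for the example, and it only mirrors the "fewest partitions wins" behaviour as I understand it.

```python
def pick_log_dir(partition_counts):
    """Return the log directory that currently holds the fewest partitions.

    partition_counts maps a log directory path to the number of partitions
    stored in it. Ties are broken by path name so the result is deterministic
    (an assumption made for this sketch only).
    """
    if not partition_counts:
        raise ValueError("at least one log directory is required")
    # Iterate over the sorted paths so that min() returns the alphabetically
    # first directory when several share the lowest count.
    return min(sorted(partition_counts), key=lambda d: partition_counts[d])


if __name__ == "__main__":
    # Hypothetical broker with two log directories.
    counts = {"/data/kafka-1": 12, "/data/kafka-2": 9}
    print(pick_log_dir(counts))
```

If placement really is driven by partition count rather than bytes on disk, a directory holding a few large partitions can fill up while another holding many small ones stays mostly empty, which would be consistent with the imbalance you describe.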


What I think you can do is move existing partitions around with the
kafka-reassign-partitions.sh script. From running the command locally:


--reassignment-json-file <String:       The JSON file with the partition
  manual assignment json file path>       reassignment configurationThe format
                                          to use is -
                                        {"partitions":
                                        [{"topic": "foo",
                                          "partition": 1,
                                          "replicas": [1,2,3],
                                          "log_dirs": ["dir1","dir2","dir3"]
                                          }],
                                        "version":1
                                        }
                                        Note that "log_dirs" is optional. When
                                          it is specified, its length must
                                          equal the length of the replicas
                                          list. The value in this list can be
                                          either "any" or the absolution path
                                          of the log directory on the broker.
                                          If absolute log directory path is
                                          specified, the replica will be moved
                                          to the specified log directory on
                                          the broker.


There’s the log_dirs field you can use in the JSON file to move partitions
between directories.
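
To make that concrete, here is a minimal Python sketch that builds such a file in the format shown in the help text. The helper name build_reassignment, the topic, the broker ids and the directory paths are all placeholders I made up; substitute the values from your own cluster.

```python
import json


def build_reassignment(moves):
    """Build a reassignment document in the format shown in the help text.

    moves is a list of (topic, partition, replicas, log_dirs) tuples.
    log_dirs needs one entry per replica: either "any" or the absolute
    path of a log directory on the corresponding broker.
    """
    partitions = []
    for topic, partition, replicas, log_dirs in moves:
        # The help text requires log_dirs and replicas to be the same length.
        if len(log_dirs) != len(replicas):
            raise ValueError("log_dirs must have the same length as replicas")
        partitions.append({
            "topic": topic,
            "partition": partition,
            "replicas": list(replicas),
            "log_dirs": list(log_dirs),
        })
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Hypothetical move: place the replica of foo-1 on broker 1 in
    # /data/kafka-2 and leave the replicas on brokers 2 and 3 where they are.
    doc = build_reassignment(
        [("foo", 1, [1, 2, 3], ["/data/kafka-2", "any", "any"])]
    )
    print(json.dumps(doc, indent=2))
```

You would save the printed JSON to a file and pass it to kafka-reassign-partitions.sh via --reassignment-json-file together with --execute. It is worth checking the script's --help output on your Kafka version for the exact connection options.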


Hope that helps a bit.


Andrew

On Tue, Oct 25, 2022 at 6:56 AM Lehar Jain <le...@media.net.invalid>
wrote:

> Hey,
>
> We run Kafka brokers with multiple log directories. I wanted to know how
> Kafka balances traffic between the various directories. Can we define our
> own strategy for distributing partitions across directories? We are
> currently seeing an imbalance in directory sizes: some directories have a
> lot of free space, whereas others are filling up quickly.
>
>
> Regards
>