Posted to users@kafka.apache.org by Péter Nagykátai <st...@gmail.com> on 2020/08/06 14:28:37 UTC

Kafka topic partition distributing evenly on disks

Hello,

I have a Kafka cluster with 3 brokers (v2.3.0) and each broker has 2 disks
attached. I added a new topic (heavyweight) and was surprised that even if
the topic has 15 partitions, those weren't distributed evenly on the disks.
Thus I got one disk that's almost empty and the other almost filled up. Is
there any way to have Kafka evenly distribute data on its disks?

Thank you!

Re: Kafka topic partition distributing evenly on disks

Posted by Ma...@cognizant.com.
Alternatively, you can move the data directories manually. I'm assuming your
replication factor is greater than 1.
Stop the Kafka process on broker 1.
Move 1 or 2 partition log directories from disk 1 to disk 2.
Start the Kafka process again.

Wait for the ISR to sync.

Then repeat these steps as needed.
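
To decide which directories to move in the steps above, it can help to plan
the moves before stopping the broker. The following Python sketch only
computes a plan and prints `mv` commands; it does not touch any files. The
disk paths (/data/disk1, /data/disk2), the topic name "events" and the helper
plan_moves are assumptions made up for illustration, and the 8 / 2 split
mirrors the per-broker layout reported elsewhere in this thread.

```python
from typing import Dict, List, Tuple


def plan_moves(dirs: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (partition_dir, source_disk, target_disk) moves that even out
    the number of partition directories per disk."""
    # Work on sorted copies so the caller's listing is left untouched.
    layout = {disk: sorted(parts) for disk, parts in dirs.items()}
    moves = []
    while True:
        fullest = max(layout, key=lambda d: len(layout[d]))
        emptiest = min(layout, key=lambda d: len(layout[d]))
        # Stop once the directory counts differ by at most one.
        if len(layout[fullest]) - len(layout[emptiest]) <= 1:
            return moves
        part = layout[fullest].pop()
        layout[emptiest].append(part)
        moves.append((part, fullest, emptiest))


# Hypothetical listing for one broker: 8 partition dirs on disk 1, 2 on disk 2.
broker1 = {
    "/data/disk1": [f"events-{i}" for i in range(8)],
    "/data/disk2": [f"events-{i}" for i in range(8, 10)],
}

moves = plan_moves(broker1)
for part, src, dst in moves:
    # Only printed, never executed; run such commands with the broker stopped.
    print(f"mv {src}/{part} {dst}/{part}")
```

This balances by directory count only. If partitions differ a lot in size,
a variant that weighs each directory by its size on disk would probably be a
better fit.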


Re: Kafka topic partition distributing evenly on disks

Posted by William Reynolds <wi...@instaclustr.com>.
Hmm, that's odd, I am sure it was in the docs previously. Here is the
KIP on it https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories
Basically, the reassignment JSON you get from the initial generation looks
like the following; if you already have a reassignment file, you can just
add the log_dirs section to each partition entry:

{
  "version" : int,
  "partitions" : [
    {
      "topic" : str,
      "partition" : int,
      "replicas" : [int],
      "log_dirs" : [str]    <-- NEW. A log directory can be either "any" or a
                                valid absolute path that begins with '/'. This
                                is an optional field. It is treated as an array
                                of "any" if it is not explicitly specified in
                                the JSON file.
    },
    ...
  ]
}

Hope that helps
William
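
As a rough illustration of the format above, here is a Python sketch that
fills in the log_dirs field so that each broker's replicas alternate between
its two disks. The topic name "events", the directory paths, the
replica-to-broker assignment and the helper build_reassignment are all
assumptions; in practice the replica lists would come from your cluster's
current assignment rather than from the formula used here.

```python
import json
from collections import Counter
from itertools import cycle
from typing import Dict, List


def build_reassignment(
    topic: str,
    assignment: Dict[int, List[int]],
    broker_dirs: Dict[int, List[str]],
) -> dict:
    """Build a reassignment document that pins every replica to a log dir."""
    # Each broker hands out its own log dirs in round-robin order.
    next_dir = {broker: cycle(dirs) for broker, dirs in broker_dirs.items()}
    partitions = []
    for partition in sorted(assignment):
        replicas = assignment[partition]
        partitions.append({
            "topic": topic,
            "partition": partition,
            "replicas": replicas,
            # log_dirs[i] is the directory for the replica on replicas[i].
            "log_dirs": [next(next_dir[broker]) for broker in replicas],
        })
    return {"version": 1, "partitions": partitions}


# Hypothetical current assignment: 15 partitions, 2 replicas, brokers 1 to 3.
assignment = {p: [p % 3 + 1, (p + 1) % 3 + 1] for p in range(15)}
broker_dirs = {b: ["/data/disk1/kafka", "/data/disk2/kafka"] for b in (1, 2, 3)}

plan = build_reassignment("events", assignment, broker_dirs)

# Count replicas per (broker, log dir) to check that the plan is balanced.
per_disk = Counter(
    (broker, log_dir)
    for entry in plan["partitions"]
    for broker, log_dir in zip(entry["replicas"], entry["log_dirs"])
)
print(json.dumps(plan, indent=2))
print(per_disk)
```

With these made-up inputs every broker ends up with 5 replicas on each disk.
The printed JSON is the kind of file the reassignment tool takes as input.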

On 07/08/2020, Péter Nagykátai <st...@gmail.com> wrote:
> Thank you William,
>
> I checked the doc and don't see any instructions regarding disks. Should I
> simply "move around" the topics and Kafka will assign the topics evenly on
> the two disks (per broker)? The current setup looks like this (for the
> topic in question, 15 primary, replica partitions):
>
> Broker 1 - disk 1: 8 partitions
> Broker 1 - disk 2: 2 partitions
>
> Broker 2 - disk 1: 8 partitions
> Broker 2 - disk 2: 2 partitions
>
> Broker 3 - disk 1: 8 partitions
> Broker 3 - disk 2: 2 partitions
>
> Thanks!
>
> On Fri, Aug 7, 2020 at 1:01 PM William Reynolds <william.reynolds@instaclustr.com> wrote:
>
>> Hi Péter,
>> Sounds like it's time to reassign the partitions you have across all the
>> brokers/data dirs using the instructions here:
>> https://kafka.apache.org/documentation/#basic_ops_automigrate
>> That assumes your partitioning strategy has filled your partitions
>> somewhat evenly. Since it may move all the partitions, it could be a bit
>> intensive, so be sure to use the throttle option.
>> Cheers
>> William
>>
>> On 07/08/2020, Péter Nagykátai <st...@gmail.com> wrote:
>> > Hello everybody,
>> >
>> > Thank you for the detailed answers. My issue is partly answered here:
>> >
>> > *This rule also applies at the disk level: when a set of partitions is
>> > assigned to a specific broker, each of its disks gets the same number of
>> > partitions, without considering the load on the disks at that time.*
>> >
>> >  I admit, I didn't provide enough info either.
>> >
>> > So my problem is that an existing topic got a huge surge of events this
>> > week. I knew that would happen, so I increased the partition count.
>> > Unfortunately, it occurred to me a bit later that I'd likely need some
>> > extra disk space, so I added an extra disk to each broker. What I didn't
>> > know is that Kafka won't evenly distribute the partitions across the
>> > disks.
>> > So the question still remains:
>> >  Is there any way to have Kafka evenly distribute data on its disks?
>> > Also, what options do I have *after* I'm in the situation I described
>> > above? (preferably without deleting the topic)
>> >
>> > Thanks!
>> >
>> > On Fri, Aug 7, 2020 at 12:00 PM Yingshuan Song <so...@gmail.com> wrote:
>> >
>> >> Hi Peter,
>> >> I agree with Manoj and Vinicius. I think these rules led to this result:
>> >>
>> >> 1) The partition count N and the replication factor R of a topic
>> >>    determine its real partition-replica count, which is N * R.
>> >> 2) Kafka distributes partitions evenly among brokers, but based on the
>> >>    broker count at the time the topic was created. This is important.
>> >>    If we create a topic (N = 4, R = 3) in a Kafka cluster with 3 brokers,
>> >>    then 4 * 3 / 3 = 4 partition replicas are assigned to each broker.
>> >>    If a new broker is then added to the cluster and another topic
>> >>    (N = 4, R = 3) is created, 4 * 3 / 4 = 3 partition replicas are
>> >>    assigned to each broker.
>> >>    Kafka will not assign all of those partitions to the newly added
>> >>    broker even though it is idle, and I think this is a shortcoming of
>> >>    Kafka.
>> >>    This rule also applies at the disk level: when a set of partitions is
>> >>    assigned to a specific broker, each of its disks gets the same number
>> >>    of partitions, without considering the load on the disks at that time.
>> >> 3) How a producer chooses the partition when it sends records to a topic:
>> >>    3-1) if a record has a key, the partition number is calculated from
>> >>         the key;
>> >>    3-2) if records have no keys, they are sent to each partition in turn.
>> >>    So, if there are lots of records with the same key, they will all be
>> >>    sent to the same partition and may take up a lot of disk space.
>> >>
>> >>
>> >> hope this helps
>> >>
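
The arithmetic and the key-skew effect described in the quoted message above
can be sketched in a few lines of Python. This is only an illustration: the
CRC32 hash is a stand-in (Kafka's default partitioner uses murmur2), and the
key "customer-42", the record count and the helper names are assumptions.

```python
import zlib
from collections import Counter
from itertools import count

NUM_PARTITIONS = 15


def replicas_per_broker(n: int, r: int, brokers: int) -> float:
    """Average number of partition replicas per broker: N * R / brokers."""
    return n * r / brokers


def keyed_partition(key: bytes) -> int:
    # Stand-in hash: the same key always maps to the same partition.
    return zlib.crc32(key) % NUM_PARTITIONS


_next_record = count()


def keyless_partition() -> int:
    # Records without a key are handed to the partitions in turn.
    return next(_next_record) % NUM_PARTITIONS


print(replicas_per_broker(4, 3, 3))  # topic created on a 3-broker cluster
print(replicas_per_broker(4, 3, 4))  # topic created after a 4th broker joined

# 1500 records sharing one key versus 1500 records without a key.
hot = Counter(keyed_partition(b"customer-42") for _ in range(1500))
spread = Counter(keyless_partition() for _ in range(1500))
print("same key:", dict(hot))
print("no key:  ", dict(spread))
```

All records with the shared key land in a single partition, while the keyless
records spread evenly, which is one way a single disk can fill up even when
partition counts per disk are equal.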
>> >> On Fri, Aug 7, 2020 at 6:10 AM Vinicius Scheidegger <vi...@gmail.com> wrote:
>> >>
>> >> > Hi Peter,
>> >> >
>> >> > AFAIK, everything depends on:
>> >> >
>> >> > 1) How you have configured your topic
>> >> >   a) number of partitions (here I understand you have 15 partitions)
>> >> >   b) partition replication configuration (each partition necessarily
>> >> >      has a leader, which is primarily responsible for holding the data
>> >> >      and for reads and writes); you can configure the topic to have a
>> >> >      number of replicas
>> >> > 2) How you publish messages to the topic
>> >> >   a) The publisher is responsible for choosing the partition. This can
>> >> >      be done consciously (by setting the partition id while sending the
>> >> >      message to the topic) or unconsciously (by using the
>> >> >      DefaultPartitioner or any other partitioner scheme).
>> >> >
>> >> > All messages sent to a specific partition are written first to the
>> >> > leader (meaning that the disk configured for the partition leader
>> >> > receives the load) and then replicated to the replicas (followers).
>> >> > Kafka does not automatically distribute the data equally across the
>> >> > different brokers - you need to design your architecture with that in
>> >> > mind.
>> >> >
>> >> > I hope it helps
>> >> >
>> >> > On Thu, Aug 6, 2020 at 10:23 PM Péter Nagykátai <st4r.f1sch@gmail.com> wrote:
>> >> >
>> >> > > I initially started with one data disk (mounted solely to hold Kafka
>> >> > > data) and recently added a new one.
>> >> > >
>> >> > > On Thu, Aug 6, 2020 at 10:13 PM <Ma...@cognizant.com> wrote:
>> >> > >
>> >> > > > What do you mean by "older disk"?
>> >> > > >
>> >> > > > On 8/6/20, 12:05 PM, "Péter Nagykátai" <st...@gmail.com> wrote:
>> >> > > >
>> >> > > >     Yeah, but it doesn't do that. My "older" disks have ~70
>> >> > > >     partitions, the newer ones ~5 partitions. That's why I'm asking
>> >> > > >     what went wrong.
>> >> > > >
>> >> > > >     On Thu, Aug 6, 2020 at 8:35 PM <Ma...@cognizant.com> wrote:
>> >> > > >
>> >> > > >     > Kafka distributes the number of partitions evenly across the
>> >> > > >     > disks, so in your case every disk should have 3/2 topic
>> >> > > >     > partitions.
>> >> > > >     > It is the producer's job to spread data evenly across the
>> >> > > >     > topic partitions by partition key.
>> >> > > >     > How is the partition key set: is it auto-generated, or is the
>> >> > > >     > producer sending a key along with each message?
>> >> > > >     >
>> >> > > >     >
>> >> > > >     > On 8/6/20, 7:29 AM, "Péter Nagykátai" <st4r.f1sch@gmail.com> wrote:
>> >> > > >     >
>> >> > > >     >     Hello,
>> >> > > >     >
>> >> > > >     >     I have a Kafka cluster with 3 brokers (v2.3.0) and each
>> >> > > >     >     broker has 2 disks attached. I added a new topic
>> >> > > >     >     (heavyweight) and was surprised that even if the topic has
>> >> > > >     >     15 partitions, those weren't distributed evenly on the
>> >> > > >     >     disks. Thus I got one disk that's almost empty and the
>> >> > > >     >     other almost filled up. Is there any way to have Kafka
>> >> > > >     >     evenly distribute data on its disks?
>> >> > > >     >
>> >> > > >     >     Thank you!
>>
>> --
>>
>>
>>
>> *William Reynolds**Technical Operations Engineer*
>>
>>
>> <https://www.facebook.com/instaclustr>
>> <https://twitter.com/instaclustr>
>> <https://www.linkedin.com/company/instaclustr>
>>
>> Read our latest technical blog posts here
>> <https://www.instaclustr.com/blog/>.
>>
>> This email has been sent on behalf of Instaclustr Pty. Limited
>> (Australia)
>> and Instaclustr Inc (USA).
>>
>> This email and any attachments may contain confidential and legally
>> privileged information.  If you are not the intended recipient, do not
>> copy
>> or disclose its content, but please reply to this email immediately and
>> highlight the error to the sender and then immediately delete the
>> message.
>>
>> Instaclustr values your privacy. Our privacy policy can be found at
>> https://www.instaclustr.com/company/policies/privacy-policy
>>
>


-- 



*William Reynolds**Technical Operations Engineer*


<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.

Instaclustr values your privacy. Our privacy policy can be found at
https://www.instaclustr.com/company/policies/privacy-policy

Re: Kafka topic partition distributing evenly on disks

Posted by Péter Nagykátai <st...@gmail.com>.
Thank you William,

I checked the doc and don't see any instructions regarding disks. Should I
simply "move around" the partitions so that Kafka assigns them evenly across
the two disks (per broker)? The current setup looks like this (for the
topic in question: 15 primary partitions plus their replicas):

Broker 1 - disk 1: 8 partitions
Broker 1 - disk 2: 2 partitions

Broker 2 - disk 1: 8 partitions
Broker 2 - disk 2: 2 partitions

Broker 3 - disk 1: 8 partitions
Broker 3 - disk 2: 2 partitions
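
For a layout like the one above, a reassignment file in the KIP-113 format (with a "log_dirs" entry per replica) could be generated rather than written by hand. The following is only a minimal sketch: the topic name, the replica assignment, and the directory paths are made-up placeholders, and it assumes every broker mounts its disks under the same two paths. In practice the "replicas" lists should be copied from the current assignment so that only the directory changes.

```python
import json
from collections import Counter


def build_reassignment(topic, assignment, log_dirs):
    """Build a KIP-113 style reassignment dict.

    assignment: {partition number: [broker ids]} (the current replica lists)
    log_dirs:   absolute log directory paths, assumed identical on every broker
    Each broker's replicas are spread round-robin over the directories.
    """
    next_slot = Counter()  # how many replicas each broker has been given so far
    partitions = []
    for partition in sorted(assignment):
        replicas = assignment[partition]
        dirs = []
        for broker in replicas:
            dirs.append(log_dirs[next_slot[broker] % len(log_dirs)])
            next_slot[broker] += 1
        partitions.append({
            "topic": topic,
            "partition": partition,
            "replicas": replicas,
            "log_dirs": dirs,  # one entry per replica, in the same order
        })
    return {"version": 1, "partitions": partitions}


# Hypothetical input: 15 partitions, 2 replicas each, brokers 1..3.
assignment = {p: [1 + p % 3, 1 + (p + 1) % 3] for p in range(15)}
plan = build_reassignment("my-topic", assignment, ["/data/disk1", "/data/disk2"])

# Count replicas per (broker, directory) to check the spread.
per_disk = Counter(
    (broker, directory)
    for entry in plan["partitions"]
    for broker, directory in zip(entry["replicas"], entry["log_dirs"])
)
print(json.dumps(plan, indent=2))
```

Note that this evens out the number of replicas per disk, not the bytes; if some partitions are much larger than others, the directories would have to be chosen by size instead.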

Thanks!


Re: Kafka topic partition distributing evenly on disks

Posted by William Reynolds <wi...@instaclustr.com>.
Hi Péter,
It sounds like it's time to reassign your partitions across all the
brokers/data dirs using the instructions here:
https://kafka.apache.org/documentation/#basic_ops_automigrate. That
assumes your partitioning strategy has filled the partitions somewhat
evenly. Since the reassignment may move all the partitions, it could be
fairly intensive, so be sure to use the throttle option.
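
As a rough illustration of the throttled run, the sketch below only assembles the command lines for kafka-reassign-partitions.sh; it does not execute anything. The host names, file name, and the 50 MB/s figure are placeholders, and the flag names are the ones I believe the 2.x tool accepts, so please check them against the tool's --help output for your version before relying on them.

```python
def execute_args(json_file, zookeeper, bootstrap_server, throttle_bytes_per_sec):
    """Argument list for a throttled --execute run (not executed here)."""
    throttle = str(throttle_bytes_per_sec)
    return [
        "bin/kafka-reassign-partitions.sh",
        "--zookeeper", zookeeper,
        # Assumed to be needed when the JSON contains "log_dirs" entries.
        "--bootstrap-server", bootstrap_server,
        "--reassignment-json-file", json_file,
        "--execute",
        # Limits replication traffic between brokers, in bytes per second.
        "--throttle", throttle,
        # Limits copying between log directories on the same broker.
        "--replica-alter-log-dirs-throttle", throttle,
    ]


def verify_args(json_file, zookeeper, bootstrap_server):
    """Argument list for the follow-up --verify run, which reports progress."""
    return [
        "bin/kafka-reassign-partitions.sh",
        "--zookeeper", zookeeper,
        "--bootstrap-server", bootstrap_server,
        "--reassignment-json-file", json_file,
        "--verify",
    ]


# Placeholder hosts and file name; 50_000_000 bytes/s is roughly 50 MB/s.
cmd = execute_args("reassign.json", "zk1:2181", "broker1:9092", 50_000_000)
print(" ".join(cmd))
print(" ".join(verify_args("reassign.json", "zk1:2181", "broker1:9092")))
```

Running the verify step once the move has finished matters because, as far as I know, that is also what removes the throttle again.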
Cheers
William




Re: Kafka topic partition distributing evenly on disks

Posted by Péter Nagykátai <st...@gmail.com>.
Hello everybody,

Thank you for the detailed answers. My issue is partly answered here:




*This rule also applies to disk-level, which means that when a set of
partitions assigned to a specific broker, each of the disks will get the
same number of partitions without considering the load of disks at that
time.*
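
To picture the quoted rule, here is a toy model. It assumes that, when a new partition is created, the broker simply picks the log directory that currently holds the fewest partitions and never moves existing ones; that is my reading of the rule, not something taken from the Kafka source. The directory names and counts are made up.

```python
def place_partition(counts):
    """Pick the directory with the fewest partitions (ties broken by name).

    Disk usage in bytes plays no role in the choice.
    """
    return min(counts, key=lambda directory: (counts[directory], directory))


# Only one disk exists at first, so every new partition lands on it.
counts = {"/data/disk1": 0}
for _ in range(8):
    counts[place_partition(counts)] += 1

# A second disk is added; nothing that already exists is moved.
counts["/data/disk2"] = 0

# Partitions created afterwards go to the emptier directory.
for _ in range(2):
    counts[place_partition(counts)] += 1

print(counts)
```

Under these assumptions the old disk keeps everything it had before the new disk appeared, which is why an explicit reassignment is needed to even things out afterwards.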

I admit, I didn't provide enough info either.

My problem is that an existing topic got a huge surge of events this
week. I knew that would happen, so I modified the partition count.
Unfortunately, it only occurred to me a bit later that I'd likely need some
extra disk space, so I added an extra disk to each broker. What I
didn't know is that Kafka won't evenly distribute the partitions across the
disks.
So the question still remains:
Is there any way to have Kafka evenly distribute data on its disks?
Also, what options do I have *after* I'm in the situation I described
above? (preferably without deleting the topic)

Thanks!

> > > >     >
> > > >
> > > >
> > > >
> > >
> >
>

Re: Kafka topic partition distributing evenly on disks

Posted by Yingshuan Song <so...@gmail.com>.
Hi Peter,
I agree with Manoj and Vinicius. I think the following rules led to this
result:

1) The number of partitions of a topic (N) and its replication factor (R)
determine the real partition-replica count of the topic, which is N * R.
2) Kafka distributes partitions evenly among brokers, but only based on the
broker count at the time the topic is created. This is important.
If we create a topic (N = 4, R = 3) in a Kafka cluster that contains 3
brokers, then 4 * 3 / 3 = 4 partition replicas are assigned to each broker.
But if a new broker is then added to the cluster and another topic
(N = 4, R = 3) is created, 4 * 3 / 4 = 3 partition replicas of that topic
are assigned to each broker.
Kafka will not assign all of those partitions to the newly added broker, even
though it is idle, and I think this is a shortcoming of Kafka.
A similar rule applies at disk level: when a set of partitions is assigned
to a specific broker, as far as I know each new partition is placed on the
disk that currently holds the fewest partitions, without considering the
load of the disks at that time.
3) When a producer sends records to a topic, the partition is chosen as
follows:
3-1) if a record has a key, the partition number is calculated from the key;
3-2) if records have no keys, they are sent to each partition in turn.
So if there are lots of records with the same key, they will all be sent to
the same partition and may take up a lot of disk space.
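
The arithmetic in rules 1) and 2) can be sketched in a few lines of Python. This is only an illustration, not Kafka's code: the function names and the directory paths are made up, and the disk-level part assumes that a broker places each new partition in the log directory that currently holds the fewest partitions, looking only at counts and never at bytes used.

```python
def replicas_per_broker(partitions: int, replication_factor: int, brokers: int) -> float:
    """Average number of partition replicas of one topic hosted by each broker."""
    return partitions * replication_factor / brokers


def place_by_partition_count(dir_counts: dict, new_replicas: int) -> dict:
    """Place each new replica in the log directory holding the fewest partitions.

    Assumption for this sketch: only partition counts are compared,
    disk usage is ignored, and existing partitions are never moved.
    """
    counts = dict(dir_counts)
    for _ in range(new_replicas):
        target = min(counts, key=counts.get)  # directory with the lowest count
        counts[target] += 1
    return counts


# Rule 2: the same topic shape yields fewer replicas per broker on a larger cluster.
print(replicas_per_broker(4, 3, 3))  # 4.0
print(replicas_per_broker(4, 3, 4))  # 3.0

# Hypothetical broker: an old disk that already holds 70 partitions, plus a new empty one.
print(place_by_partition_count({"/data/old": 70, "/data/new": 0}, 5))
# {'/data/old': 70, '/data/new': 5}
```

Under that assumption a disk added later only receives partitions created after it was attached, so it stays far behind the older disk until partitions are moved by hand.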


hope this helps


Re: Kafka topic partition distributing evenly on disks

Posted by Vinicius Scheidegger <vi...@gmail.com>.
Hi Peter,

AFAIK, everything depends on:

1) How you have configured your topic
  a) the number of partitions (here I understand you have 15 partitions)
  b) the replication configuration: each partition necessarily has a leader,
which is primarily responsible for holding the data and for reads and writes,
and you can configure the topic to have a number of replicas
2) How you publish messages to the topic
  a) The publisher is responsible for choosing the partition. This can be done
explicitly (by setting the partition id when sending the message to the
topic) or implicitly (by using the DefaultPartitioner or any other
partitioner scheme).

All messages sent to a specific partition are written first to the
leader (meaning that the disk holding the leader replica receives the load)
and then replicated to the followers.
Kafka does not automatically distribute the data equally across the
brokers; you need to design your architecture with that in mind.
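
To make the effect of partitioner choice concrete, here is a minimal Python sketch. It is not Kafka's DefaultPartitioner: `choose_partition` is a hypothetical helper, CRC32 stands in for the murmur2 hash the Java client uses, and unkeyed records are simply rotated through the partitions.

```python
import zlib
from collections import Counter
from itertools import count

NUM_PARTITIONS = 15  # the topic discussed in this thread has 15 partitions

_round_robin = count()  # shared counter used for records without a key


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Keyed records hash to a fixed partition; unkeyed records rotate."""
    if key is not None:
        # Stand-in hash for illustration; Kafka's Java client uses murmur2.
        return zlib.crc32(key) % num_partitions
    return next(_round_robin) % num_partitions


keyed = Counter(choose_partition(b"hot-key") for _ in range(1000))
unkeyed = Counter(choose_partition(None) for _ in range(1500))

print(len(keyed))             # 1: all records with the same key share one partition
print(set(unkeyed.values()))  # {100}: unkeyed records spread evenly over 15 partitions
```

A single dominant key therefore fills one partition, and the disk holding it, no matter how many partitions the topic has.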

I hope it helps


Re: Kafka topic partition distributing evenly on disks

Posted by Péter Nagykátai <st...@gmail.com>.
I initially started with one data disk (mounted solely to hold Kafka data)
and recently added a new one.


Re: Kafka topic partition distributing evenly on disks

Posted by Ma...@cognizant.com.
What do you mean by "older" disk?




Re: Kafka topic partition distributing evenly on disks

Posted by Péter Nagykátai <st...@gmail.com>.
Yeah, but it doesn't do that. My "older" disks have ~70 partitions, the
newer ones ~5 partitions. That's why I'm asking what went wrong.
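
Per-disk partition counts like these can be checked with a short script. The sketch below is only one way to do it: it assumes each partition lives in a directory named `<topic>-<partition>` directly under a log directory, and the paths in the usage note are hypothetical placeholders for whatever `log.dirs` is set to on the broker.

```python
import os
import re
from collections import Counter

# Partition directories are named "<topic>-<partition number>".
PARTITION_DIR = re.compile(r"^.+-\d+$")


def partitions_per_log_dir(log_dirs):
    """Count partition directories directly under each Kafka log directory."""
    counts = Counter()
    for log_dir in log_dirs:
        counts[log_dir] = 0
        for entry in os.scandir(log_dir):
            if entry.is_dir() and PARTITION_DIR.match(entry.name):
                counts[log_dir] += 1
    return dict(counts)
```

For example, `partitions_per_log_dir(["/data/kafka-1", "/data/kafka-2"])` returns a mapping from each path to its partition count.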


Re: Kafka topic partition distributing evenly on disks

Posted by Ma...@cognizant.com.
Kafka evenly distributes the number of partitions across the disks, so in your case every disk should have 3/2 topic partitions.
It is the producer's job to spread data evenly across the topic's partitions by partition key.
How is the partition key set: is it auto-generated, or does the producer send a key along with each message?



