Posted to users@kafka.apache.org by Ara Ebrahimi <ar...@argyledata.com> on 2017/01/09 17:52:52 UTC

kafka streams consumer partition assignment is uneven

Hi,

I have 3 Kafka brokers, each with 4 disks. I have 12 partitions. I have 3 Kafka Streams nodes. Each is configured to have 4 streaming threads. My topology is quite complex and I have 7 topics and lots of joins and states.

What I have noticed is that each of the 3 Kafka Streams nodes gets assigned a variable number of partitions of a topic. One node is assigned 2 partitions of topic a and another one is assigned 5. Hence I end up with nonuniform throughput across these nodes: one node ends up processing more data than the others.

What’s going on? How can I make sure partition assignment to Kafka Streams nodes is uniform?

On a similar topic, is there a way to make sure partition assignment to disks across Kafka brokers is also uniform? Even if I use round-robin assignment to pin partitions to brokers, there doesn’t seem to be a way to uniformly pin partitions to disks. Or maybe I’m missing something here? I end up with 2 partitions of topic a on disk 1 and 3 partitions of topic a on disk 2. It’s a bit variable. Not totally random, but not uniformly distributed either.

Ara.



________________________________

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Thank you in advance for your cooperation.

________________________________

Re: kafka streams consumer partition assignment is uneven

Posted by Gwen Shapira <gw...@confluent.io>.

By the way, in case you didn't find out yet (I just discovered this...), you
can get the entire topology by starting the stream, waiting a bit and
then printing the result of KafkaStreams.toString() to the console.

I found it useful and cool :)


On Tue, Jan 17, 2017 at 3:19 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> Sorry for answering late.
>
> The mapping from partitions to threads also depends on the structure of
> your topology. As you mention that you have a quite complex one, I
> assume that this is the reason for the uneven distribution. If you want
> to dig deeper, it would be helpful to know the structure of your topology.
>
>
> -Matthias



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap

Re: kafka streams consumer partition assignment is uneven

Posted by "Matthias J. Sax" <ma...@confluent.io>.

Sorry for answering late.

The mapping from partitions to threads also depends on the structure of
your topology. As you mention that you have a quite complex one, I
assume that this is the reason for the uneven distribution. If you want
to dig deeper, it would be helpful to know the structure of your topology.


-Matthias
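
One way to dig into this is to tabulate, per node, how many partitions of each topic its tasks actually read. The sketch below is an illustration only: the task ids follow the "<sub-topology>_<partition>" form that Kafka Streams uses for task names, but the two sub-topologies, the topic names and the assignment itself are made up for the example and are not taken from the topology discussed in this thread. It shows that an assignment can give every node the same number of tasks and still give one node 2 partitions of topic a and another 5.

```python
from collections import Counter

# Hypothetical: which source topics each sub-topology reads.
SUBTOPOLOGY_TOPICS = {0: ["a"], 1: ["b"]}

# Hypothetical assignment of task ids ("<sub-topology>_<partition>") to nodes.
# Every node holds 8 tasks, so the assignment is balanced by task count.
ASSIGNMENT = {
    "node1": ["0_0", "0_1"] + [f"1_{p}" for p in range(0, 6)],
    "node2": [f"0_{p}" for p in range(2, 7)] + [f"1_{p}" for p in range(6, 9)],
    "node3": [f"0_{p}" for p in range(7, 12)] + [f"1_{p}" for p in range(9, 12)],
}


def partitions_per_topic(assignment, subtopology_topics):
    """Count, per node, how many partitions of each topic its tasks read."""
    result = {}
    for node, task_ids in assignment.items():
        counts = Counter()
        for task_id in task_ids:
            group, _partition = task_id.split("_")
            # A task reads one partition of every source topic of its sub-topology.
            for topic in subtopology_topics[int(group)]:
                counts[topic] += 1
        result[node] = dict(counts)
    return result


if __name__ == "__main__":
    per_node = partitions_per_topic(ASSIGNMENT, SUBTOPOLOGY_TOPICS)
    for node, counts in per_node.items():
        print(node, len(ASSIGNMENT[node]), "tasks", counts)
```

With real data, the assignment dictionary would be filled in from the task lists that each instance logs or prints, rather than written by hand.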

On 1/9/17 12:05 PM, Ara Ebrahimi wrote:
> I meant I have 7 topics and each has 12 partitions. Considering that I have 4 streaming threads per node, I was expecting to see each thread process 1 partition from each topic, for 7 partitions total per streaming thread. But that’s not the case. Or perhaps you are saying the number of streaming threads should follow the total number of partitions across all 7 topics?!
> 
> Ara.

Re: kafka streams consumer partition assignment is uneven

Posted by Ara Ebrahimi <ar...@argyledata.com>.

I meant I have 7 topics and each has 12 partitions. Considering that I have 4 streaming threads per node, I was expecting to see each thread process 1 partition from each topic, for 7 partitions total per streaming thread. But that’s not the case. Or perhaps you are saying the number of streaming threads should follow the total number of partitions across all 7 topics?!

Ara.
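
The expectation above, written out as arithmetic. The numbers come from this thread; the function name is made up for the example, and the even split is only the expectation being described, not something Kafka Streams guarantees.

```python
def expected_partitions_per_thread(topics, partitions_per_topic, nodes, threads_per_node):
    """Return (total partitions, total threads, partitions per thread if split evenly)."""
    total_partitions = topics * partitions_per_topic
    total_threads = nodes * threads_per_node
    return total_partitions, total_threads, total_partitions / total_threads


# 7 topics x 12 partitions, spread over 3 nodes x 4 threads.
total_partitions, total_threads, per_thread = expected_partitions_per_thread(7, 12, 3, 4)
print(total_partitions, total_threads, per_thread)  # 84 12 7.0
```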

> On Jan 9, 2017, at 11:48 AM, Michael Noll <mi...@confluent.io> wrote:
>
> What does the processing topology of your Kafka Streams application look
> like, and what's the exact topic and partition configuration?  You say you
> have 12 partitions in your cluster, presumably across 7 topics -- that
> means that most topics have just a single partition.  Depending on your
> topology (e.g. if you have defined that single-partition topics A, B, C
> must be joined), Kafka Streams is forced to let one of your three Streams
> nodes process "more" topics/partitions than the other two nodes.
>
> -Michael

Re: kafka streams consumer partition assignment is uneven

Posted by Michael Noll <mi...@confluent.io>.

What does the processing topology of your Kafka Streams application look
like, and what's the exact topic and partition configuration?  You say you
have 12 partitions in your cluster, presumably across 7 topics -- that
means that most topics have just a single partition.  Depending on your
topology (e.g. if you have defined that single-partition topics A, B, C
must be joined), Kafka Streams is forced to let one of your three Streams
nodes process "more" topics/partitions than the other two nodes.

-Michael
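
A simplified model of the situation described above. It assumes that a group of topics processed together gets as many tasks as its largest source topic has partitions, and that task i reads partition i of every topic in the group. The function names, the node names and the round-robin placement are hypothetical and only stand in for the real assignor, which this sketch does not reproduce.

```python
def build_tasks(source_topics):
    """Build tasks for ONE group of topics that are processed together.

    source_topics maps topic name -> partition count. Task i reads
    partition i of every topic that has a partition i.
    """
    num_tasks = max(source_topics.values())
    return [
        [(topic, p) for topic, count in sorted(source_topics.items()) if p < count]
        for p in range(num_tasks)
    ]


def assign_round_robin(tasks, nodes):
    """Place tasks on nodes in turn (a stand-in for the real assignor)."""
    assignment = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return assignment


NODES = ["node1", "node2", "node3"]

# Three single-partition topics that must be joined: only one task exists.
joined = assign_round_robin(build_tasks({"A": 1, "B": 1, "C": 1}), NODES)

# Two 12-partition topics processed together: 12 tasks to spread out.
wide = assign_round_robin(build_tasks({"A": 12, "B": 12}), NODES)

for node in NODES:
    print(node, "single-partition join:", joined[node],
          "| 12-partition tasks:", len(wide[node]))
```

In the single-partition case the one task carries A-0, B-0 and C-0 together, so one node does all the work and the other two receive nothing, however the placement is done.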



On Mon, Jan 9, 2017 at 6:52 PM, Ara Ebrahimi <ar...@argyledata.com>
wrote:

> Hi,
>
> I have 3 Kafka brokers, each with 4 disks. I have 12 partitions. I have 3
> Kafka Streams nodes. Each is configured to have 4 streaming threads. My
> topology is quite complex and I have 7 topics and lots of joins and states.
>
> What I have noticed is that each of the 3 Kafka Streams nodes gets
> assigned a variable number of partitions of a topic. One node is assigned
> 2 partitions of topic a and another one is assigned 5. Hence I end up with
> nonuniform throughput across these nodes: one node ends up processing more
> data than the others.
>
> What’s going on? How can I make sure partition assignment to Kafka
> Streams nodes is uniform?
>
> On a similar topic, is there a way to make sure partition assignment to
> disks across Kafka brokers is also uniform? Even if I use round-robin
> assignment to pin partitions to brokers, there doesn’t seem to be a way to
> uniformly pin partitions to disks. Or maybe I’m missing something here? I
> end up with 2 partitions of topic a on disk 1 and 3 partitions of topic a
> on disk 2. It’s a bit variable. Not totally random, but not uniformly
> distributed either.
>
> Ara.