Posted to jira@kafka.apache.org by "Luke Chen (Jira)" <ji...@apache.org> on 2021/08/02 08:47:00 UTC
[jira] [Commented] (KAFKA-10888) Sticky partition leads to uneven
product msg, resulting in abnormal delays in some partitions
[ https://issues.apache.org/jira/browse/KAFKA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391446#comment-17391446 ]
Luke Chen commented on KAFKA-10888:
-----------------------------------
[~hachikuji], I was thinking about your suggestion above, to be analogous to the TCP window size implementation. If I understand it correctly, the following example should illustrate your suggestion:
*topic A with partition 3:*
Suppose we use the default `batch.size` and `linger.ms` settings (16K bytes, 0 ms), on average we can send 2k bytes in each batch, and the initial window size is 1k bytes. So,
1st batch, we'll have: (window size = 1k)
partition A-0, 1k bytes (reaches the limit, so move to the next partition and increase the window size to 2k bytes (suppose +/- 1k on each adjustment))
partition A-1, 1k bytes
2nd batch, we'll have: (window size = 2k)
partition A-2, 2k bytes
===
so far, we'll have
partition A-0: 1k bytes
partition A-1: 1k bytes
partition A-2: 2k bytes
and keep going.
===
Suppose partition A-0 slows down in the next batch, with 6k bytes to be sent:
4th batch (window size = 2k),
partition A-0, 2k bytes (reaches the limit, so move to the next partition and increase the window size by 2k bytes, so 4k bytes now)
partition A-1, 4k bytes
5th batch (window size = 4k)
partition A-2, 2k bytes (does not reach the limit, so keep sending to partition A-2 in the next batch, and decrease the window size to 2k bytes)
===
so far, we'll have
partition A-0: 3k bytes
partition A-1: 5k bytes
partition A-2: 4k bytes
and keep going.
===
I think this proposal might still cause uneven distribution, as the above example shows. Also, in some cases it will split what was originally a single batch across 2 or more batches (ex: the 1st batch in the above example), which works against the spirit of the sticky partitioner: stick to one partition until the batch is full, to improve throughput.
So, I'm proposing a way to make the distribution even while still keeping the original sticky partitioner spirit: check the distribution status after every partition has been sent 1 batch, and use a threshold to decide whether the exceeding partition should be skipped in the following rounds, and for how many rounds. Using the above example:
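For concreteness, here is a minimal simulation of one possible reading of the window rule (the class name, the fixed +/- 1k step, and the exact point where the window is adjusted are all my assumptions for illustration; none of this is Kafka producer code):

```java
import java.util.Map;
import java.util.TreeMap;

// A toy simulation of the TCP-window-style sticky partitioner discussed
// above. Illustrative sketch only; none of these names exist in Kafka.
public class WindowedStickySim {
    static final int NUM_PARTITIONS = 3;
    static final int STEP = 1024;        // assumed +/- 1k window adjustment
    static final int MIN_WINDOW = 1024;

    int window = MIN_WINDOW;             // current window size
    int current = 0;                     // current sticky partition
    final Map<Integer, Integer> sent = new TreeMap<>();  // partition -> bytes

    // Accumulate one "batch" of batchBytes, moving to the next partition
    // (and growing the window) whenever the window limit is reached, and
    // shrinking the window when a send ends under the limit.
    void sendBatch(int batchBytes) {
        int remaining = batchBytes;
        while (remaining > 0) {
            int toSend = Math.min(remaining, window);
            sent.merge(current, toSend, Integer::sum);
            remaining -= toSend;
            if (toSend == window) {      // hit the limit: advance and grow
                current = (current + 1) % NUM_PARTITIONS;
                window += STEP;
            } else {                     // under the limit: stay and shrink
                window = Math.max(MIN_WINDOW, window - STEP);
            }
        }
    }

    public static void main(String[] args) {
        WindowedStickySim sim = new WindowedStickySim();
        sim.sendBatch(2 * 1024);         // 1st batch of 2k bytes
        sim.sendBatch(2 * 1024);         // 2nd batch of 2k bytes
        System.out.println(sim.sent);    // prints {0=1024, 1=2048, 2=1024}
    }
}
```

Note that this particular reading already gives a different split than the example above, because the window shrinks back after the under-filled tail of the 1st batch; the exact distribution depends on precisely when the adjustment is applied, which is part of why the scheme is hard to pin down.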
---1st round, same result as using original sticky partitioner---
1st batch
partition A-0, 2k bytes
2nd batch
partition A-1, 2k bytes
3rd batch:
partition A-2, 2k bytes
===
so far, we'll have
partition A-0: 2k bytes
partition A-1: 2k bytes
partition A-2: 2k bytes
After all partitions have sent 1 batch, we check whether any partition's batch size in this round exceeds the other partitions by more than the threshold (ex: 70%); here, no, so keep going
===
---2nd round, still have the same result as using original sticky partitioner---
Suppose partition A-0 slows down in the next batch, with 6k bytes to be sent:
4th batch
partition A-0, 6k bytes
5th batch
partition A-1, 2k bytes
6th batch
partition A-2, 2k bytes
===
in this round, we'll have
partition A-0: 6k bytes
partition A-1: 2k bytes
partition A-2: 2k bytes
After this round, we check whether any partition's batch size in this round exceeds the other partitions by more than the threshold (ex: 70%). Here, partition A-0 exceeds by 4k bytes, so we compute how many rounds it should be skipped: 4k / (2k * 0.7) ~= 2.86 => taking only the integer part, 2. So, partition A-0 should be skipped for 2 rounds
===
So we can imagine that after the 3rd and 4th rounds, during which partition A-0 is skipped, we'll have balanced messages sent:
partition A-0: 6k bytes
partition A-1: 6k bytes
partition A-2: 6k bytes
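The skip-round computation used in this proposal can be sketched as follows (the method name and signature are illustrative assumptions only, not Kafka producer code):

```java
// Illustrative sketch of the round-based skip heuristic proposed above;
// none of these names exist in the Kafka producer.
public class SkipRoundCalc {
    // Rounds to skip for a partition that sent `sentBytes` this round,
    // given the typical per-round bytes of the other partitions and the
    // imbalance threshold (e.g. 0.7 for 70%).
    static int roundsToSkip(long sentBytes, long typicalBytes, double threshold) {
        long excess = sentBytes - typicalBytes;
        if (excess <= threshold * typicalBytes) {
            return 0;                        // within tolerance: no skipping
        }
        return (int) (excess / (threshold * typicalBytes));  // integer part only
    }

    public static void main(String[] args) {
        // Round 2 of the example: A-0 sent 6k, the others 2k each.
        // Excess = 4k; 4k / (2k * 0.7) ~= 2.86, so skip 2 rounds.
        System.out.println(roundsToSkip(6 * 1024, 2 * 1024, 0.7));  // prints 2
        // Round 1: all partitions sent 2k, so nothing is skipped.
        System.out.println(roundsToSkip(2 * 1024, 2 * 1024, 0.7));  // prints 0
    }
}
```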
What do you think?
(Sorry for the long response) Thank you.
> Sticky partition leads to uneven product msg, resulting in abnormal delays in some partitions
> ----------------------------------------------------------------------------------------------
>
> Key: KAFKA-10888
> URL: https://issues.apache.org/jira/browse/KAFKA-10888
> Project: Kafka
> Issue Type: Bug
> Components: clients, producer
> Affects Versions: 2.4.1
> Reporter: jr
> Assignee: Luke Chen
> Priority: Major
> Attachments: image-2020-12-24-21-05-02-800.png, image-2020-12-24-21-09-47-692.png, image-2020-12-24-21-10-24-407.png
>
>
> 110 producers, 550 partitions, 550 consumers, 5-node Kafka cluster
> The producers use null keys with the sticky partitioner; the total production rate is about 1,000,000 tps
> The observed partition delay is abnormal and the message distribution is uneven, which makes the maximum production and consumption delay abnormal for the partitions with more messages.
> I cannot find a reason why the sticky partitioner would make the message distribution uneven at this production rate.
> I can't switch to the round-robin partitioner, which would increase the delay and CPU cost. Does the sticky partitioner design cause the uneven message distribution, or is this abnormal? How can it be solved?
> !image-2020-12-24-21-09-47-692.png!
> As shown in the picture, the uneven distribution is concentrated on some partitions and some brokers; there seems to be a pattern.
> This problem does not occur in only one cluster, but in many high-tps clusters.
> The problem is more obvious on the test cluster we built.
> !image-2020-12-24-21-10-24-407.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)