Posted to jira@kafka.apache.org by "Luke Chen (Jira)" <ji...@apache.org> on 2021/08/02 08:47:00 UTC

[jira] [Commented] (KAFKA-10888) Sticky partition leads to uneven product msg, resulting in abnormal delays in some partitions

    [ https://issues.apache.org/jira/browse/KAFKA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391446#comment-17391446 ] 

Luke Chen commented on KAFKA-10888:
-----------------------------------

[~hachikuji], I was thinking about your suggestion above to make this analogous to the TCP window size implementation. If I understand it correctly, the following example (with a rough code sketch after it) should illustrate your suggestion:

*topic A with 3 partitions:*

Suppose we use the default `batch.size` and `linger.ms` settings (16K bytes, 0ms), on average we can send 2k bytes for each batch, and the window size starts at a default of 1k bytes. So,

1st batch, we'll have: (window size = 1k)

partition A-0, 1k bytes (reaches the limit, so move to the next partition and increase the window size to 2k bytes (suppose the window is adjusted up/down each time))

partition A-1, 1k bytes

 

2nd batch, we'll have: (window size = 2k)

partition A-2, 2k bytes

 

===

so far, we'll have

partition A-0: 1k bytes

partition A-1: 1k bytes

partition A-2: 2k bytes

and keep going.

===

Suppose partition A-0 slows down in the next batch, with 6k bytes to be sent:

4th batch (window size = 2k),

partition A-0, 2k bytes (reaches the limit, so move to the next partition and increase the window size by 2k bytes, so 4k bytes now)

partition A-1, 4k bytes

 

5th batch (window size = 4k)

partition A-2, 2k bytes (does not reach the limit, so keep sending to partition A-2 in the next batch and decrease the window size to 2k bytes)

 

===

so far, we'll have

partition A-0: 3k bytes

partition A-1: 5k bytes

partition A-2: 4k bytes

and keep going.

===
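
To make sure we're talking about the same mechanism, here is a minimal sketch of the window bookkeeping as I understand it. The example grows the window 1k -> 2k -> 4k and shrinks it 4k -> 2k, so I model the adjustment as doubling/halving, though an additive step would also work; all class, field, and method names here are hypothetical, not actual producer internals:

{code:java}
// Minimal sketch of a TCP-window-like sticky partitioner; hypothetical names.
public class WindowedStickyPartitioner {
    private static final int MIN_WINDOW = 1024; // start at 1k, as in the example

    private int windowSize = MIN_WINDOW; // current per-partition byte budget
    private int bytesInWindow = 0;       // bytes sent to the current partition so far
    private int currentPartition = 0;

    // Pick a partition for a record of the given size.
    public int partition(int recordBytes, int numPartitions) {
        if (bytesInWindow + recordBytes > windowSize) {
            // Window filled: move to the next partition and widen the window.
            currentPartition = (currentPartition + 1) % numPartitions;
            bytesInWindow = 0;
            windowSize *= 2;
        }
        bytesInWindow += recordBytes;
        return currentPartition;
    }

    // Called when a batch drains before the window filled up:
    // stay on the same partition, but shrink the window.
    public void onBatchDrained() {
        if (bytesInWindow < windowSize) {
            windowSize = Math.max(MIN_WINDOW, windowSize / 2);
        }
    }
}
{code}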

I think this proposal might still cause uneven distribution, as the above example shows. Also, in some cases it will split what was originally one batch into 2 or more batches (ex: the 1st batch in the above example), which works against the spirit of the sticky partitioner: sticking to one partition until the batch is full to improve throughput.

 

So, I'm proposing a way to make the distribution even while still keeping the original sticky partitioner spirit: check the distribution status once every partition has been sent 1 batch, and use a threshold to decide whether the exceeding partition should be skipped in the following rounds, and for how many rounds (a code sketch follows the worked example below). Using the above example:

---1st round, same result as using original sticky partitioner---

1st batch

partition A-0, 2k bytes

 

2nd batch

partition A-1, 2k bytes

 

3rd batch:

partition A-2, 2k bytes

===

so far, we'll have

partition A-0: 2k bytes

partition A-1: 2k bytes

partition A-2: 2k bytes

After all partitions have been sent 1 batch, we check whether any partition's batch size in this round exceeds the other partitions by more than the threshold (ex: 70%). Here, no partition does, so keep going.

===

---2nd round, still have the same result as using original sticky partitioner---

Suppose partition A-0 slows down in the next batch, with 6k bytes to be sent:

4th batch

partition A-0, 6k bytes

 

5th batch

partition A-1, 2k bytes 

 

6th batch 

partition A-2, 2k bytes

===

in this round, we'll have

partition A-0: 6k bytes

partition A-1: 2k bytes

partition A-2: 2k bytes

After this round, we check whether any partition's batch size in this round exceeds the other partitions by more than the threshold (ex: 70%). Here, partition A-0 exceeds by 4k bytes, so we compute how many rounds it should be skipped: 4k / (2k * 0.7) ≈ 2.86 => take only the integer part, 2. So, partition A-0 should be skipped for 2 rounds.

===

So, we can imagine that after the 3rd and 4th rounds, in which partition A-0 is skipped, we'll have balanced messages sent (counting from the 2nd round):

partition A-0: 6k bytes

partition A-1: 6k bytes

partition A-2: 6k bytes
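
In code, the round bookkeeping could look roughly like the sketch below. This is only a minimal illustration, assuming the threshold is compared against the average round bytes of the other partitions, as in the 4k / (2k * 0.7) computation above; all names are hypothetical:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the proposed per-round skip check; hypothetical names.
public class RoundBalanceTracker {
    private final double threshold; // e.g. 0.7
    private final Map<Integer, Long> bytesThisRound = new HashMap<>();
    private final Map<Integer, Integer> skipRounds = new HashMap<>();

    public RoundBalanceTracker(double threshold) {
        this.threshold = threshold;
    }

    // Record the bytes of a batch sent to a partition in the current round.
    public void recordBatch(int partition, long bytes) {
        bytesThisRound.merge(partition, bytes, Long::sum);
    }

    // A partition is skipped while its skip counter is positive.
    public boolean shouldSkip(int partition) {
        return skipRounds.getOrDefault(partition, 0) > 0;
    }

    // Called once every (non-skipped) partition has been sent one batch.
    public void endRound() {
        skipRounds.replaceAll((p, rounds) -> Math.max(0, rounds - 1));
        for (Map.Entry<Integer, Long> e : bytesThisRound.entrySet()) {
            double othersAvg = averageExcluding(e.getKey());
            double excess = e.getValue() - othersAvg;
            if (othersAvg > 0 && excess > othersAvg * threshold) {
                // e.g. excess 4k, othersAvg 2k: 4k / (2k * 0.7) = 2.86 -> skip 2 rounds
                skipRounds.put(e.getKey(), (int) (excess / (othersAvg * threshold)));
            }
        }
        bytesThisRound.clear();
    }

    private double averageExcluding(int partition) {
        return bytesThisRound.entrySet().stream()
                .filter(en -> en.getKey() != partition)
                .mapToLong(Map.Entry::getValue)
                .average()
                .orElse(0);
    }
}
{code}

With threshold 0.7 and the 2nd-round numbers above (A-0: 6k, A-1: 2k, A-2: 2k), this yields a skip count of 2 for partition A-0, matching the example.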

 

What do you think?

(Sorry for the long response) Thank you.

 

>  Sticky partition leads to uneven product msg, resulting in abnormal delays in some partitions
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10888
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10888
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, producer 
>    Affects Versions: 2.4.1
>            Reporter: jr
>            Assignee: Luke Chen
>            Priority: Major
>         Attachments: image-2020-12-24-21-05-02-800.png, image-2020-12-24-21-09-47-692.png, image-2020-12-24-21-10-24-407.png
>
>
>   110 producers, 550 partitions, 550 consumers, 5-node Kafka cluster
>   The producers use null keys with the sticky partitioner; the total production rate is about 1,000,000 (100w) tps.
> We observed abnormal partition delays and uneven message distribution, which leads to abnormally high maximum
> production and consumption delays for the partitions with more messages.
>   I cannot find a reason why the sticky partitioner would make the message distribution uneven at this production rate.
>   I can't switch to the round-robin partitioner, which would increase delay and CPU cost. Does the sticky partitioner design cause uneven message distribution, or is this abnormal? How can it be solved?
>   !image-2020-12-24-21-09-47-692.png!
> As shown in the picture, the uneven distribution is concentrated on some partitions and some brokers; there seems to be a pattern.
> This problem does not occur in only one cluster, but in many high-tps clusters.
> The problem is more obvious on the test cluster we built.
> !image-2020-12-24-21-10-24-407.png!


