Posted to jira@kafka.apache.org by "Thomas Heinze (Jira)" <ji...@apache.org> on 2021/10/12 08:10:00 UTC

[jira] [Updated] (KAFKA-13367) Performance Degradation during introducing Network Delay

     [ https://issues.apache.org/jira/browse/KAFKA-13367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Heinze updated KAFKA-13367:
----------------------------------
    Affects Version/s: 2.5.1

> Performance Degradation during introducing Network Delay
> --------------------------------------------------------
>
>                 Key: KAFKA-13367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13367
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>         Environment: We are running Kafka 2.5 on m4.xlarge VMs on AWS.
>            Reporter: Thomas Heinze
>            Priority: Major
>
> Hi Kafka community,
>  
> we are running a few chaos experiments to simulate Kafka's behaviour during data-center issues. To simulate a slow network, we run the following command on two of our six brokers (the brokers are spread across 3 AZs on AWS; we run the command on two brokers in the same AZ):
> {code:bash}
> tc qdisc add dev eth0 root netem delay <x>ms
> {code}
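To repeat this experiment at each delay level, the command can be wrapped in a small helper. This is a sketch under stated assumptions, not part of the original report: the names `netem_add_cmd`, `netem_del_cmd` and `run` are hypothetical, the interface `eth0` is taken from the command above, and commands are only printed unless `dry_run=False` is passed (actually executing them needs root and iproute2). A netem qdisc attached at `root` on `eth0` delays only packets leaving that interface, and `tc qdisc del dev eth0 root` removes it again.

```python
from __future__ import annotations

import shlex
import subprocess


def netem_add_cmd(delay_ms: int, dev: str = "eth0") -> list[str]:
    """Build the tc command that adds a fixed egress delay on `dev`."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    # The unit is attached to the number, e.g. "1000ms".
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", f"{delay_ms}ms"]


def netem_del_cmd(dev: str = "eth0") -> list[str]:
    """Build the tc command that removes the root qdisc added above."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Return the command as a shell string; execute it only if dry_run is False."""
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires root and iproute2
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: print the add/remove pair for each delay used in the experiment.
    for delay in (1000, 2000, 5000):
        print(run(netem_add_cmd(delay)))
        print(run(netem_del_cmd()))
```

Keeping the add and remove commands paired makes it easy to restore the brokers' network between runs.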
>  
> At the same time, Kafka producers write roughly 4k messages per second to a topic with 10 partitions, a replication factor of 3, and min-isr=2 (min.insync.replicas). We observe the following:
> * *Introducing a 1000 ms delay*: The producers see significantly higher response times, and throughput drops to about 2k messages per second.
> * *Introducing a 2000 ms delay*: Producer latency increases further, and throughput drops to 300 messages per second.
> * *Introducing a 5000 ms delay*: The cluster removes the slow brokers from the in-sync replica set (ISR), and the incoming message rate on the remaining brokers increases. In my opinion, this is the expected behaviour.
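The 5000 ms result can be reasoned about with a simplified model that rests on two assumptions the report does not state: producers use `acks=all` (otherwise min-isr=2 does not affect produce requests), and replicas are placed rack-aware so that each partition has exactly one replica per AZ. With `acks=all`, the leader acknowledges a write only after every current ISR member has replicated it, and rejects the write if the ISR is smaller than `min.insync.replicas`. The sketch below encodes only that rule; `Partition`, `acks_all_outcome` and the broker ids are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    replicas: tuple[str, ...]  # hypothetical broker ids, assumed one per AZ
    isr: frozenset[str]        # current in-sync replicas (leader included)


def acks_all_outcome(p: Partition, min_isr: int, slow: set[str]) -> str:
    """Simplified handling of one acks=all produce request on partition p."""
    if len(p.isr) < min_isr:
        # ISR smaller than min.insync.replicas: the write is rejected.
        return "rejected"
    if p.isr & slow:
        # The ack waits for every ISR member, so a slow one delays the write.
        return "waits-on-slow"
    return "fast"


if __name__ == "__main__":
    slow = {"b1", "b2"}  # the two delayed brokers in one AZ (hypothetical ids)
    p = Partition(("b0", "b1", "b4"), frozenset({"b0", "b1", "b4"}))
    print(acks_all_outcome(p, min_isr=2, slow=slow))  # -> waits-on-slow
    shrunk = Partition(p.replicas, p.isr - slow)       # slow replica dropped
    print(acks_all_outcome(shrunk, min_isr=2, slow=slow))  # -> fast
```

Under these assumptions each partition has at most one slow replica. While it stays in the ISR, every acks=all write waits for it; once it is removed, the two remaining in-sync replicas still satisfy min-isr=2, so writes continue at normal speed.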
> Which parameters influence this behaviour? How can I make Kafka behave as it does with the 5000 ms delay, even for smaller delays? We would like to guarantee roughly a certain throughput, even if one AZ is very slow.
> I already tried setting "replica.lag.time.max.ms" to very small values, but then Kafka just keeps removing the replicas on the slow nodes from the ISR and adding them back.
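The flapping seen with a small "replica.lag.time.max.ms" can be illustrated with a toy timeline. In this simplified model (an assumption, not Kafka's actual code), the leader removes a follower from the ISR once it has not been caught up for longer than the lag threshold, and re-adds it as soon as it catches up again; in Kafka the rejoin condition is the follower reaching the leader's high watermark. `isr_transitions`, `catch_up_interval_ms` (a stand-in for how often a delayed follower manages to catch up) and `tick_ms` (an arbitrary check period) are hypothetical.

```python
def isr_transitions(catch_up_interval_ms: int, lag_max_ms: int,
                    duration_ms: int, tick_ms: int = 100) -> int:
    """Count how often a follower leaves or rejoins the ISR in a toy timeline."""
    last_caught_up = 0
    next_catch_up = catch_up_interval_ms
    in_isr = True
    transitions = 0
    for now in range(0, duration_ms + 1, tick_ms):
        if now >= next_catch_up:
            # Follower reached the leader's log end at this tick.
            last_caught_up = now
            next_catch_up += catch_up_interval_ms
            if not in_isr:
                in_isr = True  # caught up again: re-added to the ISR
                transitions += 1
        if in_isr and now - last_caught_up > lag_max_ms:
            in_isr = False  # lagged longer than the threshold: removed
            transitions += 1
    return transitions


if __name__ == "__main__":
    for lag in (1000, 30_000):
        print(lag, isr_transitions(1500, lag, 10_000))
```

In this model, a follower that catches up every 1500 ms flaps in and out of the ISR when the threshold is 1000 ms and never leaves it at 30000 ms: lowering the threshold below the follower's catch-up interval produces repeated shrink/expand cycles rather than a lasting removal.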
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)