Posted to dev@kafka.apache.org by Paolo Moriello <pa...@gmail.com> on 2020/03/10 16:06:42 UTC

[DISCUSS] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll

Hi,

I've just created a Jira ticket that summarizes the results of my analysis and
proposes a mitigation for the latency spikes:
https://issues.apache.org/jira/browse/KAFKA-9693
Please have a look at the ticket.
Do you see any significant implications or risks in making this change?

Thanks,
Paolo

On Tue, 18 Feb 2020 at 14:42, Paolo Moriello <pa...@gmail.com>
wrote:

> Hello,
>
>
> I'm investigating Kafka latency. During my analysis I was
> able to reproduce a scenario in which latency repeatedly spikes at a
> constant frequency, for short periods of time.
>
> In my tests, in particular, latency could spike every ~2 minutes
> (depending on the throughput and input...) from an average of ~3 ms up to a
> maximum of more than 500 ms (p95-p99).
>
> See image: https://imagizer.imageshack.com/img922/5308/glhkO4.png
>
>
> Further investigations showed that this is most likely caused by log
> segments being rolled over.
>
>
> Has anybody ever noticed anything like this? Do you know if it is possible
> to tune p99 performance in order to reduce or eliminate the latency spikes?
>
>
> Thanks,
>
> Paolo
>
>
> Test configuration:
>
>    - 15 brokers
>    - 6 producers, acks=1, no compression
>    - 1 topic, 90 partitions
>    - Kafka 2.2.1
>
>
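
The quoted message says the spike period depends on throughput, which fits segment rolls being the trigger: a partition rolls once its active segment reaches log.segment.bytes. A minimal sketch of that arithmetic follows. It assumes size-based rolling only, with traffic spread evenly across partitions. The function name and the example load are hypothetical, and the thread does not state the segment size or throughput used in the test.

```python
# Kafka's default for the broker setting log.segment.bytes is 1 GiB.
# Assumption: the thread does not say whether the test overrode it.
DEFAULT_SEGMENT_BYTES = 1024 ** 3


def seconds_between_rolls(topic_bytes_per_sec, partitions,
                          segment_bytes=DEFAULT_SEGMENT_BYTES):
    """Estimate how long one partition takes to fill a log segment.

    Assumes writes are spread evenly over all partitions and that only
    the size limit triggers a roll (time-based rolling is ignored).
    """
    if topic_bytes_per_sec <= 0 or partitions <= 0 or segment_bytes <= 0:
        raise ValueError("all arguments must be positive")
    per_partition_rate = topic_bytes_per_sec / partitions
    return segment_bytes / per_partition_rate


if __name__ == "__main__":
    # Hypothetical load, NOT a figure from the thread:
    # 90 MiB/s into a topic with 90 partitions.
    interval = seconds_between_rolls(90 * 1024 ** 2, partitions=90)
    print(f"each partition rolls roughly every {interval:.0f} s")
```

If partitions fill at the same rate, their rolls tend to cluster in time, so comparing this estimate with the observed spike period is one way to check whether rolls are the cause.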