Posted to users@kafka.apache.org by Mathieu D <ma...@gmail.com> on 2020/10/23 18:07:40 UTC

Perf on history reprocessing

Hello there

Sometimes we need to reprocess a large amount of historical data.
I find the performance in that case quite disappointing. More precisely,
throughput is quite low (which is not surprising for a system optimized for
low latency).

Is there any knob to turn to get much higher throughput in such cases?

Thanks
Mathieu

Re: Perf on history reprocessing

Posted by Si Tang <si...@indeed.com.INVALID>.

Hi Mathieu,

I recently helped my team diagnose a performance issue with Kafka Streams
state store restoration (it took 10x longer after we moved from 20 partitions
to 60 partitions). Understanding the bottleneck should be the first step.
The problem in our case was memory. Our application was spending a lot of
time in GC because zstd decompression created too many byte arrays for
buffering (KAFKA-10470 <https://issues.apache.org/jira/browse/KAFKA-10470>).
So we switched the compression algorithm, and that resolved the issue.
Hopefully this helps.
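
For illustration only, here is a minimal sketch of what switching the
compression codec could look like in a Kafka Streams configuration. The
message above does not say which codec was chosen, so "lz4" is an assumption,
as are the application id and broker address. The "producer." prefix is how
Kafka Streams passes settings to its internal producer, and
"compression.type" is the standard producer setting.

```java
import java.util.Properties;

public class ReprocessingCompressionConfig {

    // Builds Kafka Streams properties whose internal producer compresses
    // records (including changelog records) with the given codec.
    static Properties streamsProps(String codec) {
        Properties props = new Properties();
        props.put("application.id", "history-reprocessing-app"); // hypothetical name
        props.put("bootstrap.servers", "localhost:9092");        // hypothetical address
        // Valid values include none, gzip, snappy, lz4 and zstd.
        props.put("producer.compression.type", codec);
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProps("lz4"); // assumed replacement for zstd
        System.out.println(props.getProperty("producer.compression.type"));
    }
}
```

Note that a producer-side change like this only affects newly written
records. Data already stored in a topic keeps the codec it was written with.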

On Sun, Oct 25, 2020 at 4:12 PM Fabio Pardi <f....@portavita.eu> wrote:

> hi Mathieu,
>
>
> the best approach in my opinion is to try to understand where your
> bottleneck is, analyzing the graphs produced during history reprocessing.
>
> my best bet is the disks, but indeed it might be anywhere.
>
>
> regards,
>
> fabio pardi
>
>

-- 

Si Tang

Re: Perf on history reprocessing

Posted by Fabio Pardi <f....@portavita.eu>.

hi Mathieu,

the best approach, in my opinion, is to find where your bottleneck is by analyzing the graphs produced during history reprocessing.

my best bet is the disks, but indeed it might be anywhere.
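
If no graphs are available, one cheap way to check whether the JVM itself is the bottleneck is to measure how much wall-clock time goes to garbage collection. The sketch below uses only the standard java.lang.management API. The class and method names are hypothetical, and the one-second sleep is a stand-in for a real window of reprocessing work.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadProbe {

    // Total milliseconds spent in GC so far, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of elapsed time spent in GC, clamped to the range [0, 1].
    static double gcFraction(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / elapsedMillis));
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        Thread.sleep(1000); // stand-in for a window of reprocessing work
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcDelta = totalGcMillis() - gcBefore;
        System.out.printf("GC time: %d ms of %d ms (%.1f%%)%n",
                gcDelta, elapsedMillis, 100 * gcFraction(gcDelta, elapsedMillis));
    }
}
```

A consistently high fraction during reprocessing points at memory pressure rather than disks or network.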

regards,

fabio pardi


Re: Perf on history reprocessing [kafka-streams]

Posted by Mathieu D <ma...@gmail.com>.

To clarify my question: here I'm focusing on the Kafka Streams part.
