Posted to dev@kafka.apache.org by Vitor Augusto de Medeiros <v....@aluno.ufabc.edu.br> on 2021/10/14 19:47:02 UTC

RE: [SPAM] Re: Why does Kafka have a higher throughput than Redis?

Thanks for the response, Andrew, I appreciate the help!

Just a few thoughts that came up while reading your points:


  1.  In theory, Redis also handles and stores data in memory, which makes me wonder why Kafka does it better. Perhaps it has to do with the API contract where, as you said, there's no complex transactional software that might hurt performance.
  2.  I didn't know there was such a big difference between linear and random writes, pretty awesome! But I still don't understand how Kafka, even with linear disk writes, reaches 2 to 3x the throughput of Redis, which doesn't read from or write to disk at all and keeps messages in memory.
  3.  I didn't know about this zero-copy technique, and I'll read more about it. It feels like the result would be similar to Kafka having the data stored in memory (as Redis does), but that still makes me question how Kafka can handle a higher throughput if the "design" is so similar.


________________________________
From: Andrew Grant <an...@gmail.com>
Sent: Thursday, October 14, 2021 15:55
To: dev@kafka.apache.org <de...@kafka.apache.org>
Subject: [SPAM] Re: Why does Kafka have a higher throughput than Redis?

Hi Vitor,

I'm not an expert, and some more knowledgeable folks can probably chime
in (and correct me), but a few things came to mind:

1) On the write side (i.e. when using the producer), Kafka does not force
a flush to disk on every write by default. It writes to the page cache, so
writes are effectively in-memory: they're staged in the page cache and the
kernel flushes the data asynchronously. Also, the API contract for Kafka is
quite "simple" in that it mostly reads and writes arbitrary sequences of
bytes - there isn't as much complex transactional software in front of the
writing/reading that might hurt performance compared to some other data
stores. Note that Kafka does provide things like idempotence and
transactions, so it's not like there is never any overhead to consider.

2) Kafka reads and writes are mostly linear (sequential), because each
partition is an append-only log, which helps a lot with performance.
Random writes are a lot slower than linear ones.

3) For reading data (i.e. when using the consumer), Kafka uses a zero-copy
technique in which data is sent directly from the page cache to the network
buffer without going through user space, which helps a lot.

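4) Kafka batches aggressively: messages are grouped into batches, so the
per-message network and disk overhead is amortized over many messages.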
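
To make points 1, 2 and 4 a bit more concrete, here is a rough Python
sketch. It is not Kafka's actual code, just a toy append-only log; the
class name BatchingLog, the batch_size parameter and the 4-byte length
prefix are all made up for illustration. It appends sequentially, issues
one write per batch instead of one per record, and only calls fsync if
asked to, leaving the rest to the page cache.

```python
import os
import tempfile


class BatchingLog:
    """Toy append-only log (hypothetical, not Kafka's implementation)."""

    def __init__(self, path, batch_size=100):
        # Append mode: every write lands at the end of the file (linear writes).
        # buffering=0 means each write() call below is a single OS-level write.
        self.file = open(path, "ab", buffering=0)
        self.batch_size = batch_size
        self.pending = []
        self.write_calls = 0

    def append(self, record: bytes):
        # Length-prefix each record so a reader could split the log again.
        self.pending.append(len(record).to_bytes(4, "big") + record)
        if len(self.pending) >= self.batch_size:
            self.flush_batch()

    def flush_batch(self):
        if not self.pending:
            return
        # One write for the whole batch; the data goes to the page cache,
        # and the kernel decides when it actually reaches the disk.
        self.file.write(b"".join(self.pending))
        self.write_calls += 1
        self.pending.clear()

    def close(self, durable=False):
        self.flush_batch()
        if durable:
            # Only here do we wait for the disk.
            os.fsync(self.file.fileno())
        self.file.close()


def demo(num_records=1000, batch_size=100):
    path = os.path.join(tempfile.mkdtemp(), "toy.log")
    log = BatchingLog(path, batch_size)
    for _ in range(num_records):
        log.append(b"x" * 1024)  # 1 kB payload, as in the study mentioned
    log.close()
    return log.write_calls, os.path.getsize(path)
```

With the defaults, demo() writes 1000 records using 10 write calls rather
than 1000, which is the amortization that batching is meant to buy.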

Here are two resources which might provide more information:
https://docs.confluent.io/platform/current/kafka/design.html
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Hope this helps a bit.

Andrew

On Thu, Oct 14, 2021 at 1:11 PM Vitor Augusto de Medeiros <
v.medeiros@aluno.ufabc.edu.br> wrote:

> Hi everyone,
>
> I'm doing a benchmark comparison between Kafka and Redis for my final
> bachelor paper and would like to understand more about why Kafka has
> higher throughput compared to Redis.
>
> I noticed Redis has lower overall latency (which makes sense since it's
> stored in memory) but can't figure out the difference in throughput.
>
> I found a study (not sure if I can post links here, but it's named A
> COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING
> PIPELINES by Sebastian Tallberg)
> showing Kafka's throughput hitting 3x the msg/s of
> Redis for a 1kB payload. I would like to understand what it is in Kafka's
> architecture that allows it to be a lot faster than other message
> brokers, and Redis in particular.
>
> Thanks!
>


--
Andrew Grant
8054482621

Re: [SPAM] Re: Why does Kafka have a higher throughput than Redis?

Posted by Luke Chen <sh...@gmail.com>.
Hi Vitor,
I'm not an expert either, but I think Andrew's answer covers pretty much
the reasons why Kafka performs well.
I'm not too familiar with Redis either, but I'd say there are many
configurations in each product that increase throughput, and the use
cases are different, so the comparison might not be fair.

For your question:
3.  Didn't know about this zero-copy technique, I'll read more about that
but feels like the result would be a response similar to as if kafka had
the info stored in-memory (as redis do) but that would still make me
question how is that Kafka can handle a higher throughput if the "design"
is so similar.
--> Again, I'm not familiar with Redis, but even if you store data in
memory, without help from the OS you still need to copy the data into
kernel space to send it to the receiver. With the zero-copy technique, the
whole data flow stays within kernel space.
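
As a rough illustration of that difference (a Python sketch, not how Kafka
itself is written; the function names and the demo helper are made up),
the first function below pulls the file into user space and pushes it back
to the kernel, while the second uses socket.sendfile(), which relies on
os.sendfile() where the platform supports it and otherwise falls back to
plain send():

```python
import os
import socket
import tempfile


def send_with_user_space_copy(sock, path, chunk_size=64 * 1024):
    # read() copies data from the kernel into this process,
    # sendall() copies it from this process back into the kernel.
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


def send_zero_copy(sock, path):
    # Where os.sendfile() is usable, the kernel moves the bytes from the
    # page cache to the socket without them passing through user space.
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo(payload=b"kafka" * 1000):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    results = []
    try:
        for send in (send_with_user_space_copy, send_zero_copy):
            left, right = socket.socketpair()
            with left, right:
                sent = send(left, path)
                left.shutdown(socket.SHUT_WR)  # signal end of stream
                received = b""
                while True:
                    chunk = right.recv(65536)
                    if not chunk:
                        break
                    received += chunk
            results.append((sent, received == payload))
    finally:
        os.remove(path)
    return results
```

Both paths deliver the same bytes to the receiver; the difference is only
in how many copies and user/kernel transitions happen on the way. The demo
uses a small payload so it fits in the socket buffer without a separate
reader thread.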

But again, the use cases are different, so the comparison might not be fair.
We can only analyze and learn why and how they have good throughput.
That's my two cents.

Thank you.
Luke
