Posted to users@kafka.apache.org by Christian Schuhegger <Ch...@gmx.de> on 2014/02/04 17:52:41 UTC

Kafka and no guarantee that every published message is actually received by the broker

Hello all,

I was reading in the following paper:

http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

the following paragraph:

-- snip start --
There are a few reasons why Kafka performed much better. First,
the Kafka producer currently doesn’t wait for acknowledgements
from the broker and sends messages as fast as the broker can
handle. This significantly increased the throughput of the
publisher. With a batch size of 50, a single Kafka producer almost
saturated the 1Gb link between the producer and the broker. This
is a valid optimization for the log aggregation case, as data must
be sent asynchronously to avoid introducing any latency into the
live serving of traffic. We note that without acknowledging the
producer, there is no guarantee that every published message is
actually received by the broker. For many types of log data, it is
desirable to trade durability for throughput, as long as the number
of dropped messages is relatively small. However, we do plan to
address the durability issue for more critical data in the future.
-- snip end --
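
If I understand the paper correctly, the fire-and-forget behavior it
describes corresponds roughly to the following settings with the 0.8
Java producer (just a minimal sketch on my side: the property names are
0.8-era and postdate the paper, and the broker address and topic name
are placeholders):

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class FireAndForgetProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // 0 = do not wait for any acknowledgement from the broker,
            // which is the fire-and-forget behavior the paper describes.
            props.put("request.required.acks", "0");
            // Send asynchronously and batch messages, as in the paper's test.
            props.put("producer.type", "async");
            props.put("batch.num.messages", "50");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("logs", "a log line"));
            producer.close();
        }
    }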

And I was wondering: is this still true, or have the plans described
above to address the durability issue for more critical data since been
realized?

Many thanks,
-- 
Christian Schuhegger



Re: Kafka and no guarantee that every published message is actually received by the broker

Posted by Neha Narkhede <ne...@gmail.com>.
We have added intra-cluster replication to address the durability issue in
Kafka 0.8. You can read the latest on the design and guarantees here -
http://kafka.apache.org/documentation.html#semantics
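
For critical data, create the topic with a replication factor greater
than 1 and have the producer wait for all in-sync replicas to
acknowledge each write. A minimal sketch with the 0.8 Java producer
(broker list and topic name are placeholders):

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class DurableProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // -1 = the leader responds only after every in-sync replica
            // has the write, so an acknowledged message survives up to
            // replication-factor - 1 broker failures.
            props.put("request.required.acks", "-1");
            // Retry instead of silently dropping messages on transient errors.
            props.put("message.send.max.retries", "3");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            producer.send(
                    new KeyedMessage<String, String>("critical-data", "payload"));
            producer.close();
        }
    }

Setting request.required.acks to 1 (leader-only acknowledgement) is the
middle ground if waiting on the full in-sync replica set costs too much
throughput.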

Thanks
Neha


On Tue, Feb 4, 2014 at 8:52 AM, Christian Schuhegger <
Christian.Schuhegger@gmx.de> wrote:

> Hello all,
>
> I was reading in the following paper:
>
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
>
> the following paragraph:
>
> -- snip start --
> There are a few reasons why Kafka performed much better. First,
> the Kafka producer currently doesn't wait for acknowledgements
> from the broker and sends messages as fast as the broker can
> handle. This significantly increased the throughput of the
> publisher. With a batch size of 50, a single Kafka producer almost
> saturated the 1Gb link between the producer and the broker. This
> is a valid optimization for the log aggregation case, as data must
> be sent asynchronously to avoid introducing any latency into the
> live serving of traffic. We note that without acknowledging the
> producer, there is no guarantee that every published message is
> actually received by the broker. For many types of log data, it is
> desirable to trade durability for throughput, as long as the number
> of dropped messages is relatively small. However, we do plan to
> address the durability issue for more critical data in the future.
> -- snip end --
>
> And I was wondering: is this still true, or have the plans described
> above to address the durability issue for more critical data since
> been realized?
>
> Many thanks,
> --
> Christian Schuhegger
>
>
>