Posted to users@kafka.apache.org by Andrew Schofield <an...@live.com> on 2016/02/19 22:49:04 UTC

Exactly-once publication behaviour

When publishing messages to Kafka, you make a choice between at-most-once and at-least-once delivery, depending on whether you wait for acknowledgments and whether you retry on failures. In most cases, those options are good enough. However, some systems offer exactly-once reliability too. Although my view is that the practical use of exactly-once is limited in the situations that Kafka is generally used for, when you're connecting other systems to Kafka or bridging between protocols, I think there is value in propagating the reliability level that the other system expects.
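For illustration, the two publishing choices described above might map onto producer settings roughly as follows. This is only a sketch: `acks` and `retries` are real producer configuration keys, but the `DeliveryConfigs` class, its method names, and the retry count of 3 are assumptions made up for this example, and other required settings (such as `bootstrap.servers` and the serializers) are left out.

```java
import java.util.Properties;

// Hypothetical helper; only the keys "acks" and "retries" are Kafka producer settings.
final class DeliveryConfigs {

    // At-most-once: do not wait for an acknowledgment and never resend.
    // A message lost in flight stays lost, but it is never duplicated.
    static Properties atMostOnce() {
        Properties p = new Properties();
        p.setProperty("acks", "0");
        p.setProperty("retries", "0");
        return p;
    }

    // At-least-once: wait for acknowledgment and resend on failure.
    // If the acknowledgment is lost but the write succeeded, the retry duplicates the message.
    static Properties atLeastOnce() {
        Properties p = new Properties();
        p.setProperty("acks", "all");
        p.setProperty("retries", "3"); // illustrative value, an assumption
        return p;
    }
}
```

Neither combination gives exactly-once publication, which is the gap the rest of this message asks about.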

As a consumer, you can manage your offset and get exactly-once delivery, or more likely exactly-once processing, of the messages.
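One common way to do this is to record the consumed offset together with the processing result, so that a replayed message can be recognised and skipped. The sketch below is an in-memory illustration only, not the Kafka consumer API: `AtomicOffsetSink` and its methods are hypothetical names, and in a real system the result and the offset would presumably be written in a single transaction in an external store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sink that updates results and the next expected offset in one step.
final class AtomicOffsetSink {
    private final List<String> results = new ArrayList<>();
    private long nextOffset = 0;

    // Applies a record unless its offset was already handled (a replay after a restart).
    // Returns true if the record was applied, false if it was skipped as a duplicate.
    synchronized boolean apply(long offset, String value) {
        if (offset < nextOffset) {
            return false; // already processed; ignore the redelivery
        }
        results.add(value);
        nextOffset = offset + 1; // result and offset change together
        return true;
    }

    synchronized long nextOffset() {
        return nextOffset;
    }

    synchronized List<String> results() {
        return new ArrayList<>(results);
    }
}
```

After a restart, the consumer would resume from `nextOffset()` rather than from an offset committed separately, which is what makes the processing effectively exactly-once.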

I've read about idempotent producers (https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer) and I know there's been some discussion about transactions too.

Is there a plan to provide the tools to enable exactly-once publication behaviour? Is this a planned enhancement to Kafka Connect? Is there already some technique that people are using effectively to get exactly-once?

Andrew Schofield

Re: Exactly-once publication behaviour

Posted by Adam Kunicki <ad...@streamsets.com>.
Andrew,

In SDC (https://github.com/streamsets/datacollector) we do the kind of offset
management you mention to achieve this type of behavior (ideally, exactly-once
processing), but we still only give the user the choice of "at least once" and
"at most once". Even when handling offsets this way, an application failure
can leave a (very small) possibility of a duplicate if the offset wasn't
committed, for example because of a transient error.

Specifically you can check out
https://github.com/streamsets/datacollector/blob/9828e4ba5b90614316506c95784f43c471edc222/sdc-kafka_0_9/src/main/java/com/streamsets/pipeline/kafka/impl/KafkaConsumer09.java#L142-L173

We explicitly commit the offsets only once they've completed processing
through the rest of the data pipeline.
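
To make the ordering and the remaining duplicate window concrete, here is a
minimal in-memory simulation. It is not the SDC code and not the Kafka
consumer API; the class and method names are hypothetical, and the list
stands in for a partition.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for one partition and its committed offset (hypothetical names).
final class CommitAfterProcessing {
    private final List<String> log;
    private int committedOffset = 0; // the only state that survives a simulated restart
    final List<String> processed = new ArrayList<>();

    CommitAfterProcessing(List<String> log) {
        this.log = log;
    }

    // Processes up to batchSize records starting at the committed offset,
    // then commits. If crashBeforeCommit is true, the commit is skipped to
    // simulate a failure after processing but before the offset is stored.
    void runBatch(int batchSize, boolean crashBeforeCommit) {
        int end = Math.min(committedOffset + batchSize, log.size());
        for (int i = committedOffset; i < end; i++) {
            processed.add(log.get(i)); // stands in for the rest of the pipeline
        }
        if (crashBeforeCommit) {
            return; // offset never committed, so the next run re-reads this batch
        }
        committedOffset = end;
    }

    int committedOffset() {
        return committedOffset;
    }
}
```

With a log of three records, calling runBatch(2, true) and then
runBatch(2, false) processes the first two records twice, which is the
duplicate case described above.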

Hope this helps!

-Adam

-- 
Adam Kunicki
StreamSets | Field Engineer
mobile: 415.890.DATA (3282) | linkedin <http://www.adamkunicki.com>