Posted to users@kafka.apache.org by Peter Groesbeck <pe...@gmail.com> on 2019/08/30 20:00:25 UTC

Idempotent Producers and Exactly Once Consumers

For a producer that emits messages to a single topic (i.e. no single
message is sent to multiple topics), will enabling idempotency but not
transactions provide exactly once guarantees for downstream consumers of
said topic?

Ordering is not important; I just want to make sure consumers consume each
message only once.

Re: Idempotent Producers and Exactly Once Consumers

Posted by "Matthias J. Sax" <ma...@confluent.io>.
If you only enable "read_committed", yes.

If you set `processing.guarantee="exactly_once"` the guarantee is
stronger, because the read-process-write pattern allows coupling the
writing of result data and the committing of offsets into a single
transaction (in contrast to a consumer-only scenario, for which there are
no transactions).

Hence, in case of failure, the transaction is aborted and offsets are not
committed (and in case of success, output data and offsets are committed
atomically). This implies that in a failure scenario Kafka Streams will
still re-read records from the input topic (i.e., there are still retries
to read and process data); however, the output data written before the
failure is aborted, and thus downstream applications in read_committed mode
won't see it.

Note: your Kafka Streams application must be free of side effects for
"exactly_once" to work out-of-the-box. If there are side effects, then in
case of failure and retries those side effects would be applied multiple
times.
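
To make the behavior above concrete, here is a minimal in-memory sketch. It
does not use Kafka or any Kafka client; the function name, the uppercase
"processing" step, and the single injected failure are all hypothetical and
chosen only for illustration. It models one transaction per record, where
output and offset are committed together or dropped together, and it records
a side effect that lives outside the transaction.

```python
def run_read_process_write(input_log, fail_at):
    """Simulate a read-process-write loop with one failure at offset `fail_at`."""
    committed_output = []   # what a read_committed downstream consumer can see
    committed_offset = 0    # input position restored after a restart
    side_effects = []       # external actions that are NOT part of the transaction
    failed_once = False

    while committed_offset < len(input_log):
        offset = committed_offset                # (re-)read from the committed position
        record = input_log[offset]
        side_effects.append(record)              # happens on every attempt
        pending_output = [record.upper()]        # written inside the open transaction

        if offset == fail_at and not failed_once:
            failed_once = True
            continue                             # abort: pending output and offset are both dropped

        committed_output.extend(pending_output)  # commit: output and offset
        committed_offset = offset + 1            # become visible together

    return committed_output, side_effects


output, effects = run_read_process_write(["a", "b", "c"], fail_at=1)
print(output)   # ['A', 'B', 'C']
print(effects)  # ['a', 'b', 'b', 'c']
```

The record at offset 1 is read and processed twice, yet the committed output
contains it once; the side effect, being outside the transaction, is applied
twice.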


-Matthias

On 9/29/19 2:49 PM, christopher palm wrote:
> Does a Kafka Streams consumer also have that same limitation of possible
> duplicates?
> 
> Thanks,
> Chris


Re: Idempotent Producers and Exactly Once Consumers

Posted by christopher palm <cp...@gmail.com>.
Does a Kafka Streams consumer also have that same limitation of possible
duplicates?

Thanks,
Chris

On Fri, Sep 27, 2019 at 11:56 AM Matthias J. Sax <ma...@confluent.io>
wrote:

> Enabling "read_committed" only ensures that a consumer does not return
> uncommitted data.
>
> However, on failure, a consumer might still read committed messages
> multiple times (if you commit offsets after processing). If you commit
> offsets before you process messages, and a failure happens before
> processing finishes, you may lose those messages, as they won't be
> consumed again on restart.
>
> Hence, if you have a "consumer only" application, not much has changed and
> you still need to take care in your application code about potential
> duplicate processing of records.
>
> -Matthias

Re: Idempotent Producers and Exactly Once Consumers

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Enabling "read_committed" only ensures that a consumer does not return
uncommitted data.

However, on failure, a consumer might still read committed messages
multiple times (if you commit offsets after processing). If you commit
offsets before you process messages, and a failure happens before
processing finishes, you may lose those messages, as they won't be
consumed again on restart.

Hence, if you have a "consumer only" application, not much has changed and
you still need to handle potential duplicate processing of records in your
application code.
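
The two commit orderings described above can be sketched with a small
in-memory simulation. This is not Kafka client code: `consume`,
`deduplicate`, and the single injected crash are hypothetical names and
assumptions made for illustration. The deduplication step stands in for the
"application code" handling, keyed here on the record offset.

```python
def consume(log, commit_first, crash_at):
    """Return the (offset, value) pairs the application processed, with one crash."""
    committed = 0          # offset the consumer resumes from after a restart
    processed = []
    crashed = False
    while committed < len(log):
        offset = committed
        crash_now = offset == crash_at and not crashed
        if commit_first:
            committed = offset + 1                   # 1. commit the offset
            if crash_now:
                crashed = True
                continue                             # crash before processing: record is skipped
            processed.append((offset, log[offset]))  # 2. process
        else:
            processed.append((offset, log[offset]))  # 1. process
            if crash_now:
                crashed = True
                continue                             # crash before commit: record is re-read
            committed = offset + 1                   # 2. commit the offset
    return processed


def deduplicate(deliveries):
    """Drop records whose offset was already handled (application-level dedup)."""
    seen = set()
    unique = []
    for offset, value in deliveries:
        if offset not in seen:
            seen.add(offset)
            unique.append(value)
    return unique


log = ["a", "b", "c"]
after = consume(log, commit_first=False, crash_at=1)
before = consume(log, commit_first=True, crash_at=1)
print([v for _, v in after])   # ['a', 'b', 'b', 'c']
print([v for _, v in before])  # ['a', 'c']
print(deduplicate(after))      # ['a', 'b', 'c']
```

In a real application the set of handled offsets would itself have to
survive a crash, for example by storing it together with the processing
result; the sketch keeps it in memory only.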

-Matthias


On 9/27/19 7:34 AM, Alessandro Tagliapietra wrote:
> You can achieve exactly once on a consumer by enabling read committed and
> manually committing the offset as soon as you receive a message. That way
> you know that on the next poll you won't get the old message again.


Re: Idempotent Producers and Exactly Once Consumers

Posted by Alessandro Tagliapietra <ta...@gmail.com>.
You can achieve exactly once on a consumer by enabling read committed and
manually committing the offset as soon as you receive a message. That way
you know that on the next poll you won't get the old message again.

On Fri, Sep 27, 2019, 6:24 AM christopher palm <cp...@gmail.com> wrote:

> I had a similar question, and just watched the video on the confluent.io
> site about this.
> From what I understand idempotence and transactions are there to solve the
> duplicate writes and exactly once processing, respectively.
>
> Are you saying below that this only works if we produce into a
> Kafka topic and consume from it via Kafka Streams, but a regular
> Kafka consumer won't get the guarantee of exactly once processing?
>
> Thanks,
> Chris

Re: Idempotent Producers and Exactly Once Consumers

Posted by christopher palm <cp...@gmail.com>.
I had a similar question, and just watched the video on the confluent.io
site about this.
From what I understand idempotence and transactions are there to solve the
duplicate writes and exactly once processing, respectively.

Are you saying below that this only works if we produce into a
Kafka topic and consume from it via Kafka Streams, but a regular
Kafka consumer won't get the guarantee of exactly once processing?

Thanks,
Chris


On Sat, Aug 31, 2019 at 12:29 AM Matthias J. Sax <ma...@confluent.io>
wrote:

> Exactly-once on the producer will only ensure that no duplicate writes
> happen. If a downstream consumer fails, you might still read a message
> multiple times in all cases (i.e., without idempotence, with idempotence
> enabled, or if you use transactions).
>
> Note that exactly-once is designed for a read-process-write pattern,
> but not for a write-read pattern.
>
> -Matthias

Re: Idempotent Producers and Exactly Once Consumers

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Exactly-once on the producer will only ensure that no duplicate writes
happen. If a downstream consumer fails, you might still read a message
multiple times in all cases (i.e., without idempotence, with idempotence
enabled, or if you use transactions).

Note that exactly-once is designed for a read-process-write pattern,
but not for a write-read pattern.
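
As a rough illustration of what "no duplicate writes" covers, the sketch
below models a partition log that drops a retried record when it has already
seen that producer's sequence number. It is a heavily simplified model and
not Kafka's actual implementation; `PartitionLog`, `send_with_lost_ack`, and
the producer id "p1" are hypothetical names introduced only for this example.

```python
class PartitionLog:
    """Toy partition log with optional sequence-number based deduplication."""

    def __init__(self, idempotent):
        self.idempotent = idempotent
        self.records = []
        self.last_sequence = {}   # producer id -> highest sequence number appended

    def append(self, producer_id, sequence, value):
        # With idempotence on, a retry of an already appended sequence is ignored.
        if self.idempotent and sequence <= self.last_sequence.get(producer_id, -1):
            return
        self.records.append(value)
        self.last_sequence[producer_id] = sequence


def send_with_lost_ack(log, producer_id, sequence, value):
    """Simulate a send whose acknowledgement is lost, so the producer retries."""
    log.append(producer_id, sequence, value)   # first attempt is written, ack is lost
    log.append(producer_id, sequence, value)   # producer retries the same record


plain = PartitionLog(idempotent=False)
idem = PartitionLog(idempotent=True)
for seq, value in enumerate(["a", "b"]):
    send_with_lost_ack(plain, "p1", seq, value)
    send_with_lost_ack(idem, "p1", seq, value)

print(plain.records)  # ['a', 'a', 'b', 'b']
print(idem.records)   # ['a', 'b']
```

This only removes duplicates on the write side. A consumer that fails and
restarts from an older committed offset would still re-read records from
either log.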

-Matthias



On 8/30/19 1:00 PM, Peter Groesbeck wrote:
> For a producer that emits messages to a single topic (i.e. no single
> message is sent to multiple topics), will enabling idempotency but not
> transactions provide exactly once guarantees for downstream consumers of
> said topic?
> 
> Ordering is not important; I just want to make sure consumers consume each
> message only once.
> 


Re: Idempotent Producers and Exactly Once Consumers

Posted by Jörn Franke <jo...@gmail.com>.
This is not only a matter of configuration. As in all messaging solutions, you need to make sure that all involved applications ensure once-and-only-once delivery from producer(s) through Kafka to consumer(s).


> On 30.08.2019 at 22:00, Peter Groesbeck <pe...@gmail.com> wrote:
> 
> For a producer that emits messages to a single topic (i.e. no single
> message is sent to multiple topics), will enabling idempotency but not
> transactions provide exactly once guarantees for downstream consumers of
> said topic?
> 
> Ordering is not important; I just want to make sure consumers consume each
> message only once.