Posted to users@kafka.apache.org by Karthick Kumar <kk...@apptivo.co.in> on 2018/05/28 07:17:51 UTC

Facing Duplication Issue in Kafka

Hi,

We are seeing duplicate messages inconsistently while bouncing the Kafka
producer and consumer in a Tomcat node. Any help in finding the root cause
would be appreciated.

-- 
With Regards,
Karthick.K

Re: Facing Duplication Issue in Kafka

Posted by "M. Manna" <ma...@gmail.com>.
This is a good article on the LinkedIn site - I think it's worth reading
before attempting any complicated designs:

https://www.linkedin.com/pulse/exactly-once-delivery-message-distributed-system-arun-dhwaj/


Re: Facing Duplication Issue in Kafka

Posted by "Thakrar, Jayesh" <jt...@conversantmedia.com>.
For more details, see https://www.slideshare.net/JayeshThakrar/kafka-68540012

While this is based on Kafka 0.9, the fundamental concepts and reasons are still valid.


Re: Facing Duplication Issue in Kafka

Posted by Hans Jespersen <ha...@confluent.io>.
Are you seeing 1) duplicate messages stored in a Kafka topic partition or 2) duplicate consumption and processing of a single message stored in a Kafka topic?

If it’s #1 then you can turn on the idempotent producer feature to get Exactly Once Semantics (EOS) while publishing.
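As a rough sketch with the Java producer client, enabling it looks something
like this (the broker address, topic name, and record contents here are
placeholders, not from this thread):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
    // The idempotent producer de-duplicates internal retries, so a resend
    // after a lost ack no longer creates a duplicate in the partition.
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    props.put(ProducerConfig.ACKS_CONFIG, "all"); // required with idempotence

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
    producer.close();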

If it’s #2, then you can examine more closely how your consumer is doing offset commits.
If you are committing offsets automatically on a time interval, there is always a possibility that the messages in the last, not-yet-committed window will be received again when the consumer restarts.

You can instead commit manually, possibly even after each message, which shrinks the window of possible duplicates to a single message, but at the cost of some performance.
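A sketch of such a per-message commit loop with the Java consumer, assuming
placeholder names for the group, topic, and a process() handler:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");
    // Turn off time-based auto commit so offsets only advance when we commit.
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("my-topic")); // placeholder

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            process(record); // placeholder for your handling logic
            // Commit just this record's offset (+1 = next offset to read), so
            // a crash can replay at most the one in-flight message.
            consumer.commitSync(Collections.singletonMap(
                    new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1)));
        }
    }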

What many of the Kafka Sink Connectors do for exactly-once processing is to store their offsets atomically with the data they write outside of Kafka. For example, a database connector would write the message data and the offsets to a database in one atomic write operation. Upon restart, the app rereads the offsets from the database, then resumes consumption from Kafka at that point by calling seek() to reposition the consumer before the first call to poll().
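A rough sketch of that pattern follows. OffsetStore, openOffsetStore(), and
createConsumer() are hypothetical stand-ins for your own code, not Kafka APIs;
the point is that both OffsetStore methods hit the same database:

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // Hypothetical application interface; both methods use the same database.
    interface OffsetStore {
        long readOffset(TopicPartition tp);
        void writeDataAndOffset(ConsumerRecord<String, String> record); // one transaction
    }

    TopicPartition tp = new TopicPartition("my-topic", 0);     // placeholder
    OffsetStore offsetStore = openOffsetStore();               // hypothetical factory
    KafkaConsumer<String, String> consumer = createConsumer(); // configured as in the sketch above
    consumer.assign(Collections.singletonList(tp)); // assign, not subscribe: we manage offsets

    // Reread the last stored offset from the database and reposition the
    // consumer before the first poll().
    long lastOffset = offsetStore.readOffset(tp);
    consumer.seek(tp, lastOffset + 1);

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            // One atomic DB transaction writes the message data and its offset,
            // so the data and the stored position can never disagree.
            offsetStore.writeDataAndOffset(record);
        }
    }

Because the data and the offset succeed or fail together, a crash anywhere in
the loop just replays from the stored offset and never leaves a duplicate row
in the database.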

These are the techniques most people use to get end-to-end exactly-once processing with no duplicates, even in the event of a failure.


-hans


Re: Facing Duplication Issue in Kafka

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Duplication can happen if your producer or consumer exits uncleanly.
For example, if the producer crashes before it receives the ack from the
broker, your logic will fail to register that the message was produced, and
when it comes back up it will try to send that batch again. The same goes for
the consumer: if it crashes before committing a batch of messages and comes
back up, it will receive that batch again. The only remedy is to exit as
cleanly as possible. Ensure you catch the kill signal, then have your producer
mark messages delivered to the Kafka broker as processed. On the consumer
side, commit more often, catch the kill signal, and commit the remaining
offsets. That's how I have done it in my application.
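To make the clean-exit part concrete, here is a rough sketch of catching the
kill signal with a JVM shutdown hook and committing before exit; createConsumer()
and process() are placeholders for your own setup and handling code:

    import java.time.Duration;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.errors.WakeupException;

    final KafkaConsumer<String, String> consumer = createConsumer(); // placeholder factory
    final Thread mainThread = Thread.currentThread();

    // A kill signal (SIGTERM) runs JVM shutdown hooks: wake the consumer out
    // of poll() and wait for the main loop to commit what it has processed.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        consumer.wakeup();
        try {
            mainThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }));

    try {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                process(record); // placeholder for your handling logic
            }
            consumer.commitSync();
        }
    } catch (WakeupException e) {
        // expected: wakeup() interrupted poll() during shutdown
    } finally {
        consumer.commitSync(); // commit the remaining processed offsets
        consumer.close();
    }

On the producer side, the same kind of hook can call producer.flush() so that
buffered messages are acked by the broker before the process exits.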
