Posted to users@kafka.apache.org by Ryan Slade <ry...@avocet.io> on 2016/11/18 13:49:02 UTC

Kafka Streams message loss

Hi

I'm trialling Kafka Streams for a large stream processing job; however,
I'm seeing message loss even in the simplest scenarios.

I've tried to boil it down to the simplest scenario where I see loss, which
is the following (see the sketch after the list):
1. Ingest messages from an input stream (String, String).
2. Decode each message from JSON into a custom type.
3. If decoding succeeds, send the message to a second stream (String,
CustomType) and increment an atomic counter.
4. A foreach on the second stream increments a second atomic counter each
time a message arrives.
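
In outline, the topology looks like the sketch below. To be clear, this is
not my actual code: the topic names, application id and decode() helper are
placeholders, and the sketch keeps the value type as String where my real
pipeline maps to a custom type with its own serde.

import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class LossCheck {
    static final AtomicLong DECODED = new AtomicLong();  // step 3 counter
    static final AtomicLong ARRIVED = new AtomicLong();  // step 4 counter

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "loss-check");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();  // 0.10.x DSL entry point
        KStream<String, String> input = builder.stream("input-topic");

        input
            .mapValues(LossCheck::decode)               // step 2: decode from JSON
            .filter((key, value) -> {
                if (value == null) {
                    return false;                       // drop undecodable messages
                }
                DECODED.incrementAndGet();              // step 3: count successes
                return true;
            })
            .through("second-topic")                    // step 3: via the second topic
            .foreach((key, value) -> ARRIVED.incrementAndGet());  // step 4

        new KafkaStreams(builder, props).start();
    }

    // Stand-in for the real JSON-to-CustomType decode; returns null on failure.
    static String decode(String json) {
        return json != null && json.startsWith("{") ? json : null;
    }
}

After each run I stop the application and compare DECODED and ARRIVED.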

Since Kafka Streams provides at-least-once guarantees, I would expect the
second stream to see at least as many messages as were sent to it from the
first; however, I consistently see message loss.

I've tested multiple times, sending around 200k messages each time. I don't
see losses every run, only in around 1 run in 5 with the same data. The
losses are small, around 100 messages, but I would expect none.

I'm running version 0.10.1.0 with ZooKeeper, the Kafka broker and the
Streams application all on the same machine in order to rule out network
packet loss.

I'm running Ubuntu 16.04 with OpenJDK.

Any advice would be greatly appreciated, as I can't move forward with Kafka
Streams as a solution if messages are consistently lost between streams on
the same machine.

Thanks
Ryan

Re: Kafka Streams message loss

Posted by Michael Noll <mi...@confluent.io>.
Also:  Since your testing is purely local, feel free to share the code you
have been using so that we can try to reproduce what you're observing.

-Michael



On Mon, Nov 21, 2016 at 4:04 PM, Michael Noll <mi...@confluent.io> wrote:

> [...]

Re: Kafka Streams message loss

Posted by Michael Noll <mi...@confluent.io>.
Please don't take this comment the wrong way, but have you double-checked
whether your counting code is working correctly?  (I'm not implying this
could be the only reason for what you're observing.)
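
For example, if the application runs with more than one stream thread
(num.stream.threads > 1) and a counter is a plain long rather than an
AtomicLong, increments are silently lost; and the counters should only be
read after KafkaStreams#close() has returned, i.e. after all stream threads
have finished. Here is a standalone demo of the first pitfall, entirely
unrelated to your actual code:

import java.util.concurrent.atomic.AtomicLong;

public class CounterRace {
    static long plain = 0;                        // not thread-safe
    static final AtomicLong atomic = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                          // lost-update race
                atomic.incrementAndGet();         // safe across threads
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // plain typically prints less than 200000; atomic always prints 200000
        System.out.println("plain=" + plain + " atomic=" + atomic.get());
    }
}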

-Michael


On Fri, Nov 18, 2016 at 4:52 PM, Eno Thereska <en...@gmail.com>
wrote:

> [...]

Re: Kafka Streams message loss

Posted by Eno Thereska <en...@gmail.com>.
Hi Ryan,

Perhaps you could share some of your code so we can have a look? One thing
I'd check is if you are using compacted Kafka topics. If so, and if you
have non-unique keys, compaction happens automatically and you might only
see the latest value for a key.
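
As a quick check, assuming your second topic is called "second-topic" (a
placeholder, of course), running

kafka-topics.sh --describe --zookeeper localhost:2181 --topic second-topic

will show cleanup.policy=compact under Configs if the topic has a
compaction override; note that a broker-wide log.cleanup.policy=compact in
server.properties would compact all topics that don't override it, without
showing up in that output.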

Thanks
Eno
> On 18 Nov 2016, at 13:49, Ryan Slade <ry...@avocet.io> wrote:
> 
> [...]