Posted to users@kafka.apache.org by Guy Doulberg <Gu...@perion.com> on 2014/08/01 10:28:04 UTC

Consume more than produce

Hey,


After a year or so of having Kafka as the streaming layer in my production, I decided it is time to audit it and test how many events I lose, if I lose events at all.


I discovered something interesting which I can't explain.


The producer produces fewer events than the consumer group consumes.


It is not much more; it is about 0.1% more events on the consumer side.


I use the high-level Consumer API (not the SimpleConsumer API).


I was thinking I might have had rebalancing going on in my system, but it doesn't look like that.


Has anyone seen such a behaviour?


In order to audit, I calculated for each event the minute it arrived and assigned this value to the event. I used statsd to count all events across my producer cluster and across the consumer group cluster.
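Concretely, the counting looks something like the sketch below (minimal and illustrative; the statsd host, metric names, and the AuditCounter class are assumptions about the scheme, not my real code):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class AuditCounter {
    private final DatagramSocket socket;
    private final InetAddress statsdHost;
    private final int statsdPort;

    public AuditCounter(String host, int port) throws Exception {
        this.socket = new DatagramSocket();
        this.statsdHost = InetAddress.getByName(host);
        this.statsdPort = port;
    }

    // Truncate an event's arrival time to the minute, so producer and
    // consumer count against the same bucket.
    public static long minuteBucket(long timestampMillis) {
        return timestampMillis / 60000L;
    }

    // Increment a statsd counter: a single fire-and-forget UDP datagram,
    // with no acknowledgement from the statsd server.
    public void increment(String side, long minute) throws Exception {
        byte[] payload = (side + ".events." + minute + ":1|c")
                .getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(payload, payload.length,
                statsdHost, statsdPort));
    }
}

The producer computes the bucket once, embeds it in the message, and calls increment("producer", minute); the consumer reads the same bucket out of the message and calls increment("consumer", minute), so both sides increment the same per-minute key.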


I must say that it does not happen for every minute.


Thanks, Guy



Re: Consume more than produce

Posted by Steve Morin <st...@stevemorin.com>.
You have to remember that statsd uses UDP, which is potentially lossy; if increments from the producer side get dropped, the consumer side will look like it counted more. That might account for the discrepancy.
-Steve



Re: Consume more than produce

Posted by Jun Rao <ju...@gmail.com>.
Do you have producer retries (due to broker failure) in those minutes when
you see a diff?

Thanks,

Jun
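If a retry happens after the first attempt actually reached the log but its acknowledgement was lost, the message is appended a second time: the producer-side audit counted one send, but the consumer sees two copies. In the 0.8 producer the retry is built in; a sketch of the relevant settings (broker list and topic are illustrative):

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class RetryingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        // Built-in retries: if a send fails or times out, the producer
        // resends. A retry after a lost ack duplicates the message.
        props.put("message.send.max.retries", "3");
        props.put("retry.backoff.ms", "100");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("events", "key", "payload"));
        producer.close();
    }
}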



RE: Consume more than produce

Posted by Guy Doulberg <Gu...@perion.com>.
If you can't see the image, I uploaded it to Dropbox:
https://www.dropbox.com/s/gckn4gt7gv26l9w/graph.png




RE: Consume more than produce

Posted by Guy Doulberg <Gu...@perion.com>.
Hey



I had an issue in production two days ago.



For some reason, 2 of the 5 brokers in my cluster were stuck: their processes were up, but they didn't answer on port 9092. ZooKeeper still saw them as live brokers.



Producers couldn't produce events to them and consumers couldn't consume.



To resolve the issue, I restarted the brokers, and I also stopped all the consumers I had and started them again.



Before I try to understand what happened to the brokers, I want to show you what happened to the data in my system.



[graph.png: events per minute, produced vs. consumed; see the Dropbox link above]



The orange line is the number of events the consumers consumed.

The blue line is the number of events the producer produced.



The gaps between the lines correspond to the period when the brokers were misbehaving.



I guess the problem was that the consumers didn't commit the messages, or something like that. But if that is so, why did the consumers continue consuming? And might that help me understand the problem I have in general, that even when times are regular the consumers consume more than the producers produce?
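If that guess is right, it would be the high-level consumer's at-least-once behaviour: offsets are committed periodically, not per message, so anything consumed after the last successful commit is re-delivered, and re-counted, when the group restarts. A sketch of the 0.8 settings involved (hosts and group name are illustrative):

import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class AuditConsumerSketch {
    public static ConsumerConnector connect() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");
        props.put("group.id", "audit-consumers");
        // Offsets are committed on a timer, not per message. A restart
        // between commits replays everything since the last commit.
        props.put("auto.commit.enable", "true");       // default: true
        props.put("auto.commit.interval.ms", "60000"); // default: 60000 ms
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }
}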



Thanks.




RE: Consume more than produce

Posted by Guy Doulberg <Gu...@perion.com>.
Hi Daniel 

I count once when producing and once when consuming. The timestamp is calculated once, before producing, and attached to the message, so the consumer uses the same timestamp to count.
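If the consumer side needs to distinguish genuinely new events from duplicate deliveries of the same event, one option is to stamp each message with a unique ID as well. A hypothetical sketch (the DeliveryAudit class and the ID field are assumptions, not something in my pipeline today):

import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class DeliveryAudit {
    // In practice this set must be bounded or expired; here it grows forever.
    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();
    private long deliveries = 0;
    private long uniqueEvents = 0;

    // Producer side: stamp each event with a unique ID before sending.
    public static String newEventId() {
        return UUID.randomUUID().toString();
    }

    // Consumer side: count every delivery, but each unique ID only once.
    public synchronized void onDelivery(String eventId) {
        deliveries++;
        if (seenIds.add(eventId)) {
            uniqueEvents++;
        }
    }

    // Duplicate deliveries observed so far.
    public synchronized long duplicates() {
        return deliveries - uniqueEvents;
    }
}

If deliveries grows faster than uniqueEvents, the surplus is re-delivery (retries or replays), not extra events.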

Thanks


Re: Consume more than produce

Posted by Daniel Compton <de...@danielcompton.net>.
Hi Guy

In your reconciliation, where was the timestamp coming from? Is it possible that messages were delivered several times but your calculations only counted each unique event?

Daniel.


RE: Consume more than produce

Posted by Guy Doulberg <Gu...@perion.com>.
Hi 

What do you mean by producer ACK value?

In my code I don't have a retry mechanism; does the Kafka producer API have a retry mechanism?
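For context: the "ack value" is the producer config request.required.acks, and the 0.8 producer does have a built-in retry mechanism even when the application code has none. A sketch of both knobs (the values shown are common choices, not a recommendation):

import java.util.Properties;

public class ProducerAckSketch {
    public static Properties build() {
        Properties props = new Properties();
        // request.required.acks controls when the broker acknowledges a send:
        //    0 = do not wait for any acknowledgement (fastest, can silently drop)
        //    1 = wait until the partition leader has written the message
        //   -1 = wait until all in-sync replicas have the message
        props.put("request.required.acks", "1");
        // Retries are built into the producer itself:
        props.put("message.send.max.retries", "3"); // default: 3
        props.put("retry.backoff.ms", "100");       // pause between retries, ms
        return props;
    }
}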



Re: Consume more than produce

Posted by Guozhang Wang <wa...@gmail.com>.
What is the ack value used in the producer?




-- 
-- Guozhang