Posted to users@kafka.apache.org by Jorge Rodriguez <jo...@bloomreach.com> on 2016/04/11 22:54:03 UTC

Spikes in kafka bytes out (while bytes in remain the same)

We are running a Kafka cluster for our real-time pixel processing
pipeline.  The data is produced from our pixel servers into Kafka, and then
consumed by a Spark Streaming application.  Based on this, I would expect
the bytes in and bytes out to be roughly equal, as each message
should be consumed once.

Under normal operations, the bytes out are a little less than 2x the bytes
in.  Does anyone know why this is?  We do use a replication factor of 2.

Occasionally, we get a spike in bytes out while bytes in remain the same
(see image below).  This correlates with a significant delay in processing
time on the Spark Streaming side.

Below is a chart of Kafka-reported bytes out vs. in.  The system-level
network metrics show the same pattern (transferred bytes spike).

Could anyone provide some tips for debugging/getting to the bottom of this
issue?

Thanks,
Jorge

*Kafka-reported bytes in (per topic and for all topics) vs. Kafka bytes out:*

[image: Inline image 1]

Re: Spikes in kafka bytes out (while bytes in remain the same)

Posted by Tom Crayford <tc...@heroku.com>.
Is there any interest in changing this, or exposing non-replicated bytes out
somewhere via JMX? It'd be nice to expose a real "what the consumers are
doing from the broker's perspective" metric as well as the current one,
which munges together replication and ordinary consumers.
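
In the meantime it can be approximated from what the broker already
exposes, by subtracting the estimated replication traffic (bytes in times
RF - 1) from the reported bytes out. A minimal Java sketch using the
standard javax.management client and the broker's BrokerTopicMetrics
beans; the broker hostnames, the JMX port (9999), and the uniform
replication factor of 2 are assumptions you'd substitute for your cluster:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ConsumerBytesOutEstimate {

    // Fetch the one-minute rate of a kafka.server BrokerTopicMetrics meter
    // from a single broker's JMX endpoint.
    static double oneMinuteRate(String hostPort, String metric) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi");
        try (JMXConnector jmx = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = jmx.getMBeanServerConnection();
            ObjectName bean = new ObjectName(
                    "kafka.server:type=BrokerTopicMetrics,name=" + metric);
            return (Double) conn.getAttribute(bean, "OneMinuteRate");
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical hosts/port; substitute your brokers' JMX endpoints.
        String[] brokers = {"broker1:9999", "broker2:9999",
                            "broker3:9999", "broker4:9999"};
        int replicationFactor = 2;  // assumed uniform across topics

        double bytesIn = 0, bytesOut = 0;
        for (String broker : brokers) {
            bytesIn  += oneMinuteRate(broker, "BytesInPerSec");
            bytesOut += oneMinuteRate(broker, "BytesOutPerSec");
        }

        // BytesOutPerSec includes replica fetcher reads: followers re-read
        // (RF - 1) copies of everything produced, so subtract that estimate.
        double replicationOut = bytesIn * (replicationFactor - 1);
        System.out.printf("bytes out: %.0f B/s, est. consumer-only out: %.0f B/s%n",
                          bytesOut, bytesOut - replicationOut);
    }
}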

Re: Spikes in kafka bytes out (while bytes in remain the same)

Posted by Jorge Rodriguez <jo...@bloomreach.com>.
Asaf, thanks for your explanation.  This actually makes complete sense, as
we have 2 replicas.  So the math works out when taking this into
consideration.

Thanks!
Jorge

Re: Spikes in kafka bytes out (while bytes in remain the same)

Posted by Asaf Mesika <as...@gmail.com>.
Another thought: brokers replicate the data that comes in. So a record
weighing 10 bytes will be written out once for replication and one more
time to a consumer, so it will be 20 bytes out. Makes sense?
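
In other words, assuming a steady state where every message is replicated
and consumed exactly once, the expected ratio is:

  bytes_out ~= bytes_in * ((replication_factor - 1) + consumer_groups)

With a replication factor of 2 and a single consuming application, that is
(2 - 1) + 1 = 2, i.e. bytes out roughly double bytes in, which matches the
"a little less than 2x" seen under normal operations.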

Re: Spikes in kafka bytes out (while bytes in remain the same)

Posted by Jorge Rodriguez <jo...@bloomreach.com>.
Thanks for your response, Asaf.  I have 4 brokers.  These measurements are
from the Kafka brokers.

The measurement on this graph comes from Kafka itself.  It is a sum across
all 4 brokers of the
metric kafka.server.BrokerTopicMetrics.BytesInPerSec.1MinuteRate.

But I also have a system metric which I feed independently using the
collectd "interface" plugin.  The bytes out and in match the ones reported
by Kafka fairly well, and there is a corresponding increase in network
packets sent.
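
For anyone reproducing this setup: the interface plugin needs almost no
configuration; a minimal collectd.conf fragment (the interface name here
is an assumption) looks like:

  LoadPlugin interface
  <Plugin interface>
    Interface "eth0"
  </Plugin>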

Also, on the Spark Streaming side, I can see that during these spikes the
number of received packets and bytes also spikes.

So during the spikes, I believe some of the fetch requests are failing and
we hit a retry.  I am debugging that currently, and I think it's related to
the stop-the-world (STW) GC pauses that happen occasionally on the Spark
Streaming side.  Some GC tuning should alleviate this.
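
To confirm the correlation, a first step is enabling GC logging on the
Spark executors so the pauses can be lined up with the spike timestamps,
for example (flags for a Java 7/8 JVM):

  spark-submit \
    --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
    ...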

However, even if this is the case, it would not explain why, under normal
operations, the number of bytes out is 2x the number of bytes in.
Since I only have 1 consumer for each topic, I would expect the numbers to
be fairly close.  Do you have any ideas?

Re: Spikes in kafka bytes out (while bytes in remain the same)

Posted by Asaf Mesika <as...@gmail.com>.
Where exactly do you get the measurement from? Your broker? Do you have
only one? Your producer? Your Spark job?