Posted to users@kafka.apache.org by Sachin Mittal <sj...@gmail.com> on 2017/01/25 06:38:01 UTC

How to log/analyze the consumer lag in kafka streaming application

Hi All,
I am running a Kafka Streams application with a simple pipeline of:
source topic -> group -> aggregate by key -> for each -> save to a sink.

My source topic gets messages at a rate of 5000 - 10000 messages per second.
During peak load we see the lag reaching 3 million messages.

So I need to figure out where the delay might be happening.

1.  Is there any mechanism in Kafka Streams to log the time spent within each
pipeline stage?

2.  Also, if I want to turn on custom logging to log some timings, how can I
do that?

I have a log4j.properties and I am packaging it inside the jar which has the
main class.
I place that jar in the libs folder of the Kafka installation.

However, I see no logs generated under the logs folder.

So where are we supposed to add the log4j.properties?

Thanks
Sachin
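[Editor's note] For question 1, one low-tech option while proper metrics are being set up is to wrap each stage's per-record function with a timer and log the accumulated time. A minimal stdlib-only sketch; the stage name and the doubling function are placeholders for illustration, not Kafka APIs:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

public class StageTimer {
    // Wraps a per-record function so the total time spent in a stage
    // accumulates into the supplied counter for later logging.
    public static <A, B> Function<A, B> timed(String stage, AtomicLong totalNanos,
                                              Function<A, B> f) {
        return a -> {
            long start = System.nanoTime();
            try {
                return f.apply(a);
            } finally {
                totalNanos.addAndGet(System.nanoTime() - start);
            }
        };
    }

    public static void main(String[] args) {
        AtomicLong aggNanos = new AtomicLong();
        // "aggregate-by-key" and the x * 2 body are stand-ins for a real stage.
        Function<Integer, Integer> aggregate =
                timed("aggregate-by-key", aggNanos, x -> x * 2);
        int out = aggregate.apply(21);
        System.out.println(out + " (stage total: " + aggNanos.get() + " ns)");
    }
}
```

In a real topology the wrapped function would sit inside a `mapValues`/`aggregate` callback, and the counter would be flushed to a log on a schedule.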

Re: How to log/analyze the consumer lag in kafka streaming application

Posted by Joris Meijer <jo...@axual.io>.
Hi Sachin,

If you check kafka-run-class.bat you can see that when the environment
variable KAFKA_LOG4J_OPTS is not provided, the default tools-log4j.properties
configuration will be loaded. So setting the environment variable to something
like "-Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4js.properties"
should do the trick for you.

Joris

On Sat, Feb 4, 2017 at 11:38 AM Sachin Mittal <sj...@gmail.com> wrote:

Re: How to log/analyze the consumer lag in kafka streaming application

Posted by Sachin Mittal <sj...@gmail.com>.
Hi,
As suggested, this is how I am starting my stream:

D:\kafka_2.10-0.10.1.1>bin\windows\kafka-run-class.bat -Dlog4j.debug
-Dlog4j.configurationFile=file:///D:/kafka_2.10-0.10.1.1/log4js.properties
TestKafkaWindowStream
log4j: Using URL
[file:D:/kafka_2.10-0.10.1.1/config/tools-log4j.properties] for automatic
log4j configuration.

You can see that it somehow does not pick up my log4j properties file but
always picks up tools-log4j.properties.

I have tried many different ways to specify log4j.configurationFile, but it
always seems to pick up a different log4j properties file.

So are we supposed to always configure our log4j under
tools-log4j.properties?

Thanks
Sachin



On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy <da...@gmail.com> wrote:

> -Dlog4j.configuration=your-log4j.properties
>
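[Editor's note] A likely cause of the fallback shown above: Kafka 0.10.x bundles log4j 1.2, which reads the `log4j.configuration` system property; `log4j.configurationFile` is only understood by log4j 2, so log4j 1.2 ignores it and the script's default tools-log4j.properties wins. A sketch of the corrected invocation (Windows cmd, paths taken from the command above):

```shell
REM Set the option kafka-run-class.bat checks, using the log4j 1.x property name
set KAFKA_LOG4J_OPTS=-Dlog4j.debug -Dlog4j.configuration=file:///D:/kafka_2.10-0.10.1.1/log4js.properties
bin\windows\kafka-run-class.bat TestKafkaWindowStream
```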

Re: How to log/analyze the consumer lag in kafka streaming application

Posted by Damian Guy <da...@gmail.com>.
If you are using jmxterm then you are going to connect to a running JVM, and
you don't need to set StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG. You
need to connect jmxterm to the MBean server that will be running in the JVM
of your streams app, and you'll need to provide an appropriate JMX port for it
to connect to. So, when running your streams application you should start
it with: -Dcom.sun.management.jmxremote.port=your-port

Thanks
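[Editor's note] What jmxterm does over the wire can be mimicked in-process with the JDK's own javax.management API. The sketch below reads a standard JVM attribute from the platform MBean server, which is the same server a remote jmxterm session attaches to once the jmxremote port is set; the Memory bean is a stand-in, since actual Streams metric bean names vary by version:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxPeek {
    // Reads one attribute from the platform MBean server -- the server that
    // -Dcom.sun.management.jmxremote.port=<port> exposes to remote clients.
    public static Object readHeapUsage() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName("java.lang:type=Memory"),
                                       "HeapMemoryUsage");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("HeapMemoryUsage = " + readHeapUsage());
    }
}
```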

On Fri, 27 Jan 2017 at 10:24 Sachin Mittal <sj...@gmail.com> wrote:


Re: How to log/analyze the consumer lag in kafka streaming application

Posted by Sachin Mittal <sj...@gmail.com>.
Hi,
I understood what I need to do.
I think it is still not clear, though, regarding
StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG.

Say I decide to use jmxterm, which is a CLI-based client that I can easily
run where my streams app is running.
With respect to that, what value should I assign to the
METRICS_REPORTER_CLASSES_CONFIG property?

I am following this guide
https://cwiki.apache.org/confluence/display/KAFKA/jmxterm+quickstart

However it is not clear what config I need to provide.

Thanks
Sachin



On Fri, Jan 27, 2017 at 2:31 PM, Damian Guy <da...@gmail.com> wrote:


Re: How to log/analyze the consumer lag in kafka streaming application

Posted by Damian Guy <da...@gmail.com>.
Hi Sachin,

You can configure an implementation of
org.apache.kafka.common.metrics.MetricsReporter.
This is done via StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG.

There is a list of JMX reporters here:
https://cwiki.apache.org/confluence/display/KAFKA/JMX+Reporters
I'm sure there are plenty more available on GitHub. It is also fairly
simple to write your own.

As for your log4j.properties, you should be able to run with:
-Dlog4j.configuration=your-log4j.properties

Thanks,
Damian
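[Editor's note] If a custom reporter is wanted, the config key behind StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG is "metric.reporters", a comma-separated list of MetricsReporter class names. A sketch of wiring it up with plain java.util.Properties; the reporter class name here is hypothetical, and for jmxterm no reporter needs to be configured at all, since the built-in JmxReporter registers the metrics as MBeans regardless:

```java
import java.util.Properties;

public class MetricsConfigSketch {
    // Builds Streams properties with a (hypothetical) custom metrics reporter.
    public static Properties build() {
        Properties props = new Properties();
        // "metric.reporters" is the key that
        // StreamsConfig.METRICS_REPORTER_CLASSES_CONFIG resolves to; the value
        // is a comma-separated list of MetricsReporter implementation classes.
        props.put("metric.reporters", "com.example.LoggingMetricsReporter");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("metric.reporters"));
    }
}
```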

On Fri, 27 Jan 2017 at 07:59 Sachin Mittal <sj...@gmail.com> wrote:


Re: How to log/analyze the consumer lag in kafka streaming application

Posted by Sachin Mittal <sj...@gmail.com>.
Hi,
Thanks for sharing the info.

I am reading this document for more understanding:
http://kafka.apache.org/documentation.html#monitoring

Is there any special way I need to start my Kafka cluster or streams
application (or configure them) to report these metrics?

I suppose the cluster and the streams application report separate metrics. I
mean that to collect streams metrics I need to connect to the JMX port on the
machine where my streams app is running, right?

One issue I see is that the machines where both the cluster and the streams
application are running are not accessible from outside, where I could run a
UI-based application like jconsole to report on these metrics.

So what are the other possible options? Can I log the metric values to a log
file, or can I enable logging in general? If yes, where do I place my
log4j.properties? I tried making it part of the jar which has my main class,
but I don't see any logs getting generated.

Thanks
Sachin



On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax <ma...@confluent.io>
wrote:


Re: How to log/analyze the consumer lag in kafka streaming application

Posted by "Matthias J. Sax" <ma...@confluent.io>.
You should check out Kafka Streams Metrics (for upcoming 0.10.2 they are
even more detailed).

There is not a lot of documentation for 0.10.0 or 0.10.1, but it works
the same way as for the consumer/producer metrics, which are documented.


-Matthias
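[Editor's note] Since the lag in question is consumer lag, one place to look over JMX is the consumer's fetch-manager metrics, which include records-lag-max. A stdlib-only sketch that queries the local MBean server for such beans; the object-name pattern is an assumption based on the 0.10.x consumer metric naming, and in a JVM with no running consumer the result is simply empty:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class LagLookup {
    // Queries the local MBean server for consumer fetch-manager metric beans,
    // where per-consumer lag metrics such as records-lag-max are exposed.
    public static Set<ObjectName> findFetchMetrics() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName pattern =
                new ObjectName("kafka.consumer:type=consumer-fetch-manager-metrics,*");
            return server.queryNames(pattern, null);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Inside the streams app's JVM this would typically list one bean per
        // embedded consumer client; here it just reports how many matched.
        System.out.println("matching beans: " + findFetchMetrics().size());
    }
}
```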
