Posted to users@kafka.apache.org by Bhavesh Mistry <mi...@gmail.com> on 2014/06/25 02:20:14 UTC

Monitoring Producers at Large Scale

We use Kafka as a transport layer for application logs.  How do we
monitor producers at large scale?  We have about 6000 boxes x 4 topics per
box, so roughly 24000 producers, spread across multiple data centers (we
have brokers in each DC).  We do the monitoring based on logs.  I have tried
intercepting logs with a custom Log4j appender (overriding the append method
of org.apache.log4j.AppenderSkeleton) that picks up only WARN, ERROR and
FATAL events and sends them to the brokers.  This works, but under load
testing it sometimes causes a deadlock between ProducerSendThread and the
Producer.

I know there are JMX MBeans from which we can pull data, but I would also
like to monitor exceptions raised inside the Kafka library, e.g. Leader Not
Found, Queue is full, resend failure, etc.

How does LinkedIn monitor its producers?

Thanks,

Bhavesh

Re: Monitoring Producers at Large Scale

Posted by Bhavesh Mistry <mi...@gmail.com>.
Hi Otis,

You are right.  If Kafka itself has a problem (queue is full, auto
rebalance, dropped events, etc.), how can it transmit the logs?  So we have
tried to avoid an "agent based" solution such as an Apache Flume agent or a
syslog configuration.

You are right that we have to build a redundant transport for monitoring
the transport layer.

Thank you very much for the suggestion.  I will look into Logsene
<https://sematext.atlassian.net/wiki/display/PUBLOGSENE/Sending+Events+to+Logsene>.

The problem is that we have 4 data centers and 24000 or more producers, so
when an application team comes to us saying "our data is lost" or "we do
not see our log lines", we have to pinpoint exactly what happened.  So it
would be ideal to be able to monitor the Kafka producers, transmit their
logs, and set alarms on them.

We replaced Apache Flume with Apache Kafka as the log transport layer, so
an agent is not required.


Thanks,
Bhavesh



On Mon, Jul 7, 2014 at 1:41 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> I'm late to the thread... but that "...we intercept log4j..." caught my
> attention.  Why intercept, especially if it's causing trouble?
>
> Could you use log4j syslog appender and get logs routed to wherever you
> want them via syslog, for example?
> Or you can have syslog tail log4j log files (e.g. rsyslog has "imfile" you
> can use for tailing).
>
> We use our own Logsene <http://sematext.com/logsene/> for Kafka and all
> other logs and SPM <http://sematext.com/spm/> for Kafka and all other
> metrics we monitor.
>
> Oh, actually, this may help you:
>
> https://sematext.atlassian.net/wiki/display/PUBLOGSENE/Sending+Events+to+Logsene
> (ignore the Logsene-specific parts --- there is plenty of general info,
> configs, etc. for log handling)
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>

Re: Monitoring Producers at Large Scale

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

I'm late to the thread... but that "...we intercept log4j..." caught my
attention.  Why intercept, especially if it's causing trouble?

Could you use the log4j syslog appender and get the logs routed to wherever
you want them via syslog, for example?
Or you can have syslog tail the log4j log files (e.g. rsyslog has an
"imfile" module you can use for tailing).

We use our own Logsene <http://sematext.com/logsene/> for Kafka and all
other logs and SPM <http://sematext.com/spm/> for Kafka and all other
metrics we monitor.

Oh, actually, this may help you:
https://sematext.atlassian.net/wiki/display/PUBLOGSENE/Sending+Events+to+Logsene
(ignore the Logsene-specific parts --- there is plenty of general info,
configs, etc. for log handling)

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Jun 26, 2014 at 3:09 PM, Bhavesh Mistry <mi...@gmail.com>
wrote:

> Hi All,
>
> Thanks for all your responses.
>
>
>
> JMX metrics are there and we do pull the metrics, but I would like to
> capture the logs from Kafka lib as well especially WARN, FATAL and ERROR
> etc to debug the issue.
>
>
>
> To do this, we intercept Log4j logging and send it to Kafka Log Topics, but
> I realize that under heavy Kafka Lib error/warn/  it will create a deadlock
> between Producer Send thread  (Logging Kafka log topic queue...)
>
>
>
> public class KafkaLog4jAppender extends AppenderSkeleton {
>
>     Producer  producer......
>
>     protected void append(LoggingEvent event) {
>         if(event.getLoggerName().startsWith("kafka")){
>             if(event is WARN, FATAL and ERROR){
>                 producer.send(event.getRenderedMessage())
>             }
>         }
>     }
> }
>
>
> Other option is to log Kafka Logs into disk and transport logs via
> separate process
> to Kafka Topic and transport via https://github.com/harelba/tail2kafka to
> topic...
>
>
> We use Kafka for Log transportation and we want to debug/trouble shoot
> issue via logs or create alerts/etc
>
>
> Thanks,
>
>
> Bhavesh

Re: Monitoring Producers at Large Scale

Posted by Bhavesh Mistry <mi...@gmail.com>.
Hi All,

Thanks for all your responses.



JMX metrics are there and we do pull them, but I would also like to
capture the logs from the Kafka library, especially WARN, ERROR and FATAL
events, to debug issues.

To do this, we intercept Log4j logging and send it to Kafka log topics, but
I realize that under a heavy volume of Kafka library errors/warnings it
will create a deadlock between the producer send thread and the producer
(which is logging to the Kafka log topic queue...)



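import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

public class KafkaLog4jAppender extends AppenderSkeleton {

    // Producer setup is elided here, as in the original post.
    private Producer<String, String> producer;
    // Assumed name of the Kafka log topic; not given in the original post.
    private String topic = "kafka-client-logs";

    @Override
    protected void append(LoggingEvent event) {
        // Only forward events logged by the Kafka library itself...
        if (event.getLoggerName().startsWith("kafka")) {
            // ...and only at WARN, ERROR or FATAL.
            if (event.getLevel().isGreaterOrEqual(Level.WARN)) {
                producer.send(new KeyedMessage<String, String>(
                        topic, event.getRenderedMessage()));
            }
        }
    }

    @Override
    public void close() {
        if (producer != null) producer.close();
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }
}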
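
One plausible reading of the deadlock is that the appender calls
producer.send() on the logging thread, so a full producer queue (or a log
line emitted by the producer itself) can stall the very thread that is
supposed to drain it.  A minimal sketch of one way around this, assuming it
is acceptable to drop log lines under pressure, is to hand lines to a bounded
buffer that never blocks and let a separate daemon thread do the sending.
The class NonBlockingForwarder and its methods are hypothetical, and the
sink stands in for whatever actually ships the line (for example a producer
send); nothing below comes from the Kafka or Log4j APIs.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical helper: hands log lines to a sink without blocking the caller. */
public class NonBlockingForwarder {
    private final BlockingQueue<String> buffer;
    private final AtomicLong dropped = new AtomicLong();
    // True only on the drain thread while it is inside the sink, so that
    // anything the sink logs on that thread is not fed back into the buffer.
    private final ThreadLocal<Boolean> inSink =
            ThreadLocal.withInitial(() -> Boolean.FALSE);
    private final Thread drainer;

    public NonBlockingForwarder(int capacity, Consumer<String> sink) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.drainer = new Thread(() -> {
            try {
                while (true) {
                    String line = buffer.take();   // blocks only this thread
                    inSink.set(Boolean.TRUE);
                    try {
                        sink.accept(line);         // e.g. a producer send
                    } catch (RuntimeException e) {
                        dropped.incrementAndGet(); // count failed sends as drops
                    } finally {
                        inSink.set(Boolean.FALSE);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop() was called
            }
        }, "log-forwarder");
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Never blocks: returns false and counts a drop if re-entered or full. */
    public boolean offer(String line) {
        if (inSink.get() || !buffer.offer(line)) {
            dropped.incrementAndGet();
            return false;
        }
        return true;
    }

    public long droppedCount() {
        return dropped.get();
    }

    public void stop() {
        drainer.interrupt();
    }
}
```

With something like this, append() would call
forwarder.offer(event.getRenderedMessage()) instead of producer.send(...),
and droppedCount() could be exposed as a metric so that lost log lines are
at least visible.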


The other option is to write the Kafka logs to disk and have a separate
process ship them to a Kafka topic, e.g. via
https://github.com/harelba/tail2kafka.


We use Kafka for log transport, and we want to debug/troubleshoot issues
via logs, create alerts, etc.
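
As a rough illustration of the "separate process" option, the sketch below
polls a log file for newly appended lines and keeps only WARN/ERROR/FATAL
lines from Kafka loggers.  It is not how tail2kafka works; the class name
KafkaLogTailer, the poll() method and the assumed line shape
("<timestamp> <LEVEL> <logger> - <message>") are all assumptions made for
the example, and shipping the selected lines onward is left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical tailer: returns new WARN/ERROR/FATAL lines from Kafka loggers. */
public class KafkaLogTailer {
    // Assumed layout: the level appears before a logger name starting with "kafka.".
    private static final Pattern ALERT =
            Pattern.compile("\\b(WARN|ERROR|FATAL)\\b.*\\bkafka\\.");

    private long offset = 0; // position just after the last complete line read

    public List<String> poll(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < offset) {
                offset = 0; // file was truncated or rotated; start over
            }
            // Sketch only: assumes the unread tail fits in memory.
            byte[] buf = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(buf);

            // Stop at the last newline so a half-written line is read next time.
            int end = buf.length;
            while (end > 0 && buf[end - 1] != '\n') {
                end--;
            }
            offset += end;

            List<String> hits = new ArrayList<>();
            String text = new String(buf, 0, end, StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                if (ALERT.matcher(line).find()) {
                    hits.add(line);
                }
            }
            return hits;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because this runs outside the application, a stalled producer in the
application cannot block it, which is the property the in-process appender
lacks.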


Thanks,


Bhavesh




On Wed, Jun 25, 2014 at 10:49 AM, Neha Narkhede <ne...@gmail.com>
wrote:

> We monitor producers or for that matter any process/service using JMX
> metrics. Every server and service in LinkedIn sends metrics in a Kafka
> message to a metrics Kafka cluster. We have subscribers that connect to the
> metrics cluster to index that data in RRDs.
>
> Our aim is to expose all important metrics through JMX. We are doing that
> for the new producer under org.apache.kafka.clients.producer. Feel free to
> take a look at that and give feedback.
>
> Thanks,
> Neha

Re: Monitoring Producers at Large Scale

Posted by Neha Narkhede <ne...@gmail.com>.
We monitor producers, or for that matter any process/service, using JMX
metrics. Every server and service at LinkedIn sends its metrics in a Kafka
message to a metrics Kafka cluster. We have subscribers that connect to the
metrics cluster to index that data in RRDs.

Our aim is to expose all important metrics through JMX. We are doing that
for the new producer under org.apache.kafka.clients.producer. Feel free to
take a look at that and give feedback.
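
For readers who want to try the JMX route, here is a minimal sketch of the
reading side only: it collects every numeric attribute of the MBeans that
match an ObjectName pattern in the local JVM.  This is not LinkedIn's
implementation; the class name JmxSnapshot is hypothetical, and the pattern
to pass for producer metrics (for example "kafka.producer:*") is an
assumption that should be checked against what jconsole shows for your
client version.  Packaging the map into a Kafka message is left out.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper: snapshot of numeric JMX attributes in this JVM. */
public class JmxSnapshot {

    /** Keys look like "<object name>/<attribute name>". */
    public static Map<String, Number> read(String pattern) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Number> out = new TreeMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                    if (!attr.isReadable()) {
                        continue;
                    }
                    try {
                        Object value = server.getAttribute(name, attr.getName());
                        if (value instanceof Number) {
                            out.put(name + "/" + attr.getName(), (Number) value);
                        }
                    } catch (Exception e) {
                        // Some attributes are unsupported on a given JVM; skip them.
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX query failed for " + pattern, e);
        }
        return out;
    }
}
```

A scheduled task could call JmxSnapshot.read(...) every minute or so and
publish the resulting map, which keeps monitoring on a pull-from-JMX path
that does not depend on the application's logging.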

Thanks,
Neha


On Tue, Jun 24, 2014 at 7:59 PM, Darion Yaphet <da...@gmail.com>
wrote:

> Sorry I want to  know  you want to monitor kafka producers or kafka brokers
> and zookeepers ?
> It's seems you will want to monitor monitor Exceptions eg Leader Not Found,
> Queue is full, resend fail  etc  are kafka cluster

Re: Monitoring Producers at Large Scale

Posted by Darion Yaphet <da...@gmail.com>.
Sorry, I want to know whether you want to monitor Kafka producers, or Kafka
brokers and ZooKeeper?
It seems that the exceptions you want to monitor (e.g. Leader Not Found,
Queue is full, resend failure, etc.) are Kafka cluster issues.


2014-06-25 8:20 GMT+08:00 Bhavesh Mistry <mi...@gmail.com>:

> We use Kafka as Transport Layer to transport application logs.  How do we
> monitor Producers at large scales about 6000 boxes x 4 topic per box so
> roughly 24000 producers (spread across multiple data center.. we have
> brokers per DC).  We do the monitoring based on logs.  I have tried
> intercepting logs via Log4J custom implementation which only intercept WARN
> and ERROR and FATAL events  org.apache.log4j.AppenderSkeleton append method
> which send its logs to brokers (This is working but after load testing it
> is causing deadlock some times between ProducerSendThread and Producer).
>
> I know there are JMX monitoring MBeans available which we can pull the
> data, but I would like to monitor Exceptions eg Leader Not Found, Queue is
> full, resend fail etc in Kafka Library.
>
> How does LinkedIn monitor the Producers ?
>
> Thanks,
>
> Bhavesh
>



-- 


long is the way and hard  that out of Hell leads up to light