Posted to users@kafka.apache.org by Jonathan Creasy <jc...@box.com> on 2012/07/28 03:00:57 UTC

monitoring kafka

How do you guys monitor Kafka? Do any of you have Nagios checks that you
use? What metrics do you find important?

Re: monitoring kafka

Posted by Jonathan Creasy <jc...@box.com>.
nevermind.

Re: monitoring kafka

Posted by Jonathan Creasy <jc...@box.com>.
Checking out the audit code, the patch in KAFKA-260 doesn't apply for me;
there is a problem in core/src/main/scala/kafka/consumer/ConsumerIterator.scala.

I am working with the 0.7.1 branch.

The section now looks like:

    val item = localCurrent.next()
    consumedOffset = item.offset
    new MessageAndMetadata(decoder.toEvent(item.message),
      currentTopicInfo.topic)

Should I change decoder.toEvent(item.message) to
decoder.fromMessage(item.message)?


***************
*** 80,86 ****
      }
      val item = localCurrent.next()
      consumedOffset = item.offset
-     decoder.toEvent(item.message)
    }

    def clearCurrentChunk() = {
--- 80,86 ----
      }
      val item = localCurrent.next()
      consumedOffset = item.offset
+     decoder.fromMessage(item.message)
    }

    def clearCurrentChunk() = {
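
If so, applying the hunk by hand would presumably leave the section
looking like this (assuming fromMessage is a one-for-one replacement for
toEvent and the MessageAndMetadata wrapper stays):

    // KAFKA-260 hunk applied by hand to the 0.7.1 code, assuming
    // fromMessage simply replaces toEvent:
    val item = localCurrent.next()
    consumedOffset = item.offset
    new MessageAndMetadata(decoder.fromMessage(item.message),
      currentTopicInfo.topic)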

Re: monitoring kafka

Posted by Pierre-Yves Ritschard <py...@spootnik.org>.
I use standard checks that verify the process is running, plus a check
in ZooKeeper for correct partition ownership and the number of
registered brokers / consumers / producers.
Collectd runs on all my machines and pushes JMX metrics out to
Graphite. I then use check-graphite, which allows checking for consumer
lag.
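
A hand-rolled equivalent of that lag check might look roughly like this
(a sketch only: the Graphite host, metric path, and threshold are
made-up placeholders, and check-graphite itself is a separate tool):

    import scala.io.Source

    // Nagios-style consumer-lag check against Graphite's render API.
    // The host, metric path, and threshold below are hypothetical.
    object CheckConsumerLag {
      def main(args: Array[String]): Unit = {
        val url = "http://graphite.example.com/render" +
          "?target=kafka.consumer.mygroup.lag&from=-5min&format=raw"
        // Raw format looks like: "metric,start,end,step|v1,v2,...,vN"
        val body = Source.fromURL(url, "UTF-8").mkString.trim
        val values = body.split('|')(1).split(',')
          .filter(_ != "None").map(_.toDouble)
        val lag = if (values.nonEmpty) values.last else 0.0
        if (lag > 10000) {
          println("CRITICAL: consumer lag is " + lag)
          sys.exit(2)  // Nagios CRITICAL
        } else {
          println("OK: consumer lag is " + lag)
          sys.exit(0)  // Nagios OK
        }
      }
    }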

Re: monitoring kafka

Posted by Jay Kreps <ja...@gmail.com>.
LinkedIn has a custom monitoring system partially described here:
http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection

The integration from the Kafka side is basically just JMX, though we have a
few wrappers that expose additional things. We measure basic stuff like
disk stats, messages/sec, latency, etc.
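
Pulling one of those broker metrics over JMX is straightforward; a
minimal sketch follows (the JMX port and the MBean/attribute names are
assumptions and vary by Kafka version, so substitute whatever your
broker actually registers):

    import javax.management.ObjectName
    import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

    // Read a single broker metric over JMX. The port (9999) and the
    // MBean/attribute names are hypothetical; check what your broker exposes.
    object JmxProbe {
      def main(args: Array[String]): Unit = {
        val url = new JMXServiceURL(
          "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi")
        val connector = JMXConnectorFactory.connect(url)
        try {
          val mbsc = connector.getMBeanServerConnection
          val name = new ObjectName("kafka:type=kafka.SocketServerStats")
          println("NumFetchRequests = " +
            mbsc.getAttribute(name, "NumFetchRequests"))
        } finally {
          connector.close()
        }
      }
    }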

In addition, we do a very Kafka-specific kind of monitoring we call
"audit". This counts the number of messages sent by every producer,
received by every broker, and received by every consumer, then
reconciles, graphs, and alerts on these counts. This is very helpful in
determining that all the sent data arrived at its destination. There is
a bug open to open-source this piece, though it has a few dependencies.

https://issues.apache.org/jira/browse/KAFKA-260
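
The reconciliation step could be sketched like this (illustrative only,
not the KAFKA-260 code; the tier names and counts are invented):

    // Sketch of the "audit" idea: compare per-tier message counts for the
    // same topic/time window and alert on any tier that saw fewer messages
    // than the producers sent. Tier names and numbers are hypothetical.
    object AuditSketch {
      def reconcile(countsByTier: Map[String, Long]): Unit = {
        val produced = countsByTier.getOrElse("producer", 0L)
        for ((tier, count) <- countsByTier if tier != "producer") {
          if (count < produced)
            println("ALERT: " + tier + " saw " + count +
              " of " + produced + " messages")
          else
            println("OK: " + tier + " saw all " + produced + " messages")
        }
      }

      def main(args: Array[String]): Unit = {
        reconcile(Map(
          "producer" -> 1000000L,
          "broker"   -> 1000000L,
          "consumer" ->  999950L))
      }
    }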

-Jay
