Posted to users@kafka.apache.org by Marina <pp...@yahoo.com.INVALID> on 2015/06/02 18:50:11 UTC

Kafka JMS metrics meaning

Hi,
I have enabled JMX_PORT for the Kafka server and am trying to understand some of the metrics that are being exposed. I have two questions:
1. What are the best metrics to monitor to quickly spot an unhealthy Kafka cluster?
2. What do these metrics mean: ReplicaManager -> LeaderCount and ReplicaManager -> PartitionCount? I have three topics created, with one partition each and replication = 1, yet the value of both attributes is "53". I am not sure what the count of 53 means here.
Thanks,
Marina

Re: Kafka JMS metrics meaning

Posted by Marina <pp...@yahoo.com.INVALID>.
Thanks a lot to everybody for your suggestions!
In addition to consumer lag (on the consumer side, though), under-replicated partitions, offline partitions, and active controller count, I am also thinking of monitoring the total size of partitions so that it does not exceed some maximum (10G, for example), to prevent out-of-disk-space issues.

Now, can somebody shed light on my second question? :)
"2. What do these metrics mean: ReplicaManager -> LeaderCount and ReplicaManager -> PartitionCount? I have three topics created, with one partition each and replication = 1, yet the value of both attributes is "53". I am not sure what the count of 53 means here."
Thanks!
Marina
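
For reference on the second question: ReplicaManager's PartitionCount reports how many partition replicas the broker hosts, and LeaderCount how many of those it currently leads, counted across every topic on the broker, internal ones included. The sketch below only illustrates that counting; the assignment data and the helper name broker_counts are hypothetical. One possible explanation for 53 (an assumption, not something confirmed in this thread) is an internal __consumer_offsets topic with its default of 50 partitions sitting alongside the three user topics.

```python
# Hypothetical replica assignment:
# (topic, partition) -> (leader broker id, [replica broker ids])
def broker_counts(assignment, broker_id):
    """Return (partition_count, leader_count) as one broker would report them."""
    partition_count = sum(
        1 for _, replicas in assignment.values() if broker_id in replicas
    )
    leader_count = sum(
        1 for leader, _ in assignment.values() if leader == broker_id
    )
    return partition_count, leader_count


# Single broker (id 0), replication factor 1:
# three user topics with one partition each...
assignment = {(f"topic{i}", 0): (0, [0]) for i in range(3)}
# ...plus, as an ASSUMPTION, an internal offsets topic with 50 partitions.
assignment.update({("__consumer_offsets", p): (0, [0]) for p in range(50)})

partition_count, leader_count = broker_counts(assignment, 0)
print(partition_count, leader_count)
```

With a single broker and replication = 1, the broker leads every partition it hosts, so the two values are equal under these assumptions.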

Re: Kafka JMS metrics meaning

Posted by Todd Palino <tp...@gmail.com>.
Under-replicated partitions is a must. Offline partitions is also good to monitor. We also use the active controller metric (it's 1 or 0 per broker) in aggregate for a cluster to know that the controller is running somewhere.
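
A minimal sketch of the checks described above, assuming the gauge values have already been collected from each broker by some JMX poller (not shown). The function name cluster_problems and the input layout are hypothetical, and treating anything other than exactly one active controller as a problem is a choice made here, slightly stricter than "running somewhere".

```python
def cluster_problems(brokers):
    """brokers: dict of broker name -> dict of already-collected gauge values.

    Returns a list of human-readable problems; an empty list means healthy.
    """
    problems = []

    # Each broker reports 0 or 1; summed over the cluster we expect exactly 1.
    controllers = sum(m["ActiveControllerCount"] for m in brokers.values())
    if controllers != 1:
        problems.append(f"expected 1 active controller, found {controllers}")

    for name, m in sorted(brokers.items()):
        if m["UnderReplicatedPartitions"] > 0:
            problems.append(
                f"{name}: {m['UnderReplicatedPartitions']} under-replicated partitions"
            )
        if m["OfflinePartitionsCount"] > 0:
            problems.append(
                f"{name}: {m['OfflinePartitionsCount']} offline partitions"
            )
    return problems


# Made-up sample values for two brokers.
sample = {
    "broker-1": {"ActiveControllerCount": 1, "UnderReplicatedPartitions": 0,
                 "OfflinePartitionsCount": 0},
    "broker-2": {"ActiveControllerCount": 0, "UnderReplicatedPartitions": 4,
                 "OfflinePartitionsCount": 0},
}
for problem in cluster_problems(sample):
    print(problem)
```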

For more general metrics, bytes in and bytes out across all topics are good. We also watch the leader partition count to know when to do a preferred replica election. Specifically, we take the ratio of that number to the total partition count for the broker and keep it near 50%.
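
The ratio check can be sketched as follows. The 50% target comes from the message above; the tolerance value and both function names are hypothetical. As an inference rather than anything stated in the thread, a broker in a leader-balanced cluster leads about 1/RF of the replicas it hosts, so a 50% target would correspond to a replication factor of 2.

```python
def leader_ratio(leader_count, partition_count):
    """Fraction of the partitions hosted on a broker that it currently leads."""
    if partition_count == 0:
        return 0.0
    return leader_count / partition_count


def needs_preferred_election(leader_count, partition_count,
                             target=0.5, tolerance=0.1):
    """True when the ratio drifts further from the target than the tolerance.

    target is the 50% from the thread; tolerance is an illustrative value.
    """
    ratio = leader_ratio(leader_count, partition_count)
    return abs(ratio - target) > tolerance


# Made-up readings of LeaderCount and PartitionCount for one broker.
print(needs_preferred_election(leader_count=80, partition_count=100))
print(needs_preferred_election(leader_count=50, partition_count=100))
```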

Most other things, like per-request-type times and 99th-percentile metrics, we generally only look at when we are doing performance testing or have a specific concern.

-Todd

RE: Kafka JMS metrics meaning

Posted by Aditya Auradkar <aa...@linkedin.com.INVALID>.
The number of under-replicated partitions and total request time are some good bets.

Aditya

Re: Kafka JMS metrics meaning

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Tue, Jun 2, 2015 at 12:50 PM, Marina <pp...@yahoo.com.invalid> wrote:

> Hi,
> I have enabled JMX_PORT for Kafka server and am trying to understand some
> of the metrics that are being exposed. I have two questions:
> 1. what are the best metrics to monitor to quickly spot unhealthy Kafka
> cluster?
>

People loooove looking at consumer lag :)
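
For context, consumer lag is usually computed per partition as the log end offset minus the consumer group's committed offset. A minimal sketch, assuming both sets of offsets have already been fetched by some other means; the function name consumer_lag and the sample numbers are hypothetical, and treating a missing committed offset as 0 is an assumption made here.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset, floored at 0."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # ASSUMPTION: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Made-up offsets for a three-partition topic.
log_end = {0: 1500, 1: 900, 2: 40}
committed = {0: 1500, 1: 650}

lag = consumer_lag(log_end, committed)
print(lag, "total:", sum(lag.values()))
```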

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

> 2. what do these metrics mean: ReplicaManager -> LeaderCount ? and
> ReplicaManager -> PartitionCount ? I have three topics created, with one
> partition each, and replication = 1, however the values for both of the
> above attributes is "53".... So I am not sure what the count '53' means
> here....
> thanks
> Marina
>