Posted to users@kafka.apache.org by tao xiao <xi...@gmail.com> on 2016/02/26 12:38:22 UTC

Kafka node liveness check

Hi team,

What is the best way to verify that a specific Kafka node is functioning properly?
Telnetting to the port is one approach, but I don't think it tells me
whether the broker can still receive and send traffic. I am thinking of
asking the broker for metadata using consumer.partitionsFor. If it
returns PartitionInfo, the broker is considered live. Is this a good approach?
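A caveat on consumer.partitionsFor: as far as I know, the Java consumer sends its metadata request to whichever known broker it currently considers least loaded, so a successful reply does not prove that the particular node you care about is healthy. A minimal sketch that pins the check to one node, assuming the broker still accepts version 0 of the Metadata request (brokers of this era do; much newer releases may have dropped old request versions): open a TCP connection to that broker only, send a hand-built Metadata v0 request, and read the broker list from the reply. Unlike a telnet check, this only succeeds if the broker actually parses and answers a Kafka request. The byte layout follows the Kafka protocol guide; the class and method names, the client id "liveness-probe" and the 5-second timeout are hypothetical choices for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MetadataProbe {
    static final short METADATA_API_KEY = 3;

    // Builds a size-prefixed Metadata v0 request asking for all topics (empty topic array).
    static byte[] encodeRequest(int correlationId, String clientId) {
        byte[] id = clientId.getBytes(StandardCharsets.UTF_8);
        int bodySize = 2 + 2 + 4 + 2 + id.length + 4;
        ByteBuffer buf = ByteBuffer.allocate(4 + bodySize); // big-endian by default, as Kafka expects
        buf.putInt(bodySize);                // size prefix (does not count itself)
        buf.putShort(METADATA_API_KEY);      // api_key = 3 (Metadata)
        buf.putShort((short) 0);             // api_version = 0
        buf.putInt(correlationId);           // correlation_id, echoed back by the broker
        buf.putShort((short) id.length);     // client_id length
        buf.put(id);                         // client_id bytes
        buf.putInt(0);                       // topics array with 0 entries = all topics in v0
        return buf.array();
    }

    // Extracts broker node ids from a Metadata v0 response (size prefix already stripped).
    static List<Integer> parseBrokerIds(byte[] response, int expectedCorrelationId) {
        ByteBuffer buf = ByteBuffer.wrap(response);
        if (buf.getInt() != expectedCorrelationId) {
            throw new IllegalStateException("correlation id mismatch");
        }
        int brokerCount = buf.getInt();
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < brokerCount; i++) {
            ids.add(buf.getInt());                  // node_id
            short hostLen = buf.getShort();         // host string length
            buf.position(buf.position() + hostLen); // skip host bytes
            buf.getInt();                           // port
        }
        return ids; // the topic metadata that follows is not needed for a liveness check
    }

    // Sends one request to exactly this host:port and returns the broker ids it reports.
    static List<Integer> probe(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // fail instead of hanging if the broker never answers
            socket.getOutputStream().write(encodeRequest(1, "liveness-probe"));
            socket.getOutputStream().flush();
            DataInputStream in = new DataInputStream(socket.getInputStream());
            byte[] response = new byte[in.readInt()];
            in.readFully(response);
            return parseBrokerIds(response, 1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java MetadataProbe broker1.example.com 9092
        System.out.println("brokers: " + probe(args[0], Integer.parseInt(args[1]), 5000));
    }
}
```

Run it once per broker; a timeout or connection error marks that broker as suspect. Only the broker list is parsed, since a well-formed answer is already the evidence the check needs.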

Re: Kafka node liveness check

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Tao,

For your case, you could also monitor the following JMX metric (see
http://kafka.apache.org/documentation.html#monitoring):

kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec

When a broker cannot respond to requests properly, this rate will be much
lower than on the other brokers.
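A sketch of that comparison, assuming remote JMX is enabled on every broker (port 9999, as in the Nagios config elsewhere in this thread) and that the metric is exposed as a Yammer meter with a OneMinuteRate attribute. It reads the rate from each broker and flags any broker below a fraction of the cluster median. The class and method names and the 0.1 fraction are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BytesOutComparison {
    // Reads the one-minute rate of the broker-wide BytesOutPerSec meter over remote JMX.
    static double bytesOutRate(String host, int jmxPort) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec");
            return ((Number) mbs.getAttribute(name, "OneMinuteRate")).doubleValue();
        }
    }

    // Returns brokers whose rate is below `fraction` times the median rate across all brokers.
    static List<String> laggingBrokers(Map<String, Double> rates, double fraction) {
        if (rates.isEmpty()) {
            return new ArrayList<>();
        }
        double[] sorted = rates.values().stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int n = sorted.length;
        double median = n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        List<String> lagging = new ArrayList<>();
        for (Map.Entry<String, Double> e : new TreeMap<>(rates).entrySet()) {
            if (e.getValue() < fraction * median) {
                lagging.add(e.getKey());
            }
        }
        return lagging;
    }

    public static void main(String[] args) throws Exception {
        // Broker host names are passed on the command line (hypothetical); JMX port assumed 9999.
        Map<String, Double> rates = new TreeMap<>();
        for (String host : Arrays.asList(args)) {
            rates.put(host, bytesOutRate(host, 9999));
        }
        System.out.println("suspect brokers: " + laggingBrokers(rates, 0.1));
    }
}
```

The threshold is a judgment call: a broker that simply leads fewer or less-consumed partitions will also show a lower rate, so a flagged broker is a hint to investigate rather than a definitive verdict.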

Guozhang



On Tue, Mar 1, 2016 at 7:39 PM, tao xiao <xi...@gmail.com> wrote:

> Thanks Elias for sharing

Re: Kafka node liveness check

Posted by tao xiao <xi...@gmail.com>.
Thanks, Elias, for sharing.

On Mon, 29 Feb 2016 at 22:23 Elias Abacioglu <elias.abacioglu@deltaprojects.com> wrote:

> Crap, forgot to remove my signature.. I guess my e-mail will now get
> spammed forever :(

Re: Kafka node liveness check

Posted by Elias Abacioglu <el...@deltaprojects.com>.
Crap, forgot to remove my signature. I guess my e-mail will now get
spammed forever :(

On Mon, Feb 29, 2016 at 3:14 PM, Elias Abacioglu <elias.abacioglu@deltaprojects.com> wrote:

> We've setup jmxtrans and use it to check these two values.
> UncleanLeaderElectionsPerSec
> UnderReplicatedPartitions

Re: Kafka node liveness check

Posted by Elias Abacioglu <el...@deltaprojects.com>.
We've set up jmxtrans and use it to check these two values:
UncleanLeaderElectionsPerSec
UnderReplicatedPartitions

Here is our Shinken/Nagios configuration:

# UnderReplicatedPartitions is a gauge, so check_jmx reads its Value attribute.
define command {
  command_name check_kafka_underreplicated
  command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.server":type="ReplicaManager",name="UnderReplicatedPartitions" -A Value -w $ARG1$ -c $ARG2$
}

# UncleanLeaderElectionsPerSec is a meter; its Count attribute is the running total since the broker started.
define command {
  command_name check_kafka_uncleanleader
  command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.controller":type="ControllerStats",name="UncleanLeaderElectionsPerSec" -A Count -w $ARG1$ -c $ARG2$
}

# In check_command, the values after "!" fill $ARG1$ (warning) and $ARG2$ (critical).
# check_interval and retry_interval are in minutes with the default interval_length of 60.
define service {
  hostgroup_name KafkaBroker
  use generic-service
  service_description Kafka Unclean Leader Elections per sec
  check_command check_kafka_uncleanleader!1!10
  check_interval 15
  retry_interval 5
}
define service {
  hostgroup_name KafkaBroker
  use generic-service
  service_description Kafka Under Replicated Partitions
  check_command check_kafka_underreplicated!1!10
  check_interval 15
  retry_interval 5
}
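For readers not running Shinken/Nagios, a minimal Java sketch of what the check_kafka_underreplicated command does: read one MBean attribute over remote JMX and map it to the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The rule that a value at or above a threshold triggers that status is an assumption of this sketch and may differ from check_jmx's own comparison; the class and method names are hypothetical.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class KafkaJmxCheck {
    // Standard Nagios plugin exit codes.
    static final int OK = 0, WARNING = 1, CRITICAL = 2, UNKNOWN = 3;

    // Maps a metric value onto a status; reaching a threshold triggers it (assumed semantics).
    static int status(double value, double warn, double crit) {
        if (value >= crit) return CRITICAL;
        if (value >= warn) return WARNING;
        return OK;
    }

    static String label(int status) {
        switch (status) {
            case OK: return "OK";
            case WARNING: return "WARNING";
            case CRITICAL: return "CRITICAL";
            default: return "UNKNOWN";
        }
    }

    // Reads one numeric MBean attribute, e.g. Value of a gauge or Count of a meter.
    static double read(String host, int port, String mbean, String attribute) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            return ((Number) mbs.getAttribute(new ObjectName(mbean), attribute)).doubleValue();
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java KafkaJmxCheck <broker-host> 1 10
        String mbean = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions";
        int code;
        try {
            double value = read(args[0], 9999, mbean, "Value");
            code = status(value, Double.parseDouble(args[1]), Double.parseDouble(args[2]));
            System.out.println("UnderReplicatedPartitions " + label(code) + " - value=" + value);
        } catch (Exception e) {
            code = UNKNOWN; // unreachable JMX port, missing MBean, bad arguments, ...
            System.out.println("UnderReplicatedPartitions UNKNOWN - " + e);
        }
        System.exit(code);
    }
}
```

The same read call with the ControllerStats MBean and the Count attribute covers the second check. Because Count is cumulative, one option would be to compare it with the value from the previous run, so the alert reflects new unclean elections rather than every one since startup.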





################################################################################################################################################################################################################################################################################################################

*DELTA PROJECTS*

*Elias Abacioglu*
Infrastructure Specialist at Delta Projects AB

*E-mail*: elias.abacioglu@deltaprojects.com
*Office*: +46 8 667 76 90 *Mobile*: +46 70 222 59 25
*Office*: Banérgatan 10, SE-115 23 Stockholm, Sweden
website <http://www.deltaprojects.com> | map <http://goo.gl/maps/P3I48> |
support <su...@deltaprojects.com> | twitter
<https://twitter.com/DeltaProjects_> | linkedin
<http://www.linkedin.com/company/delta-projects?trk=hb_tab_compy_id_142164>



On Mon, Feb 29, 2016 at 12:41 PM, tao xiao <xi...@gmail.com> wrote:

> Thanks Jens. What I want to achieve is to check every broker within a
> cluster functions probably. The way you suggest can identify the liveness
> of a cluster but it doesn't necessarily mean every broker in the cluster is
> alive. In order to achieve that I can either create a topic with number of
> partitions being same as the number of brokers and min.insync.isr=number of
> brokers or one topic per broker and then send ping message to broker. But
> this approach is definitely not scalable as we expand the cluster.
> Therefore I am looking for a way to achieve this.

Re: Kafka node liveness check

Posted by tao xiao <xi...@gmail.com>.
Thanks, Jens. What I want to achieve is to check that every broker within a
cluster functions properly. The way you suggest can identify the liveness
of a cluster, but it doesn't necessarily mean every broker in the cluster is
alive. To achieve that, I could either create a topic with as many
partitions as there are brokers and min.insync.replicas equal to the number of
brokers, or create one topic per broker, and then send a ping message to each broker. But
this approach is definitely not scalable as we expand the cluster.
Therefore I am looking for another way to achieve this.
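One detail that matters for the one-partition-per-broker idea: partition leadership moves to another replica when a broker fails and does not move back until a preferred-leader election runs, so pinging "partition i" does not necessarily exercise broker i. A minimal sketch of planning the pings from current leadership instead, assuming the partition-to-leader map comes from topic metadata (for example, PartitionInfo.leader() from consumer.partitionsFor). It picks one partition per broker and reports brokers that lead nothing, which the ping cannot reach. The class and method names are hypothetical. Note also that min.insync.replicas only affects producers that send with acks=all.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PingPlan {
    // For every broker, picks one partition of the probe topic that it currently leads.
    // partitionLeaders maps partition id -> leader broker id.
    static Map<Integer, Integer> partitionPerBroker(Set<Integer> brokers,
                                                    Map<Integer, Integer> partitionLeaders) {
        Map<Integer, Integer> plan = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(partitionLeaders).entrySet()) {
            int leader = e.getValue();
            if (brokers.contains(leader) && !plan.containsKey(leader)) {
                plan.put(leader, e.getKey()); // lowest-numbered partition this broker leads
            }
        }
        return plan;
    }

    // Brokers a ping would not reach because they lead no partition of the probe topic.
    static Set<Integer> uncovered(Set<Integer> brokers, Map<Integer, Integer> partitionLeaders) {
        Set<Integer> missing = new TreeSet<>(brokers);
        missing.removeAll(partitionPerBroker(brokers, partitionLeaders).keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Example: broker 3 restarted and partition 2's leadership moved to broker 1.
        Set<Integer> brokers = new TreeSet<>(Arrays.asList(1, 2, 3));
        Map<Integer, Integer> leaders = new TreeMap<>();
        leaders.put(0, 1);
        leaders.put(1, 2);
        leaders.put(2, 1);
        System.out.println("ping plan (broker=partition): " + partitionPerBroker(brokers, leaders)); // {1=0, 2=1}
        System.out.println("not covered: " + uncovered(brokers, leaders)); // [3]
    }
}
```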

On Mon, 29 Feb 2016 at 16:54 Jens Rantil <je...@tink.se> wrote:

> Hi,
>
> I assume you first want to ask yourself what liveness you would like to
> check for. I guess the most realistic check is to put a "ping" message on
> the broken and make sure that you can consume it.
>
> Cheers,
> Jens

Re: Kafka node liveness check

Posted by Jens Rantil <je...@tink.se>.
Hi,

I assume you first want to ask yourself what kind of liveness you would like to
check for. I guess the most realistic check is to put a "ping" message on
the broker and make sure that you can consume it.

Cheers,
Jens

On Fri, Feb 26, 2016 at 12:38 PM, tao xiao <xi...@gmail.com> wrote:

> Hi team,
>
> What is the best way to verify a specific Kafka node functions properly?
> Telnet the port is one of the approach but I don't think it tells me
> whether or not the broker can still receive/send traffics. I am thinking to
> ask for metadata from the broker using consumer.partitionsFor. If it can
> return partitioninfo it is considered live. Is this a good approach?
>



-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se> Linkedin
<http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
 Twitter <https://twitter.com/tink>