Posted to dev@kafka.apache.org by Xavier Léauté <xa...@confluent.io> on 2019/10/26 00:17:20 UTC

[DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Hi All,

I wrote a short KIP to make the set of metrics exposed via JMX configurable.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-544%3A+Make+metrics+exposed+via+JMX+configurable

Let me know what you think.

Thanks,
Xavier

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Xavier Léauté <xa...@confluent.io>.

Based on PR feedback I've updated the KIP to align the configs between
clients and brokers.
Broker configs now start with "metrics.xxx" instead of "kafka.metrics.xxx",
in line with client configs.
This is also more consistent with newer broker configs.

On Fri, Nov 8, 2019 at 12:23 PM Alexandre Dupriez <
alexandre.dupriez@gmail.com> wrote:

> Hello,
>
> This can be very handy when dealing with large numbers of partitions on a
> broker.
>
> I was recently experimenting with a third-party monitoring framework which
> provides a JMX collector [1] with the same mechanism to filter out the JMX
> beans retrieved from Kafka.
> When running a couple of tests with all filters removed, the time to fetch
> all beans could become quickly prohibitive as the number of partitions on
> the tested broker increased.
>
> After some investigation, the main source of "friction" was found in the
> (too) many RMI RPCs required to fetch the names and attributes of the JMX
> beans.
> Configuring the same JMX collector to run as a JVM agent, and taking care
> of unplugging the JMX-RMI connector, yielded significant gains (*).
>
> Note that this was obtained by fetching the beans via HTTP, with all values
> sent in a batch.
> I find one of the potential follow-up mentioned (exposing the beans via an
> alternative API) also very interesting from a performance perspective.
>
> [1] https://github.com/prometheus/jmx_exporter
> (*) On a 4-cores Xeon 8175M broker, hosting 1,000 replicas, the time to
> fetch all beans dropped from 13 seconds to ~400 ms.
>
> On Fri, Nov 8, 2019 at 5:29 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Sounds good, thanks.
> >
> > Guozhang
> >
> > On Fri, Nov 8, 2019 at 9:26 AM Xavier Léauté <xa...@confluent.io>
> wrote:
> >
> > > >
> > > > 1. I do feel there're similar needs for clients make JMX
> configurable.
> > > Some
> > > > context: in modules like Connect and Streams we have added /
> > refactored a
> > > > large number of metrics so far [0, 1], and although we've added a
> > > reporting
> > > > level config [2] to clients, this is statically defined at code and
> > > cannot
> > > > be dynamically changed either.
> > > >
> > >
> > > Thanks for providing some context there, I have updated the KIP to add
> > > equivalent configs for clients, streams, and connect
> > >
> > >
> > > > 2. This may be out of the scope of this KIP, but have you thought
> about
> > > how
> > > > to make the metrics collection to be configurable (i.e. basically for
> > > those
> > > > metrics which we know would not be exposed, we do not collect them
> > > either)
> > > > dynamically?
> > >
> > >
> > > Yes, given what you described above, it would make sense to look into
> > this.
> > > One difficulty though, is that we'd probably want to define this at the
> > > sensor level,
> > > which does not always map to the metric names users understand.
> > >
> > > There are also cases where someone may want to expose different sets of
> > > metrics
> > > using different reporters, so I think a reporting level config is still
> > > useful.
> > > For this KIP, I am proposing we stick to making reporting configurable,
> > > independent of the underlying collection mechanism.
> > >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Alexandre Dupriez <al...@gmail.com>.

Hello,

This can be very handy when dealing with large numbers of partitions on a
broker.

I was recently experimenting with a third-party monitoring framework which
provides a JMX collector [1] with the same mechanism to filter out the JMX
beans retrieved from Kafka.
When running a couple of tests with all filters removed, the time to fetch
all beans could quickly become prohibitive as the number of partitions on
the tested broker increased.

After some investigation, the main source of "friction" was found in the
(too) many RMI RPCs required to fetch the names and attributes of the JMX
beans.
Configuring the same JMX collector to run as a JVM agent, and taking care
of unplugging the JMX-RMI connector, yielded significant gains (*).

Note that this was obtained by fetching the beans via HTTP, with all values
sent in a batch.
I find one of the potential follow-ups mentioned (exposing the beans via an
alternative API) also very interesting from a performance perspective.

[1] https://github.com/prometheus/jmx_exporter
(*) On a 4-core Xeon 8175M broker hosting 1,000 replicas, the time to
fetch all beans dropped from 13 seconds to ~400 ms.
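
As a rough illustration of why in-process collection avoids the RMI round
trips described above, the sketch below enumerates beans and their attributes
through the JVM's own platform MBeanServer. It is not the jmx_exporter
implementation; the class and method names are hypothetical, and the
"java.lang:type=*" pattern is used only so the example runs on any JVM
without Kafka present.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: scan MBeans from inside the JVM, the way an agent could.
final class InProcessBeanScanSketch {

    // Counts the attributes of every MBean whose name matches the given
    // ObjectName pattern. All calls go to the local MBeanServer, so no
    // JMX-RMI connector is involved.
    static int countAttributes(String objectNamePattern) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            Set<ObjectName> names = server.queryNames(new ObjectName(objectNamePattern), null);
            int total = 0;
            for (ObjectName name : names) {
                total += server.getMBeanInfo(name).getAttributes().length;
            }
            return total;
        } catch (Exception e) {
            // Wrap JMX checked exceptions to keep the sketch easy to call.
            throw new IllegalStateException("MBean scan failed", e);
        }
    }

    public static void main(String[] args) {
        // Standard JVM beans (Runtime, Memory, Threading, ...) are always registered.
        System.out.println("attributes found: " + countAttributes("java.lang:type=*"));
    }
}
```

A remote collector would issue comparable name and attribute lookups over
the connector instead, which is where the per-bean RPC cost comes from.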

On Fri, Nov 8, 2019 at 5:29 PM Guozhang Wang <wa...@gmail.com> wrote:

> Sounds good, thanks.
>
> Guozhang
>
> On Fri, Nov 8, 2019 at 9:26 AM Xavier Léauté <xa...@confluent.io> wrote:
>
> > >
> > > 1. I do feel there're similar needs for clients make JMX configurable.
> > Some
> > > context: in modules like Connect and Streams we have added /
> refactored a
> > > large number of metrics so far [0, 1], and although we've added a
> > reporting
> > > level config [2] to clients, this is statically defined at code and
> > cannot
> > > be dynamically changed either.
> > >
> >
> > Thanks for providing some context there, I have updated the KIP to add
> > equivalent configs for clients, streams, and connect
> >
> >
> > > 2. This may be out of the scope of this KIP, but have you thought about
> > how
> > > to make the metrics collection to be configurable (i.e. basically for
> > those
> > > metrics which we know would not be exposed, we do not collect them
> > either)
> > > dynamically?
> >
> >
> > Yes, given what you described above, it would make sense to look into
> this.
> > One difficulty though, is that we'd probably want to define this at the
> > sensor level,
> > which does not always map to the metric names users understand.
> >
> > There are also cases where someone may want to expose different sets of
> > metrics
> > using different reporters, so I think a reporting level config is still
> > useful.
> > For this KIP, I am proposing we stick to making reporting configurable,
> > independent of the underlying collection mechanism.
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Guozhang Wang <wa...@gmail.com>.

Sounds good, thanks.

Guozhang

On Fri, Nov 8, 2019 at 9:26 AM Xavier Léauté <xa...@confluent.io> wrote:

> >
> > 1. I do feel there're similar needs for clients make JMX configurable.
> Some
> > context: in modules like Connect and Streams we have added / refactored a
> > large number of metrics so far [0, 1], and although we've added a
> reporting
> > level config [2] to clients, this is statically defined at code and
> cannot
> > be dynamically changed either.
> >
>
> Thanks for providing some context there, I have updated the KIP to add
> equivalent configs for clients, streams, and connect
>
>
> > 2. This may be out of the scope of this KIP, but have you thought about
> how
> > to make the metrics collection to be configurable (i.e. basically for
> those
> > metrics which we know would not be exposed, we do not collect them
> either)
> > dynamically?
>
>
> Yes, given what you described above, it would make sense to look into this.
> One difficulty though, is that we'd probably want to define this at the
> sensor level,
> which does not always map to the metric names users understand.
>
> There are also cases where someone may want to expose different sets of
> metrics
> using different reporters, so I think a reporting level config is still
> useful.
> For this KIP, I am proposing we stick to making reporting configurable,
> independent of the underlying collection mechanism.
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Xavier Léauté <xa...@confluent.io>.

>
> 1. I do feel there're similar needs for clients make JMX configurable. Some
> context: in modules like Connect and Streams we have added / refactored a
> large number of metrics so far [0, 1], and although we've added a reporting
> level config [2] to clients, this is statically defined at code and cannot
> be dynamically changed either.
>

Thanks for providing some context there. I have updated the KIP to add
equivalent configs for clients, Streams, and Connect.


> 2. This may be out of the scope of this KIP, but have you thought about how
> to make the metrics collection to be configurable (i.e. basically for those
> metrics which we know would not be exposed, we do not collect them either)
> dynamically?


Yes, given what you described above, it would make sense to look into this.
One difficulty though, is that we'd probably want to define this at the
sensor level,
which does not always map to the metric names users understand.

There are also cases where someone may want to expose different sets of
metrics
using different reporters, so I think a reporting level config is still
useful.
For this KIP, I am proposing we stick to making reporting configurable,
independent of the underlying collection mechanism.

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Guozhang Wang <wa...@gmail.com>.

Thanks Xavier for the KIP, I think it is a great idea to add to AK. A
couple of questions / comments:

1. I do feel there are similar needs to make JMX configurable for clients.
Some context: in modules like Connect and Streams we have added / refactored
a large number of metrics so far [0, 1], and although we've added a recording
level config [2] to clients, this is statically defined in code and cannot
be dynamically changed either.

[0]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework
[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
[2]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-105%3A+Addition+of+Recording+Level+for+Sensors

2. This may be out of the scope of this KIP, but have you thought about how
to make the metrics collection configurable dynamically (i.e. for those
metrics which we know would not be exposed, we would not collect them
either)? Again this is related to our previous approach in [2], but that is
static and also quite coarse-grained: you can only turn on ALL
debug-level sensors or none of them, for example. For some sensors the
collection itself can actually be quite expensive, since it is either on a
critical code path or relies on third-party library calls. If we
could dynamically change which metrics we collect at runtime, that would
be great.

Guozhang

On Wed, Nov 6, 2019 at 3:42 PM Xavier Léauté <xa...@confluent.io> wrote:

> >
> > Since these configs will work with Kafka's own metrics library, will the
> > configs be part of the clients' configurations? It would be good to point
> > that out explicitly in the KIP.
> >
>
> Those configs are currently only at the broker level. If we feel this is
> useful on the client as well, we could submit a similar KIP for the client
> side.
> I am not sure if the number of metrics is as problematic on the client side
> as on the server though.
>
>
> > Would the regex apply to the whole string? i.e would we be able to match
> > parts of the string like `type=`, `name=`, `topic=`, or would it only
> apply
> > to the values?
> >
>
> Yes, to keep things simple I decided to apply the regex to the entire JMX
> mbean string, since that's typically how a user would refer to those
> metrics, regardless of whether it came from KafkaMetrics or YammerMetrics
>
> I've updated the KIP to add some example filters
>

-- 
-- Guozhang

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Xavier Léauté <xa...@confluent.io>.

>
> Since these configs will work with Kafka's own metrics library, will the
> configs be part of the clients' configurations? It would be good to point
> that out explicitly in the KIP.
>

Those configs are currently only at the broker level. If we feel this is
useful on the client as well, we could submit a similar KIP for the client
side.
I am not sure if the number of metrics is as problematic on the client side
as on the server though.


> Would the regex apply to the whole string? i.e would we be able to match
> parts of the string like `type=`, `name=`, `topic=`, or would it only apply
> to the values?
>

Yes, it applies to the whole string. To keep things simple I decided to apply
the regex to the entire JMX mbean string, since that's typically how a user
would refer to those metrics, regardless of whether they came from
KafkaMetrics or YammerMetrics.

I've updated the KIP to add some example filters
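
To make the whole-string behaviour concrete, here is a minimal sketch of an
include/exclude filter over full MBean names. It is only an illustration:
the class and method names are hypothetical, and it assumes a metric is
exposed when its complete name matches the include pattern and does not
match the exclude pattern. The example filters in the KIP itself remain the
reference.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of regex filtering applied to the entire MBean name.
final class JmxFilterSketch {

    // Returns a predicate that is true when the full MBean name matches
    // `include` and does not match `exclude`. Matcher.matches() requires the
    // pattern to cover the whole string, keys (type=, name=, topic=) included.
    static Predicate<String> filter(String include, String exclude) {
        Pattern inc = Pattern.compile(include);
        Pattern exc = Pattern.compile(exclude);
        return name -> inc.matcher(name).matches() && !exc.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Expose everything except partition-level beans in kafka.cluster.
        Predicate<String> exposed = filter(".*", "kafka\\.cluster:type=Partition,.*");

        System.out.println(exposed.test(
            "kafka.cluster:type=Partition,name=UnderMinIsr,topic=foobar,partition=9")); // false
        System.out.println(exposed.test(
            "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"));        // true
    }
}
```

Because the pattern sees the key names as well as the values, a filter can
target `type=`, `name=`, or `topic=` directly without any special syntax.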

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Stanislav Kozlovski <st...@confluent.io>.

Hey Xavier,

Thank you for working on this. This KIP looks very good to me.

Since these configs will work with Kafka's own metrics library, will the
configs be part of the clients' configurations? It would be good to point
that out explicitly in the KIP.

I also second Viktor's question on what this would look like in practice.
i.e., if we had the following metric:

*kafka.cluster:type=Partition,name=UnderMinIsr,topic=foobar,partition=9*

Would the regex apply to the whole string? i.e., would we be able to match
parts of the string like `type=`, `name=`, `topic=`, or would it only apply
to the values?

Best,
Stanislav

On Mon, Oct 28, 2019 at 9:06 AM Viktor Somogyi-Vass <vi...@gmail.com>
wrote:

> Hi Xavier,
>
> How would the practical application look like if this was implemented?
> Would monitoring agents switch between the whitelist and blacklist
> periodically if they wanted to monitor every metrics?
> I think we should make some usage recommendations.
>
> Thanks,
> Viktor
>
> On Sun, Oct 27, 2019 at 3:34 PM Gwen Shapira <gw...@confluent.io> wrote:
>
> > Thanks Xavier.
> >
> > I really like this proposal. Collecting JMX metrics in clusters with
> > 100K partitions was nearly impossible due to the design of JMX and the
> > single lock mechanism. Yammer's limitations meant that any metric we
> > reported was exposed via JMX, so we couldn't have cheaper reporters
> > export one set of metrics, and JMX export another.
> >
> > Your proposal looks like a great way to lift this limitation and give
> > us more flexibility in reporting metrics.
> >
> > Gwen
> >
> > On Fri, Oct 25, 2019 at 5:17 PM Xavier Léauté <xa...@confluent.io>
> wrote:
> > >
> > > Hi All,
> > >
> > > I wrote a short KIP to make the set of metrics exposed via JMX
> > configurable.
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-544%3A+Make+metrics+exposed+via+JMX+configurable
> > >
> > > Let me know what you think.
> > >
> > > Thanks,
> > > Xavier
> >
>

-- 
Best,
Stanislav

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.

Hi Xavier,

> That's certainly an option, however it does not solve the problem for our
> users that still rely on JMX integration to collect metrics.

Absolutely.

> We already provide the ability to write reporter plugins via the
> MetricsReporter interface.
> And rather than building a separate HTTP interface, I think we should
> extend the MetricsReporter interface to also
> provide access to yammer metrics – not just Kafka metrics – since there is
> no clear effort to move away from Yammer at this time.
>
> This way one could build any kind of reporter – HTTP or otherwise – without
> having to rely on Kafka internal classes

Yes, as you point out, it's important to decouple the metric reporter from
internal classes, and exposing Yammer would be a good step toward that.
From this perspective the REST API goes one step further, as you won't have
to ship the broker and the reporter plugin together.
Anyway, I don't want to derail the conversation here with the REST stuff
(perhaps I'll open a KIP for that sometime and we can discuss it there :) ).

Thanks,
Viktor

On Wed, Oct 30, 2019 at 10:44 PM Xavier Léauté <xa...@confluent.io> wrote:

> >
> > A follow-up question, maybe to list in the future work section as it's
> > somewhat parallel to this KIP: have you thought about implementing a REST
> > reporter for metrics?
>
>
> That's certainly an option, however it does not solve the problem for our
> users that still rely on JMX integration to collect metrics.
>
> We already provide the ability to write reporter plugins via the
> MetricsReporter interface.
> And rather than building a separate HTTP interface, I think we should
> extend the MetricsReporter interface to also
> provide access to yammer metrics – not just Kafka metrics – since there is
> no clear effort to move away from Yammer at this time.
>
> This way one could build any kind of reporter – HTTP or otherwise – without
> having to rely on Kafka internal classes
>

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Xavier Léauté <xa...@confluent.io>.

>
> A follow-up question, maybe to list in the future work section as it's
> somewhat parallel to this KIP: have you thought about implementing a REST
> reporter for metrics?


That's certainly an option, however it does not solve the problem for our
users that still rely on JMX integration to collect metrics.

We already provide the ability to write reporter plugins via the
MetricsReporter interface.
And rather than building a separate HTTP interface, I think we should
extend the MetricsReporter interface to also
provide access to yammer metrics – not just Kafka metrics – since there is
no clear effort to move away from Yammer at this time.

This way one could build any kind of reporter – HTTP or otherwise – without
having to rely on Kafka internal classes

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.

Hey Xavier,

Thanks for the explanation.
A follow-up question, maybe to list in the future work section as it's
somewhat parallel to this KIP: have you thought about implementing a REST
reporter for metrics? In my opinion it would be useful, since it could be
written to query the registry directly, so we'd avoid this problem.

Thanks,
Viktor

On Wed, Oct 30, 2019 at 4:13 AM Xavier Léauté <xa...@confluent.io> wrote:

> >
> > How would the practical application look like if this was implemented?
> >
>
> One useful application is to hide partition-level metrics, some of which
> may only be needed for debugging purposes.
>
>
> > Would monitoring agents switch between the whitelist and blacklist
> > periodically if they wanted to monitor every metrics?
> >
>
> I'm not sure if switching periodically would be practical. However, I do
> see cases where one might want to enable a subset of metrics temporarily
> for debugging, without incurring the need to expose all metrics all the
> time.
>
> I can certainly add some examples regular expressions to the KIP to
> illustrate this.
>

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Xavier Léauté <xa...@confluent.io>.

>
> How would the practical application look like if this was implemented?
>

One useful application is to hide partition-level metrics, some of which
may only be needed for debugging purposes.


> Would monitoring agents switch between the whitelist and blacklist
> periodically if they wanted to monitor every metrics?
>

I'm not sure if switching periodically would be practical. However, I do
see cases where one might want to enable a subset of metrics temporarily
for debugging, without incurring the need to expose all metrics all the
time.

I can certainly add some example regular expressions to the KIP to
illustrate this.
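
As a hedged sketch of the two situations described above (hiding
partition-level metrics in steady state, then temporarily re-enabling a
subset for debugging), the following applies two hypothetical exclude
patterns to a small list of MBean names. The patterns, the helper, and the
"other" topic are illustrative assumptions, not the examples from the KIP.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: compare a steady-state and a debugging exclude pattern.
final class PartitionMetricsToggleSketch {

    // Keeps the MBean names that the exclude pattern does NOT match
    // (whole-string match).
    static List<String> visible(List<String> names, String excludeRegex) {
        Pattern exclude = Pattern.compile(excludeRegex);
        return names.stream()
                .filter(n -> !exclude.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
            "kafka.cluster:type=Partition,name=UnderMinIsr,topic=foobar,partition=9",
            "kafka.cluster:type=Partition,name=UnderMinIsr,topic=other,partition=0");

        // Steady state: hide every partition-level bean.
        String steady = "kafka\\.cluster:type=Partition,.*";
        // Debugging: still hide partition-level beans, except for topic "foobar".
        String debug = "kafka\\.cluster:type=Partition,(?!.*topic=foobar,).*";

        System.out.println(visible(names, steady).size()); // 1
        System.out.println(visible(names, debug).size());  // 2
    }
}
```

Switching between the two patterns changes only what is reported, which
fits the use case of enabling a subset temporarily rather than exposing
all metrics all the time.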

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.

Hi Xavier,

What would the practical application look like if this was implemented?
Would monitoring agents switch between the whitelist and blacklist
periodically if they wanted to monitor every metric?
I think we should make some usage recommendations.

Thanks,
Viktor

On Sun, Oct 27, 2019 at 3:34 PM Gwen Shapira <gw...@confluent.io> wrote:

> Thanks Xavier.
>
> I really like this proposal. Collecting JMX metrics in clusters with
> 100K partitions was nearly impossible due to the design of JMX and the
> single lock mechanism. Yammer's limitations meant that any metric we
> reported was exposed via JMX, so we couldn't have cheaper reporters
> export one set of metrics, and JMX export another.
>
> Your proposal looks like a great way to lift this limitation and give
> us more flexibility in reporting metrics.
>
> Gwen
>
> On Fri, Oct 25, 2019 at 5:17 PM Xavier Léauté <xa...@confluent.io> wrote:
> >
> > Hi All,
> >
> > I wrote a short KIP to make the set of metrics exposed via JMX
> configurable.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-544%3A+Make+metrics+exposed+via+JMX+configurable
> >
> > Let me know what you think.
> >
> > Thanks,
> > Xavier
>

Re: [DISCUSS] KIP-544: Make metrics exposed via JMX configurable

Posted by Gwen Shapira <gw...@confluent.io>.

Thanks Xavier.

I really like this proposal. Collecting JMX metrics in clusters with
100K partitions was nearly impossible due to the design of JMX and the
single lock mechanism. Yammer's limitations meant that any metric we
reported was exposed via JMX, so we couldn't have cheaper reporters
export one set of metrics, and JMX export another.

Your proposal looks like a great way to lift this limitation and give
us more flexibility in reporting metrics.

Gwen

On Fri, Oct 25, 2019 at 5:17 PM Xavier Léauté <xa...@confluent.io> wrote:
>
> Hi All,
>
> I wrote a short KIP to make the set of metrics exposed via JMX configurable.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-544%3A+Make+metrics+exposed+via+JMX+configurable
>
> Let me know what you think.
>
> Thanks,
> Xavier