Posted to dev@kafka.apache.org by Bruno Cadonna <br...@confluent.io> on 2020/09/30 11:48:57 UTC

[DISCUSS] KIP-674: API to Aggregate Metrics in Kafka Streams

Hi,

I would like to propose the following KIP to add an API to the Kafka 
Streams client to aggregate metrics.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-674%3A+API+to+Aggregate+Metrics+in+Kafka+Streams

Best,
Bruno

Re: [DISCUSS] KIP-674: API to Aggregate Metrics in Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Bruno,

Thanks for the proposed KIP! Here are some thoughts:

1) Regarding Sophie's comment 1), I think there may be one merit to
defining the aggregated metric on a different level (though only a
higher level would generally make sense). Although the aggregated metric
is still bound to the recording level of its underlying metrics in order
to report any values, a metrics reporter could potentially leverage the
level to define, e.g., "only collect metrics at the INFO level". Today,
since the recording level is not yet exposed from Metric, we cannot do
that. Without this merit, I feel it indeed makes less sense to allow
users to define the level of the aggregated metric to be different from
the underlying metrics' level.

2) Regarding the rejected alternative, personally I'm still a bit hesitant
to put a metrics-related API on the entry class KafkaStreams. WDYT about
adding an API to expose StreamsMetrics in KafkaStreams, and adding this
function to StreamsMetricsImpl, which would check whether the client is in
the CREATED state and otherwise throw an illegal state exception to avoid
concurrency issues? We could pass the KafkaStreams reference to the
StreamsMetricsImpl constructor so that its state can be accessed in this
function.
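
For illustration, a rough sketch of that idea could look like the
following (the addAggregatedMetric method name is a placeholder, and
MetricsAggregationConfig stands in for the config class from the KIP;
this is not the API proposed in the KIP):

    import org.apache.kafka.streams.KafkaStreams;

    // Hypothetical sketch only -- simplified version of StreamsMetricsImpl;
    // the method name and its parameter are placeholders, not the API
    // proposed in the KIP.
    public class StreamsMetricsImpl {

        private final KafkaStreams streams;

        // the KafkaStreams reference is passed in so that the client state
        // can be checked when an aggregation is registered
        public StreamsMetricsImpl(final KafkaStreams streams) {
            this.streams = streams;
        }

        public void addAggregatedMetric(final MetricsAggregationConfig config) {
            // only allow registering aggregations before the client is
            // started, to avoid concurrent access to the metrics
            if (streams.state() != KafkaStreams.State.CREATED) {
                throw new IllegalStateException(
                    "Aggregated metrics can only be added in state CREATED");
            }
            // ... register the aggregation with the underlying metrics ...
        }
    }

Since KafkaStreams#state() is already public API, such a check would not
require any new metrics-related method on KafkaStreams itself.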


Guozhang



On Wed, Sep 30, 2020 at 5:34 PM Sophie Blee-Goldman <so...@confluent.io>
wrote:

> Thanks for the KIP! This looks really useful
>
> 1)
>
> > If the recording level for the application is set to INFO, a DEBUG-level
> > metric that should be aggregated will
> > not record values even if the metric that records the aggregation is on
> > recording level INFO
>
>
> I think I'm missing something here: how is it possible for there to be a
> "DEBUG-level metric
> that should be aggregated" and yet "the metric that records the
> aggregation is on INFO"?
> I thought that each metric to be aggregated would be uniquely defined by
> the combination
> of metricName + metricGroup, so there wouldn't be both an INFO and DEBUG
> level
> metric to distinguish between.
>
> It seems like if a user tries to aggregate a DEBUG-level metric but passes
> in the recording
> level as INFO, this would be a misconfiguration. Can you give an example to
> help me
> understand this?
>
> 2) Why do we require users to supply an aggregator and initializer?
> Shouldn't
> that be determined solely by the actual metric being aggregated? For
> example
> if a user wants to aggregate the `commit-latency-avg`, then the only
> sensible
> way to aggregate this is by averaging. Similarly the only sensible
> aggregation
> for `commit-latency-max` is to take the max, a `-total` metric should be
> summed,
> and so on.
>
> If it's not always possible to infer the aggregation type, then WDYT about
> providing
> a few aggregation options rather than a fully pluggable BiFunction that
> users have
> to implement themselves? It seems like the vast majority of aggregations
> would be one of {avg, min, max, sum}, so we should just give users those
> options
> directly. For example, instead of the `initalAggregate` and
> `aggregateFunction` parameters,
> we just have an enum with possible values AVERAGE, MIN, MAX, and SUM
> (am I missing any in that list?)
>
> If a user really wants to do something more complicated, they can always
> roll it
> up themselves. And if people ask for it we can also go back and add the
> pluggable
> BiFunction option as well. I'd just rather keep things as simple for users
> as possible,
> until we've heard actual feedback that more complicated options are
> desired.
>
> 3) nit: I feel the name `tagLabels` is a bit subtle, what about
> `tagsToAggregate`?
> (just a suggestion, feel free to ignore)
>
> 4) Also a bit of a nit: why put all the configs in the
> MetricsAggregationConfig class,
> except for the groupOfMetricsToAggregate and nameOfMetricsToAggregate? It
> seems like they are just as much a config as the `tagLabels`, for example.
>
> WDYT?
>
> Sophie
>
> On Wed, Sep 30, 2020 at 4:49 AM Bruno Cadonna <br...@confluent.io> wrote:
>
> > Hi,
> >
> > I would like to propose the following KIP to add an API to the Kafka
> > Streams client to aggregate metrics.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-674%3A+API+to+Aggregate+Metrics+in+Kafka+Streams
> >
> > Best,
> > Bruno
> >
>


-- 
-- Guozhang
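
As an illustration of the enum-based alternative suggested in Sophie's
point 2 above, a rough sketch could look like the following (the type
name, the values, and the name-based inference are assumptions for this
sketch, not part of the KIP as proposed):

    // Hypothetical sketch of a fixed set of aggregation options instead of
    // a user-supplied initializer and aggregator.
    public enum MetricAggregationType {
        AVERAGE, MIN, MAX, SUM;

        // Infer a sensible aggregation from the metric name suffix, e.g.
        // commit-latency-avg -> AVERAGE, commit-latency-max -> MAX, and a
        // "-total" metric -> SUM.
        public static MetricAggregationType fromMetricName(final String name) {
            if (name.endsWith("-avg")) {
                return AVERAGE;
            } else if (name.endsWith("-min")) {
                return MIN;
            } else if (name.endsWith("-max")) {
                return MAX;
            } else if (name.endsWith("-total")) {
                return SUM;
            }
            throw new IllegalArgumentException(
                "Cannot infer an aggregation for " + name);
        }
    }

With something along these lines, users would only pick one of the
predefined options (or have it inferred from the metric name) instead of
implementing an initializer and a BiFunction themselves.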
