You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2019/08/02 20:07:02 UTC

Re: [DISCUSS] KIP-444: Augment Metrics for Kafka Streams

Hello Matthias,

That's a good question. Thinking about a bit more, I'd like to propose that
we just list all the version numbers since 0.10 to 2.4 as accepted values,
and let Streams to decide if old / new set of metrics can be used
internally (implementation wise we can reuse the const values for
`upgrade.from` as well).

And then in the future when we remove the metrics, we can just remove the
corresponding version values from the accepted list of this config.


Guozhang

On Tue, Jul 23, 2019 at 11:55 AM Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for the KIP Guozhang.
>
> I just re-read the wiki page and the DISCUSS thread. Overall LGTM.
>
> The only nit is the naming of the new config values. With AK 2.3 being
> released the versions numbers needs to be updated.
>
> Additionally, I actually think that "2.2-" and "2.3" are not the best
> names: the `-` suffix is very subtle IMHO and actually looks more like a
> typo, and it might be better to be more elaborate. Maybe something like
> "up-to-2.2" ?
>
> For "2.3", this config value would be weird for future releases (ie,
> 2.4, 2.5, 2.6). Hence, we might want to rename it to "newest" /
> "current" or something like this?
>
> Another alternative may be to rename it to "since-2.3" (or similar) --
> however, this may require to rename the config if we change metrics in a
> future release (hence, it's not my preferred option).
>
> Thoughts?
>
>
> -Matthias
>
> On 7/22/19 6:33 PM, Guozhang Wang wrote:
> > Thanks everyone for your inputs, I've updated the wiki page accordingly.
> >
> > @Bruno: please let me know if you have any further thoughts per my
> replies
> > above.
> >
> >
> > Guozhang
> >
> >
> > On Mon, Jul 22, 2019 at 6:30 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> >> Thanks Boyang,
> >>
> >> I've thought about exposing time via metrics in Streams. The tricky part
> >> though is which layer of time we should expose: right now we have
> >> task-level and partition-level stream time (what you suggested), and
> also
> >> some processor internally maintain their own observed time. Today we are
> >> still trying to get a clear and simple way of exposing a single time
> >> concept for users to reason about their application's progress. So
> before
> >> we come up with a good solution I'd postpone adding it in a future KIP.
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Thu, Jul 18, 2019 at 1:21 PM Boyang Chen <reluctanthero104@gmail.com
> >
> >> wrote:
> >>
> >>> I mean the partition time.
> >>>
> >>> On Thu, Jul 18, 2019 at 11:29 AM Guozhang Wang <wa...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Boyang,
> >>>>
> >>>> What do you mean by `per partition latency`?
> >>>>
> >>>> Guozhang
> >>>>
> >>>> On Mon, Jul 1, 2019 at 9:28 AM Boyang Chen <
> reluctanthero104@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hey Guozhang,
> >>>>>
> >>>>> do we plan to add per partition latency in this KIP?
> >>>>>
> >>>>> On Mon, Jul 1, 2019 at 7:08 AM Bruno Cadonna <br...@confluent.io>
> >>> wrote:
> >>>>>
> >>>>>> Hi Guozhang,
> >>>>>>
> >>>>>> Thank you for the KIP.
> >>>>>>
> >>>>>> 1) As far as I understand, the StreamsMetrics interface is there for
> >>>>>> user-defined processors. Would it make sense to also add a method to
> >>>>>> the interface to specify a sensor that records skipped records?
> >>>>>>
> >>>>>> 2) What are the semantics of active-task-process and
> >>>> standby-task-process
> >>>>>>
> >>>>>> 3) How do dropped-late-records and expired-window-record-drop relate
> >>>>>> to each other? I guess the former is for records that fall outside
> >>> the
> >>>>>> grace period and the latter is for records that are processed after
> >>>>>> the retention period of the window. Is this correct?
> >>>>>>
> >>>>>> 4) Is there an actual difference between skipped and dropped
> >>> records?
> >>>>>> If not, shall we unify the terminology?
> >>>>>>
> >>>>>> 5) What happens with removed metrics when the user sets the version
> >>> of
> >>>>>> "built.in.metrics.version" to 2.2-
> >>>>>>
> >>>>>> Best,
> >>>>>> Bruno
> >>>>>>
> >>>>>> On Thu, Jun 27, 2019 at 6:11 PM Guozhang Wang <wa...@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> Hello folks,
> >>>>>>>
> >>>>>>> As 2.3 is released now, I'd like to bump up this KIP discussion
> >>> again
> >>>>> for
> >>>>>>> your reviews.
> >>>>>>>
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, May 23, 2019 at 4:44 PM Guozhang Wang <wangguoz@gmail.com
> >>>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello Patrik,
> >>>>>>>>
> >>>>>>>> Since we are rolling out 2.3 and everyone is busy with the
> >>> release
> >>>>> now
> >>>>>>>> this KIP does not have much discussion involved yet and will
> >>> slip
> >>>>> into
> >>>>>> the
> >>>>>>>> next release cadence.
> >>>>>>>>
> >>>>>>>> This KIP itself contains several parts itself: 1. refactoring
> >>> the
> >>>>>> existing
> >>>>>>>> metrics hierarchy to cleanup some redundancy and also get more
> >>>>>> clarity; 2.
> >>>>>>>> add instance-level metrics like rebalance and state metrics, as
> >>>> well
> >>>>> as
> >>>>>>>> other static metrics.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, May 23, 2019 at 5:34 AM Patrik Kleindl <
> >>> pkleindl@gmail.com
> >>>>>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Guozhang
> >>>>>>>>> Thanks for the KIP, this looks very helpful.
> >>>>>>>>> Could you please provide more detail on the metrics planned for
> >>>> the
> >>>>>> state?
> >>>>>>>>> We were just considering how to implement this ourselves
> >>> because
> >>>> we
> >>>>>> need
> >>>>>>>>> to
> >>>>>>>>> track the history of stage changes.
> >>>>>>>>> The idea was to have an accumulated "seconds in state x" metric
> >>>> for
> >>>>>> every
> >>>>>>>>> state.
> >>>>>>>>> The new rebalance metric might solve part of our use case, but
> >>> it
> >>>> is
> >>>>>>>>> interesting what you have planned for the state metric.
> >>>>>>>>> best regards
> >>>>>>>>> Patrik
> >>>>>>>>>
> >>>>>>>>> On Fri, 29 Mar 2019 at 18:56, Guozhang Wang <
> >>> wangguoz@gmail.com>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello folks,
> >>>>>>>>>>
> >>>>>>>>>> I'd like to propose the following KIP to improve the Kafka
> >>>> Streams
> >>>>>>>>> metrics
> >>>>>>>>>> mechanism to users. This includes 1) a minor change in the
> >>>> public
> >>>>>>>>>> StreamsMetrics API, and 2) a major cleanup on the Streams'
> >>> own
> >>>>>> built-in
> >>>>>>>>>> metrics hierarchy.
> >>>>>>>>>>
> >>>>>>>>>> Details can be found here:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
> >>>>>>>>>>
> >>>>>>>>>> I'd love to hear your thoughts and feedbacks. Thanks!
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> -- Guozhang
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> -- Guozhang
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> -- Guozhang
> >>>>
> >>>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
> >
>
>

-- 
-- Guozhang

Re: [DISCUSS] KIP-444: Augment Metrics for Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
For the existing releases, yes (with KIP-447 we are already going to do
that anyways), for future release maybe not --- hopefully we only do such
metrics refactoring once.


Guozhang

On Fri, Aug 2, 2019 at 3:23 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> Would this imply that we need to update the config in each release to
> add a new accepted value?
>
>
> -Matthias
>
> On 8/2/19 1:07 PM, Guozhang Wang wrote:
> > Hello Matthias,
> >
> > That's a good question. Thinking about a bit more, I'd like to propose
> that
> > we just list all the version numbers since 0.10 to 2.4 as accepted
> values,
> > and let Streams to decide if old / new set of metrics can be used
> > internally (implementation wise we can reuse the const values for
> > `upgrade.from` as well).
> >
> > And then in the future when we remove the metrics, we can just remove the
> > corresponding version values from the accepted list of this config.
> >
> >
> > Guozhang
> >
> > On Tue, Jul 23, 2019 at 11:55 AM Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> Thanks for the KIP Guozhang.
> >>
> >> I just re-read the wiki page and the DISCUSS thread. Overall LGTM.
> >>
> >> The only nit is the naming of the new config values. With AK 2.3 being
> >> released the versions numbers needs to be updated.
> >>
> >> Additionally, I actually think that "2.2-" and "2.3" are not the best
> >> names: the `-` suffix is very subtle IMHO and actually looks more like a
> >> typo, and it might be better to be more elaborate. Maybe something like
> >> "up-to-2.2" ?
> >>
> >> For "2.3", this config value would be weird for future releases (ie,
> >> 2.4, 2.5, 2.6). Hence, we might want to rename it to "newest" /
> >> "current" or something like this?
> >>
> >> Another alternative may be to rename it to "since-2.3" (or similar) --
> >> however, this may require to rename the config if we change metrics in a
> >> future release (hence, it's not my preferred option).
> >>
> >> Thoughts?
> >>
> >>
> >> -Matthias
> >>
> >> On 7/22/19 6:33 PM, Guozhang Wang wrote:
> >>> Thanks everyone for your inputs, I've updated the wiki page
> accordingly.
> >>>
> >>> @Bruno: please let me know if you have any further thoughts per my
> >> replies
> >>> above.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Mon, Jul 22, 2019 at 6:30 PM Guozhang Wang <wa...@gmail.com>
> >> wrote:
> >>>
> >>>> Thanks Boyang,
> >>>>
> >>>> I've thought about exposing time via metrics in Streams. The tricky
> part
> >>>> though is which layer of time we should expose: right now we have
> >>>> task-level and partition-level stream time (what you suggested), and
> >> also
> >>>> some processor internally maintain their own observed time. Today we
> are
> >>>> still trying to get a clear and simple way of exposing a single time
> >>>> concept for users to reason about their application's progress. So
> >> before
> >>>> we come up with a good solution I'd postpone adding it in a future
> KIP.
> >>>>
> >>>>
> >>>> Guozhang
> >>>>
> >>>>
> >>>> On Thu, Jul 18, 2019 at 1:21 PM Boyang Chen <
> reluctanthero104@gmail.com
> >>>
> >>>> wrote:
> >>>>
> >>>>> I mean the partition time.
> >>>>>
> >>>>> On Thu, Jul 18, 2019 at 11:29 AM Guozhang Wang <wa...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Boyang,
> >>>>>>
> >>>>>> What do you mean by `per partition latency`?
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Mon, Jul 1, 2019 at 9:28 AM Boyang Chen <
> >> reluctanthero104@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hey Guozhang,
> >>>>>>>
> >>>>>>> do we plan to add per partition latency in this KIP?
> >>>>>>>
> >>>>>>> On Mon, Jul 1, 2019 at 7:08 AM Bruno Cadonna <br...@confluent.io>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Guozhang,
> >>>>>>>>
> >>>>>>>> Thank you for the KIP.
> >>>>>>>>
> >>>>>>>> 1) As far as I understand, the StreamsMetrics interface is there
> for
> >>>>>>>> user-defined processors. Would it make sense to also add a method
> to
> >>>>>>>> the interface to specify a sensor that records skipped records?
> >>>>>>>>
> >>>>>>>> 2) What are the semantics of active-task-process and
> >>>>>> standby-task-process
> >>>>>>>>
> >>>>>>>> 3) How do dropped-late-records and expired-window-record-drop
> relate
> >>>>>>>> to each other? I guess the former is for records that fall outside
> >>>>> the
> >>>>>>>> grace period and the latter is for records that are processed
> after
> >>>>>>>> the retention period of the window. Is this correct?
> >>>>>>>>
> >>>>>>>> 4) Is there an actual difference between skipped and dropped
> >>>>> records?
> >>>>>>>> If not, shall we unify the terminology?
> >>>>>>>>
> >>>>>>>> 5) What happens with removed metrics when the user sets the
> version
> >>>>> of
> >>>>>>>> "built.in.metrics.version" to 2.2-
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Bruno
> >>>>>>>>
> >>>>>>>> On Thu, Jun 27, 2019 at 6:11 PM Guozhang Wang <wangguoz@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello folks,
> >>>>>>>>>
> >>>>>>>>> As 2.3 is released now, I'd like to bump up this KIP discussion
> >>>>> again
> >>>>>>> for
> >>>>>>>>> your reviews.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Guozhang
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, May 23, 2019 at 4:44 PM Guozhang Wang <
> wangguoz@gmail.com
> >>>>>>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello Patrik,
> >>>>>>>>>>
> >>>>>>>>>> Since we are rolling out 2.3 and everyone is busy with the
> >>>>> release
> >>>>>>> now
> >>>>>>>>>> this KIP does not have much discussion involved yet and will
> >>>>> slip
> >>>>>>> into
> >>>>>>>> the
> >>>>>>>>>> next release cadence.
> >>>>>>>>>>
> >>>>>>>>>> This KIP itself contains several parts itself: 1. refactoring
> >>>>> the
> >>>>>>>> existing
> >>>>>>>>>> metrics hierarchy to cleanup some redundancy and also get more
> >>>>>>>> clarity; 2.
> >>>>>>>>>> add instance-level metrics like rebalance and state metrics, as
> >>>>>> well
> >>>>>>> as
> >>>>>>>>>> other static metrics.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 23, 2019 at 5:34 AM Patrik Kleindl <
> >>>>> pkleindl@gmail.com
> >>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Guozhang
> >>>>>>>>>>> Thanks for the KIP, this looks very helpful.
> >>>>>>>>>>> Could you please provide more detail on the metrics planned for
> >>>>>> the
> >>>>>>>> state?
> >>>>>>>>>>> We were just considering how to implement this ourselves
> >>>>> because
> >>>>>> we
> >>>>>>>> need
> >>>>>>>>>>> to
> >>>>>>>>>>> track the history of stage changes.
> >>>>>>>>>>> The idea was to have an accumulated "seconds in state x" metric
> >>>>>> for
> >>>>>>>> every
> >>>>>>>>>>> state.
> >>>>>>>>>>> The new rebalance metric might solve part of our use case, but
> >>>>> it
> >>>>>> is
> >>>>>>>>>>> interesting what you have planned for the state metric.
> >>>>>>>>>>> best regards
> >>>>>>>>>>> Patrik
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 29 Mar 2019 at 18:56, Guozhang Wang <
> >>>>> wangguoz@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hello folks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd like to propose the following KIP to improve the Kafka
> >>>>>> Streams
> >>>>>>>>>>> metrics
> >>>>>>>>>>>> mechanism to users. This includes 1) a minor change in the
> >>>>>> public
> >>>>>>>>>>>> StreamsMetrics API, and 2) a major cleanup on the Streams'
> >>>>> own
> >>>>>>>> built-in
> >>>>>>>>>>>> metrics hierarchy.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Details can be found here:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd love to hear your thoughts and feedbacks. Thanks!
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> -- Guozhang
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> -- Guozhang
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> -- Guozhang
> >>>>
> >>>
> >>>
> >>
> >>
> >
>
>

-- 
-- Guozhang

Re: [DISCUSS] KIP-444: Augment Metrics for Kafka Streams

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Would this imply that we need to update the config in each release to
add a new accepted value?


-Matthias

On 8/2/19 1:07 PM, Guozhang Wang wrote:
> Hello Matthias,
> 
> That's a good question. Thinking about a bit more, I'd like to propose that
> we just list all the version numbers since 0.10 to 2.4 as accepted values,
> and let Streams to decide if old / new set of metrics can be used
> internally (implementation wise we can reuse the const values for
> `upgrade.from` as well).
> 
> And then in the future when we remove the metrics, we can just remove the
> corresponding version values from the accepted list of this config.
> 
> 
> Guozhang
> 
> On Tue, Jul 23, 2019 at 11:55 AM Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> Thanks for the KIP Guozhang.
>>
>> I just re-read the wiki page and the DISCUSS thread. Overall LGTM.
>>
>> The only nit is the naming of the new config values. With AK 2.3 being
>> released the versions numbers needs to be updated.
>>
>> Additionally, I actually think that "2.2-" and "2.3" are not the best
>> names: the `-` suffix is very subtle IMHO and actually looks more like a
>> typo, and it might be better to be more elaborate. Maybe something like
>> "up-to-2.2" ?
>>
>> For "2.3", this config value would be weird for future releases (ie,
>> 2.4, 2.5, 2.6). Hence, we might want to rename it to "newest" /
>> "current" or something like this?
>>
>> Another alternative may be to rename it to "since-2.3" (or similar) --
>> however, this may require to rename the config if we change metrics in a
>> future release (hence, it's not my preferred option).
>>
>> Thoughts?
>>
>>
>> -Matthias
>>
>> On 7/22/19 6:33 PM, Guozhang Wang wrote:
>>> Thanks everyone for your inputs, I've updated the wiki page accordingly.
>>>
>>> @Bruno: please let me know if you have any further thoughts per my
>> replies
>>> above.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Mon, Jul 22, 2019 at 6:30 PM Guozhang Wang <wa...@gmail.com>
>> wrote:
>>>
>>>> Thanks Boyang,
>>>>
>>>> I've thought about exposing time via metrics in Streams. The tricky part
>>>> though is which layer of time we should expose: right now we have
>>>> task-level and partition-level stream time (what you suggested), and
>> also
>>>> some processor internally maintain their own observed time. Today we are
>>>> still trying to get a clear and simple way of exposing a single time
>>>> concept for users to reason about their application's progress. So
>> before
>>>> we come up with a good solution I'd postpone adding it in a future KIP.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Thu, Jul 18, 2019 at 1:21 PM Boyang Chen <reluctanthero104@gmail.com
>>>
>>>> wrote:
>>>>
>>>>> I mean the partition time.
>>>>>
>>>>> On Thu, Jul 18, 2019 at 11:29 AM Guozhang Wang <wa...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Boyang,
>>>>>>
>>>>>> What do you mean by `per partition latency`?
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Mon, Jul 1, 2019 at 9:28 AM Boyang Chen <
>> reluctanthero104@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Guozhang,
>>>>>>>
>>>>>>> do we plan to add per partition latency in this KIP?
>>>>>>>
>>>>>>> On Mon, Jul 1, 2019 at 7:08 AM Bruno Cadonna <br...@confluent.io>
>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Guozhang,
>>>>>>>>
>>>>>>>> Thank you for the KIP.
>>>>>>>>
>>>>>>>> 1) As far as I understand, the StreamsMetrics interface is there for
>>>>>>>> user-defined processors. Would it make sense to also add a method to
>>>>>>>> the interface to specify a sensor that records skipped records?
>>>>>>>>
>>>>>>>> 2) What are the semantics of active-task-process and
>>>>>> standby-task-process
>>>>>>>>
>>>>>>>> 3) How do dropped-late-records and expired-window-record-drop relate
>>>>>>>> to each other? I guess the former is for records that fall outside
>>>>> the
>>>>>>>> grace period and the latter is for records that are processed after
>>>>>>>> the retention period of the window. Is this correct?
>>>>>>>>
>>>>>>>> 4) Is there an actual difference between skipped and dropped
>>>>> records?
>>>>>>>> If not, shall we unify the terminology?
>>>>>>>>
>>>>>>>> 5) What happens with removed metrics when the user sets the version
>>>>> of
>>>>>>>> "built.in.metrics.version" to 2.2-
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Bruno
>>>>>>>>
>>>>>>>> On Thu, Jun 27, 2019 at 6:11 PM Guozhang Wang <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hello folks,
>>>>>>>>>
>>>>>>>>> As 2.3 is released now, I'd like to bump up this KIP discussion
>>>>> again
>>>>>>> for
>>>>>>>>> your reviews.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, May 23, 2019 at 4:44 PM Guozhang Wang <wangguoz@gmail.com
>>>>>>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Patrik,
>>>>>>>>>>
>>>>>>>>>> Since we are rolling out 2.3 and everyone is busy with the
>>>>> release
>>>>>>> now
>>>>>>>>>> this KIP does not have much discussion involved yet and will
>>>>> slip
>>>>>>> into
>>>>>>>> the
>>>>>>>>>> next release cadence.
>>>>>>>>>>
>>>>>>>>>> This KIP itself contains several parts itself: 1. refactoring
>>>>> the
>>>>>>>> existing
>>>>>>>>>> metrics hierarchy to cleanup some redundancy and also get more
>>>>>>>> clarity; 2.
>>>>>>>>>> add instance-level metrics like rebalance and state metrics, as
>>>>>> well
>>>>>>> as
>>>>>>>>>> other static metrics.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, May 23, 2019 at 5:34 AM Patrik Kleindl <
>>>>> pkleindl@gmail.com
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Guozhang
>>>>>>>>>>> Thanks for the KIP, this looks very helpful.
>>>>>>>>>>> Could you please provide more detail on the metrics planned for
>>>>>> the
>>>>>>>> state?
>>>>>>>>>>> We were just considering how to implement this ourselves
>>>>> because
>>>>>> we
>>>>>>>> need
>>>>>>>>>>> to
>>>>>>>>>>> track the history of stage changes.
>>>>>>>>>>> The idea was to have an accumulated "seconds in state x" metric
>>>>>> for
>>>>>>>> every
>>>>>>>>>>> state.
>>>>>>>>>>> The new rebalance metric might solve part of our use case, but
>>>>> it
>>>>>> is
>>>>>>>>>>> interesting what you have planned for the state metric.
>>>>>>>>>>> best regards
>>>>>>>>>>> Patrik
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 29 Mar 2019 at 18:56, Guozhang Wang <
>>>>> wangguoz@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to propose the following KIP to improve the Kafka
>>>>>> Streams
>>>>>>>>>>> metrics
>>>>>>>>>>>> mechanism to users. This includes 1) a minor change in the
>>>>>> public
>>>>>>>>>>>> StreamsMetrics API, and 2) a major cleanup on the Streams'
>>>>> own
>>>>>>>> built-in
>>>>>>>>>>>> metrics hierarchy.
>>>>>>>>>>>>
>>>>>>>>>>>> Details can be found here:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
>>>>>>>>>>>>
>>>>>>>>>>>> I'd love to hear your thoughts and feedbacks. Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Guozhang
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -- Guozhang
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>
>>>
>>
>>
>