Posted to dev@kafka.apache.org by Viktor Somogyi-Vass <vi...@gmail.com> on 2019/06/05 10:38:01 UTC

Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

Hi Folks,

This vote has sunk a bit, so I'd like to draw some attention to it again in the
hope of getting some feedback or votes.

Thanks,
Viktor

On Tue, May 7, 2019 at 4:28 PM Harsha <ka...@harsha.io> wrote:

> Thanks for the kip. LGTM +1.
>
> -Harsha
>
> On Mon, Apr 29, 2019, at 8:14 AM, Viktor Somogyi-Vass wrote:
> > Hi Jason,
> >
> > I too agree this is more of a problem in older versions and therefore we
> > could backport it. Were you thinking of any specific versions? I guess the
> > 2.x and 1.x versions are definitely targets here but I was thinking that we
> > might not want to go further.
> >
> > Viktor
> >
> > On Mon, Apr 29, 2019 at 12:55 AM Stanislav Kozlovski <stanislav@confluent.io>
> > wrote:
> >
> > > Thanks for the work done, Viktor! +1 (non-binding)
> > >
> > > I strongly agree with Jason that this monitoring-focused KIP is worth
> > > porting back to older versions. I am sure users will find it very useful.
> > >
> > > Best,
> > > Stanislav
> > >
> > > On Fri, Apr 26, 2019 at 9:38 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Thanks, that works for me. +1
> > > >
> > > > By the way, we don't normally port KIPs to older releases, but I wonder if
> > > > it's worth making an exception here. From recent experience, it tends to be
> > > > the older versions that are more prone to fetcher failures. Thoughts?
> > > >
> > > > -Jason
> > > >
> > > > On Fri, Apr 26, 2019 at 5:18 AM Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
> > > > wrote:
> > > >
> > > > > On second thought, I'll just add the clientId instead to follow
> > > > > the convention, so it'll be DeadFetcherThreadCount but with the
> > > > > clientId tag.
> > > > >
> > > > > On Fri, Apr 26, 2019 at 11:29 AM Viktor Somogyi-Vass <
> > > > > viktorsomogyi@gmail.com> wrote:
> > > > >
> > > > > > Hi Jason,
> > > > > >
> > > > > > Yea I think it could make sense. In this case I would rename the
> > > > > > DeadFetcherThreadCount to DeadReplicaFetcherThreadCount and introduce the
> > > > > > metric you're referring to as DeadLogDirFetcherThreadCount.
> > > > > > I'll update the KIP to reflect this.
> > > > > >
> > > > > > Viktor
> > > > > >
> > > > > > On Thu, Apr 25, 2019 at 8:07 PM Jason Gustafson <jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Viktor,
> > > > > >>
> > > > > > >> This looks good. Just one question I had is whether we may as well
> > > > > > >> cover the log dir fetchers too.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Jason
> > > > > >>
> > > > > >>
> > > > > > >> On Thu, Apr 25, 2019 at 7:46 AM Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
> > > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Folks,
> > > > > >> >
> > > > > > >> > This thread has sunk a bit, but I'd like to bump it, hoping to get some
> > > > > > >> > feedback and/or votes.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Viktor
> > > > > >> >
> > > > > > >> > On Thu, Mar 28, 2019 at 8:47 PM Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
> > > > > > >> > wrote:
> > > > > >> >
> > > > > > >> > > Sorry, the end of the message got cut off.
> > > > > > >> > >
> > > > > > >> > > So I tried to be consistent with the convention in LogManager, hence the
> > > > > > >> > > hyphens, and in AbstractFetcherManager, hence the camel case. It would be
> > > > > > >> > > nice though to decide on one convention across the whole project; however,
> > > > > > >> > > it requires a major refactor (especially for the components that leverage
> > > > > > >> > > metrics for monitoring).
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Viktor
> > > > > >> > >
> > > > > >> > > On Thu, Mar 28, 2019 at 8:44 PM Viktor Somogyi-Vass <
> > > > > >> > > viktorsomogyi@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > >> Hi Dhruvil,
> > > > > >> > >>
> > > > > > >> > >> Thanks for the feedback and the vote. I fixed the typo in the KIP.
> > > > > > >> > >> The naming is interesting though. Unfortunately Kafka overall is not
> > > > > > >> > >> consistent in metric naming but at least I tried to be consistent with the
> > > > > > >> > >> other metrics used in LogManager
> > > > > >> > >>
> > > > > > >> > >> On Thu, Mar 28, 2019 at 7:32 PM Dhruvil Shah <dhruvil@confluent.io>
> > > > > > >> > >> wrote:
> > > > > >> > >>
> > > > > > >> > >>> Thanks for the KIP, Viktor! This is a useful addition. +1 overall.
> > > > > > >> > >>>
> > > > > > >> > >>> Minor nits:
> > > > > > >> > >>> > I propose to add three gauge: DeadFetcherThreadCount for the fetcher
> > > > > > >> > >>> > threads, log-cleaner-dead-thread-count for the log cleaner.
> > > > > > >> > >>> I think you meant two instead of three.
> > > > > > >> > >>>
> > > > > > >> > >>> Also, would it make sense to name these metrics consistently, something like
> > > > > > >> > >>> `log-cleaner-dead-thread-count` and `replica-fetcher-dead-thread-count`?
> > > > > >> > >>>
> > > > > >> > >>> Thanks,
> > > > > >> > >>> Dhruvil
> > > > > >> > >>>
> > > > > >> > >>> On Thu, Mar 28, 2019 at 11:27 AM Viktor Somogyi-Vass <
> > > > > >> > >>> viktorsomogyi@gmail.com> wrote:
> > > > > >> > >>>
> > > > > >> > >>> > Hi All,
> > > > > >> > >>> >
> > > > > >> > >>> > I'd like to start a vote on KIP-434.
> > > > > > >> > >>> > This basically would add metrics to count dead threads in
> > > > > > >> > >>> > ReplicaFetcherManager and LogCleaner to allow monitoring systems to alert
> > > > > > >> > >>> > based on this.
> > > > > > >> > >>> >
> > > > > > >> > >>> > The KIP link:
> > > > > > >> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics
> > > > > > >> > >>> > The PR: https://github.com/apache/kafka/pull/6514
> > > > > > >> > >>> >
> > > > > > >> > >>> > I'd be happy to receive any votes or additional feedback/reviews too.
> > > > > >> > >>> >
> > > > > >> > >>> > Thanks,
> > > > > >> > >>> > Viktor
> > > > > >> > >>> >
> > > > > >> > >>>
> > > > > >> > >>
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>

Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

Posted by Viktor Somogyi-Vass <vi...@gmail.com>.
Hi All,

It's been a few days since the last vote and we have three binding votes,
so the vote has passed. Thank you to all who voted and participated in the
discussions; I'd be excited to see this in the codebase!
Binding: Jason, Harsha, Colin
Non-binding: Dhruvil, Stanislav, Satish, Ryanne, Andrew, Kamal

Viktor

On Mon, Jun 10, 2019 at 4:10 PM Kamal Chandraprakash <
kamal.chandraprakash@gmail.com> wrote:

> +1 (non-binding). Thanks for the KIP!
>
> On Thu, Jun 6, 2019 at 8:12 PM Andrew Schofield <andrew_schofield@live.com>
> wrote:
>
> > +1 (non-binding)
> >
> > Andrew
> >
> > On 06/06/2019, 15:15, "Ryanne Dolan" <ry...@gmail.com> wrote:
> >
> >     +1 (non-binding)
> >
> >     Thanks
> >     Ryanne
> >
> >     On Wed, Jun 5, 2019, 9:31 PM Satish Duggana <satish.duggana@gmail.com>
> >     wrote:
> >
> >     > Thanks Viktor, proposed metrics are really useful to monitor replication
> >     > status on brokers.
> >     >
> >     > +1 (non-binding)
> >     >
> >     > On Thu, Jun 6, 2019 at 2:05 AM Colin McCabe <cm...@apache.org> wrote:
> >     >
> >     > > +1 (binding)
> >     > >
> >     > > best,
> >     > > Colin
> >     > >
> >     > >
>

Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

Posted by Kamal Chandraprakash <ka...@gmail.com>.
+1 (non-binding). Thanks for the KIP!


Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

Posted by Andrew Schofield <an...@live.com>.
+1 (non-binding)

Andrew

On 06/06/2019, 15:15, "Ryanne Dolan" <ry...@gmail.com> wrote:

    +1 (non-binding)
    
    Thanks
    Ryanne
    
    On Wed, Jun 5, 2019, 9:31 PM Satish Duggana <sa...@gmail.com>
    wrote:
    
    > Thanks Viktor, proposed metrics are really useful to monitor replication
    > status on brokers.
    >
    > +1 (non-binding)
    >
    > On Thu, Jun 6, 2019 at 2:05 AM Colin McCabe <cm...@apache.org> wrote:
    >
    > > +1 (binding)
    > >
    > > best,
    > > Colin
    > >
    > >
    > > On Wed, Jun 5, 2019, at 03:38, Viktor Somogyi-Vass wrote:
    > > > Hi Folks,
    > > >
    > > > This vote sunk a bit, I'd like to draw some attention to this again in
    > > the
    > > > hope I get some feedback or votes.
    > > >
    > > > Thanks,
    > > > Viktor
    > > >
    > > > On Tue, May 7, 2019 at 4:28 PM Harsha <ka...@harsha.io> wrote:
    > > >
    > > > > Thanks for the kip. LGTM +1.
    > > > >
    > > > > -Harsha
    > > > >
    > > > > On Mon, Apr 29, 2019, at 8:14 AM, Viktor Somogyi-Vass wrote:
    > > > > > Hi Jason,
    > > > > >
    > > > > > I too agree this is more of a problem in older versions and
    > > therefore we
    > > > > > could backport it. Were you thinking of any specific versions? I
    > > guess
    > > > > the
    > > > > > 2.x and 1.x versions are definitely targets here but I was thinking
    > > that
    > > > > we
    > > > > > might not want to further.
    > > > > >
    > > > > > Viktor
    > > > > >
    > > > > > On Mon, Apr 29, 2019 at 12:55 AM Stanislav Kozlovski <
    > > > > stanislav@confluent.io>
    > > > > > wrote:
    > > > > >
    > > > > > > Thanks for the work done, Viktor! +1 (non-binding)
    > > > > > >
    > > > > > > I strongly agree with Jason that this monitoring-focused KIP is
    > > worth
    > > > > > > porting back to older versions. I am sure users will find it very
    > > > > useful
    > > > > > >
    > > > > > > Best,
    > > > > > > Stanislav
    > > > > > >
    > > > > > > On Fri, Apr 26, 2019 at 9:38 PM Jason Gustafson <
    > > jason@confluent.io>
    > > > > > > wrote:
    > > > > > >
    > > > > > > > Thanks, that works for me. +1
    > > > > > > >
    > > > > > > > By the way, we don't normally port KIPs to older releases, but
    > I
    > > > > wonder
    > > > > > > if
    > > > > > > > it's worth making an exception here. From recent experience, it
    > > > > tends to
    > > > > > > be
    > > > > > > > the older versions that are more prone to fetcher failures.
    > > Thoughts?
    > > > > > > >
    > > > > > > > -Jason
    > > > > > > >
    > > > > > > > On Fri, Apr 26, 2019 at 5:18 AM Viktor Somogyi-Vass <
    > > > > > > > viktorsomogyi@gmail.com>
    > > > > > > > wrote:
    > > > > > > >
    > > > > > > > > Let me have a second thought, I'll just add the clientId
    > > instead to
    > > > > > > > follow
    > > > > > > > > the convention, so it'll change DeadFetcherThreadCount but
    > > with the
    > > > > > > > > clientId tag.
    > > > > > > > >
    > > > > > > > > On Fri, Apr 26, 2019 at 11:29 AM Viktor Somogyi-Vass <
    > > > > > > > > viktorsomogyi@gmail.com> wrote:
    > > > > > > > >
    > > > > > > > > > Hi Jason,
    > > > > > > > > >
    > > > > > > > > > Yea I think it could make sense. In this case I would
    > rename
    > > the
    > > > > > > > > > DeadFetcherThreadCount to DeadReplicaFetcherThreadCount and
    > > > > introduce
    > > > > > > > the
    > > > > > > > > > metric you're referring to as DeadLogDirFetcherThreadCount.
    > > > > > > > > > I'll update the KIP to reflect this.
    > > > > > > > > >
    > > > > > > > > > Viktor
    > > > > > > > > >
    > > > > > > > > > On Thu, Apr 25, 2019 at 8:07 PM Jason Gustafson <
    > > > > jason@confluent.io>
    > > > > > > > > > wrote:
    > > > > > > > > >
    > > > > > > > > >> Hi Viktor,
    > > > > > > > > >>
    > > > > > > > > >> This looks good. Just one question I had is whether we may
    > > as
    > > > > well
    > > > > > > > cover
    > > > > > > > > >> the log dir fetchers as well.
    > > > > > > > > >>
    > > > > > > > > >> Thanks,
    > > > > > > > > >> Jason
    > > > > > > > > >>
    > > > > > > > > >>
    > > > > > > > > >> On Thu, Apr 25, 2019 at 7:46 AM Viktor Somogyi-Vass <
    > > > > > > > > >> viktorsomogyi@gmail.com>
    > > > > > > > > >> wrote:
    > > > > > > > > >>
    > > > > > > > > >> > Hi Folks,
    > > > > > > > > >> >
    > > > > > > > > >> > This thread sunk a bit but I'd like to bump it hoping to
    > > get
    > > > > some
    > > > > > > > > >> feedback
    > > > > > > > > >> > and/or votes.
    > > > > > > > > >> >
    > > > > > > > > >> > Thanks,
    > > > > > > > > >> > Viktor
    > > > > > > > > >> >
    > > > > > > > > >> > On Thu, Mar 28, 2019 at 8:47 PM Viktor Somogyi-Vass <
    > > > > > > > > >> > viktorsomogyi@gmail.com>
    > > > > > > > > >> > wrote:
    > > > > > > > > >> >
    > > > > > > > > >> > > Sorry, the end of the message cut off.
    > > > > > > > > >> > >
    > > > > > > > > >> > > So I tried to be consistent with the convention in
    > > > > LogManager,
    > > > > > > > hence
    > > > > > > > > >> the
    > > > > > > > > >> > > hyphens and in AbstractFetcherManager, hence the camel
    > > > > case. It
    > > > > > > > > would
    > > > > > > > > >> be
    > > > > > > > > >> > > nice though to decide with one convention across the
    > > whole
    > > > > > > > project,
    > > > > > > > > >> > however
    > > > > > > > > >> > > it requires a major refactor (especially for the
    > > components
    > > > > that
    > > > > > > > > >> leverage
    > > > > > > > > >> > > metrics for monitoring).
    > > > > > > > > >> > >
    > > > > > > > > >> > > Thanks,
    > > > > > > > > >> > > Viktor
    > > > > > > > > >> > >
    > > > > > > > > >> > > On Thu, Mar 28, 2019 at 8:44 PM Viktor Somogyi-Vass <
    > > > > > > > > >> > > viktorsomogyi@gmail.com> wrote:
    > > > > > > > > >> > >
    > > > > > > > > >> > >> Hi Dhruvil,
    > > > > > > > > >> > >>
    > > > > > > > > >> > >> Thanks for the feedback and the vote. I fixed the
    > typo
    > > in
    > > > > the
    > > > > > > > KIP.
    > > > > > > > > >> > >> The naming is interesting though. Unfortunately kafka
    > > > > overall
    > > > > > > is
    > > > > > > > > not
    > > > > > > > > >> > >> consistent in metric naming but at least I tried to
    > be
    > > > > > > consistent
    > > > > > > > > >> among
    > > > > > > > > >> > the
    > > > > > > > > >> > >> other metrics used in LogManager
    > > > > > > > > >> > >>
    > > > > > > > > >> > >> On Thu, Mar 28, 2019 at 7:32 PM Dhruvil Shah <
    > > > > > > > dhruvil@confluent.io
    > > > > > > > > >
    > > > > > > > > >> > >> wrote:
    > > > > > > > > >> > >>
    > > > > > > > > >> > >>> Thanks for the KIP, Viktor! This is a useful
    > > addition. +1
    > > > > > > > overall.
    > > > > > > > > >> > >>>
    > > > > > > > > >> > >>> Minor nits:
    > > > > > > > > >> > >>> > I propose to add three gauge:
    > DeadFetcherThreadCount
    > > > > for the
    > > > > > > > > >> fetcher
    > > > > > > > > >> > >>> threads, log-cleaner-dead-thread-count for the log
    > > > > cleaner.
    > > > > > > > > >> > >>> I think you meant two instead of three.
    > > > > > > > > >> > >>>
    > > > > > > > > >> > >>> Also, would it make sense to name these metrics
    > > > > consistency,
    > > > > > > > > >> something
    > > > > > > > > >> > >>> like
    > > > > > > > > >> > >>> `log-cleaner-dead-thread-count` and
    > > > > > > > > >> > `replica-fetcher-dead-thread-count`?
    > > > > > > > > >> > >>>
    > > > > > > > > >> > >>> Thanks,
    > > > > > > > > >> > >>> Dhruvil
    > > > > > > > > >> > >>>
    > > > > > > > > >> > >>> On Thu, Mar 28, 2019 at 11:27 AM Viktor
    > Somogyi-Vass <
    > > > > > > > > >> > >>> viktorsomogyi@gmail.com> wrote:
    > > > > > > > > >> > >>>
    > > > > > > > > >> > >>> > Hi All,
    > > > > > > > > >> > >>> >
    > > > > > > > > >> > >>> > I'd like to start a vote on KIP-434.
    > > > > > > > > >> > >>> > This basically would add a metrics to count dead
    > > > > threads in
    > > > > > > > > >> > >>> > ReplicaFetcherManager and LogCleaner to allow
    > > monitoring
    > > > > > > > systems
    > > > > > > > > >> to
    > > > > > > > > >> > >>> alert
    > > > > > > > > >> > >>> > based on this.
    > > > > > > > > >> > >>> >
    > > > > > > > > >> > >>> > The KIP link:
    > > > > > > > > >> > >>> >
    > > > > > > > > >> > >>> >
    > > > > > > > > >> > >>>
    > > > > > > > > >> >
    > > > > > > > > >>
    > > > > > > > >
    > > > > > > >
    > > > > > >
    > > > >
    > >
    > https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics
    > > > > > > > > >> > >>> > The
    > > > > > > > > >> > >>> > PR: https://github.com/apache/kafka/pull/6514
    > > > > > > > > >> > >>> >
    > > > > > > > > >> > >>> > I'd be happy to receive any votes or additional
    > > > > > > > feedback/reviews
    > > > > > > > > >> too.
    > > > > > > > > >> > >>> >
    > > > > > > > > >> > >>> > Thanks,
    > > > > > > > > >> > >>> > Viktor
    > > > > > > > > >> > >>> >
    > > > > > > > > >> > >>>
    > > > > > > > > >> > >>
    > > > > > > > > >> >
    > > > > > > > > >>
    > > > > > > > > >
    > > > > > > > >
    > > > > > > >
    > > > > > >
    > > > > > >
    > > > > > > --
    > > > > > > Best,
    > > > > > > Stanislav
    > > > > > >
    > > > > >
    > > > >
    > > >
    > >
    >
    


Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

Posted by Ryanne Dolan <ry...@gmail.com>.
+1 (non-binding)

Thanks
Ryanne

On Wed, Jun 5, 2019, 9:31 PM Satish Duggana <sa...@gmail.com>
wrote:

> Thanks Viktor, proposed metrics are really useful to monitor replication
> status on brokers.
>
> +1 (non-binding)
>
> On Thu, Jun 6, 2019 at 2:05 AM Colin McCabe <cm...@apache.org> wrote:
>
> > +1 (binding)
> >
> > best,
> > Colin
> >
> >
> > On Wed, Jun 5, 2019, at 03:38, Viktor Somogyi-Vass wrote:
> > > Hi Folks,
> > >
> > > This vote sunk a bit, I'd like to draw some attention to this again in
> > the
> > > hope I get some feedback or votes.
> > >
> > > Thanks,
> > > Viktor
> > >
> > > On Tue, May 7, 2019 at 4:28 PM Harsha <ka...@harsha.io> wrote:
> > >
> > > > Thanks for the kip. LGTM +1.
> > > >
> > > > -Harsha
> > > >
> > > > On Mon, Apr 29, 2019, at 8:14 AM, Viktor Somogyi-Vass wrote:
> > > > > Hi Jason,
> > > > >
> > > > > I too agree this is more of a problem in older versions and
> > therefore we
> > > > > could backport it. Were you thinking of any specific versions? I
> > guess
> > > > the
> > > > > 2.x and 1.x versions are definitely targets here but I was thinking
> > that
> > > > we
> > > > > might not want to further.
> > > > >
> > > > > Viktor
> > > > >
> > > > > On Mon, Apr 29, 2019 at 12:55 AM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the work done, Viktor! +1 (non-binding)
> > > > > >
> > > > > > I strongly agree with Jason that this monitoring-focused KIP is
> > worth
> > > > > > porting back to older versions. I am sure users will find it very
> > > > useful
> > > > > >
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > > > On Fri, Apr 26, 2019 at 9:38 PM Jason Gustafson <
> > jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks, that works for me. +1
> > > > > > >
> > > > > > > By the way, we don't normally port KIPs to older releases, but
> I
> > > > wonder
> > > > > > if
> > > > > > > it's worth making an exception here. From recent experience, it
> > > > tends to
> > > > > > be
> > > > > > > the older versions that are more prone to fetcher failures.
> > Thoughts?
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Fri, Apr 26, 2019 at 5:18 AM Viktor Somogyi-Vass <
> > > > > > > viktorsomogyi@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Let me have a second thought, I'll just add the clientId
> > instead to
> > > > > > > follow
> > > > > > > > the convention, so it'll change DeadFetcherThreadCount but
> > with the
> > > > > > > > clientId tag.
> > > > > > > >
> > > > > > > > On Fri, Apr 26, 2019 at 11:29 AM Viktor Somogyi-Vass <
> > > > > > > > viktorsomogyi@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Jason,
> > > > > > > > >
> > > > > > > > > Yea I think it could make sense. In this case I would
> rename
> > the
> > > > > > > > > DeadFetcherThreadCount to DeadReplicaFetcherThreadCount and
> > > > introduce
> > > > > > > the
> > > > > > > > > metric you're referring to as DeadLogDirFetcherThreadCount.
> > > > > > > > > I'll update the KIP to reflect this.
> > > > > > > > >
> > > > > > > > > Viktor
> > > > > > > > >
> > > > > > > > > On Thu, Apr 25, 2019 at 8:07 PM Jason Gustafson <
> > > > jason@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi Viktor,
> > > > > > > > >>
> > > > > > > > >> This looks good. Just one question I had is whether we may
> > as
> > > > well
> > > > > > > cover
> > > > > > > > >> the log dir fetchers as well.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Jason
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Thu, Apr 25, 2019 at 7:46 AM Viktor Somogyi-Vass <
> > > > > > > > >> viktorsomogyi@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Folks,
> > > > > > > > >> >
> > > > > > > > >> > This thread sunk a bit but I'd like to bump it hoping to
> > get
> > > > some
> > > > > > > > >> feedback
> > > > > > > > >> > and/or votes.
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> > Viktor
> > > > > > > > >> >
> > > > > > > > >> > On Thu, Mar 28, 2019 at 8:47 PM Viktor Somogyi-Vass <
> > > > > > > > >> > viktorsomogyi@gmail.com>
> > > > > > > > >> > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Sorry, the end of the message cut off.
> > > > > > > > >> > >
> > > > > > > > >> > > So I tried to be consistent with the convention in
> > > > LogManager,
> > > > > > > hence
> > > > > > > > >> the
> > > > > > > > >> > > hyphens and in AbstractFetcherManager, hence the camel
> > > > case. It
> > > > > > > > would
> > > > > > > > >> be
> > > > > > > > >> > > nice though to decide with one convention across the
> > whole
> > > > > > > project,
> > > > > > > > >> > however
> > > > > > > > >> > > it requires a major refactor (especially for the
> > components
> > > > that
> > > > > > > > >> leverage
> > > > > > > > >> > > metrics for monitoring).
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks,
> > > > > > > > >> > > Viktor
> > > > > > > > >> > >
> > > > > > > > >> > > On Thu, Mar 28, 2019 at 8:44 PM Viktor Somogyi-Vass <
> > > > > > > > >> > > viktorsomogyi@gmail.com> wrote:
> > > > > > > > >> > >
> > > > > > > > >> > >> Hi Dhruvil,
> > > > > > > > >> > >>
> > > > > > > > >> > >> Thanks for the feedback and the vote. I fixed the
> typo
> > in
> > > > the
> > > > > > > KIP.
> > > > > > > > >> > >> The naming is interesting though. Unfortunately kafka
> > > > overall
> > > > > > is
> > > > > > > > not
> > > > > > > > >> > >> consistent in metric naming but at least I tried to
> be
> > > > > > consistent
> > > > > > > > >> among
> > > > > > > > >> > the
> > > > > > > > >> > >> other metrics used in LogManager
> > > > > > > > >> > >>
> > > > > > > > >> > >> On Thu, Mar 28, 2019 at 7:32 PM Dhruvil Shah <
> > > > > > > dhruvil@confluent.io
> > > > > > > > >
> > > > > > > > >> > >> wrote:
> > > > > > > > >> > >>
> > > > > > > > >> > >>> Thanks for the KIP, Viktor! This is a useful
> > addition. +1
> > > > > > > overall.
> > > > > > > > >> > >>>
> > > > > > > > >> > >>> Minor nits:
> > > > > > > > >> > >>> > I propose to add three gauge:
> DeadFetcherThreadCount
> > > > for the
> > > > > > > > >> fetcher
> > > > > > > > >> > >>> threads, log-cleaner-dead-thread-count for the log
> > > > cleaner.
> > > > > > > > >> > >>> I think you meant two instead of three.
> > > > > > > > >> > >>>
> > > > > > > > >> > >>> Also, would it make sense to name these metrics
> > > > consistency,
> > > > > > > > >> something
> > > > > > > > >> > >>> like
> > > > > > > > >> > >>> `log-cleaner-dead-thread-count` and
> > > > > > > > >> > `replica-fetcher-dead-thread-count`?
> > > > > > > > >> > >>>
> > > > > > > > >> > >>> Thanks,
> > > > > > > > >> > >>> Dhruvil
> > > > > > > > >> > >>>
> > > > > > > > >> > >>> On Thu, Mar 28, 2019 at 11:27 AM Viktor
> Somogyi-Vass <
> > > > > > > > >> > >>> viktorsomogyi@gmail.com> wrote:
> > > > > > > > >> > >>>
> > > > > > > > >> > >>> > Hi All,
> > > > > > > > >> > >>> >
> > > > > > > > >> > >>> > I'd like to start a vote on KIP-434.
> > > > > > > > >> > >>> > This basically would add a metrics to count dead
> > > > threads in
> > > > > > > > >> > >>> > ReplicaFetcherManager and LogCleaner to allow
> > monitoring
> > > > > > > systems
> > > > > > > > >> to
> > > > > > > > >> > >>> alert
> > > > > > > > >> > >>> > based on this.
> > > > > > > > >> > >>> >
> > > > > > > > >> > >>> > The KIP link:
> > > > > > > > >> > >>> >
> > > > > > > > >> > >>> >
> > > > > > > > >> > >>>
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics
> > > > > > > > >> > >>> > The
> > > > > > > > >> > >>> > PR: https://github.com/apache/kafka/pull/6514
> > > > > > > > >> > >>> >
> > > > > > > > >> > >>> > I'd be happy to receive any votes or additional
> > > > > > > feedback/reviews
> > > > > > > > >> too.
> > > > > > > > >> > >>> >
> > > > > > > > >> > >>> > Thanks,
> > > > > > > > >> > >>> > Viktor
> > > > > > > > >> > >>> >
> > > > > > > > >> > >>>
> > > > > > > > >> > >>
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

Posted by Satish Duggana <sa...@gmail.com>.
Thanks Viktor, proposed metrics are really useful to monitor replication
status on brokers.

+1 (non-binding)

On Thu, Jun 6, 2019 at 2:05 AM Colin McCabe <cm...@apache.org> wrote:

> +1 (binding)
>
> best,
> Colin
>
>
> On Wed, Jun 5, 2019, at 03:38, Viktor Somogyi-Vass wrote:
> > Hi Folks,
> >
> > This vote sunk a bit, I'd like to draw some attention to this again in
> the
> > hope I get some feedback or votes.
> >
> > Thanks,
> > Viktor
> >
> > On Tue, May 7, 2019 at 4:28 PM Harsha <ka...@harsha.io> wrote:
> >
> > > Thanks for the kip. LGTM +1.
> > >
> > > -Harsha
> > >
> > > On Mon, Apr 29, 2019, at 8:14 AM, Viktor Somogyi-Vass wrote:
> > > > Hi Jason,
> > > >
> > > > I too agree this is more of a problem in older versions and
> therefore we
> > > > could backport it. Were you thinking of any specific versions? I
> guess
> > > the
> > > > 2.x and 1.x versions are definitely targets here but I was thinking
> that
> > > we
> > > > might not want to further.
> > > >
> > > > Viktor
> > > >
> > > > On Mon, Apr 29, 2019 at 12:55 AM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Thanks for the work done, Viktor! +1 (non-binding)
> > > > >
> > > > > I strongly agree with Jason that this monitoring-focused KIP is
> worth
> > > > > porting back to older versions. I am sure users will find it very
> > > useful
> > > > >
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > > > On Fri, Apr 26, 2019 at 9:38 PM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Thanks, that works for me. +1
> > > > > >
> > > > > > By the way, we don't normally port KIPs to older releases, but I
> > > wonder
> > > > > if
> > > > > > it's worth making an exception here. From recent experience, it
> > > tends to
> > > > > be
> > > > > > the older versions that are more prone to fetcher failures.
> Thoughts?
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Fri, Apr 26, 2019 at 5:18 AM Viktor Somogyi-Vass <
> > > > > > viktorsomogyi@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Let me have a second thought, I'll just add the clientId
> instead to
> > > > > > follow
> > > > > > > the convention, so it'll change DeadFetcherThreadCount but
> with the
> > > > > > > clientId tag.
> > > > > > >
> > > > > > > On Fri, Apr 26, 2019 at 11:29 AM Viktor Somogyi-Vass <
> > > > > > > viktorsomogyi@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Jason,
> > > > > > > >
> > > > > > > > Yea I think it could make sense. In this case I would rename
> the
> > > > > > > > DeadFetcherThreadCount to DeadReplicaFetcherThreadCount and
> > > introduce
> > > > > > the
> > > > > > > > metric you're referring to as DeadLogDirFetcherThreadCount.
> > > > > > > > I'll update the KIP to reflect this.
> > > > > > > >
> > > > > > > > Viktor
> > > > > > > >
> > > > > > > > On Thu, Apr 25, 2019 at 8:07 PM Jason Gustafson <
> > > jason@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Viktor,
> > > > > > > >>
> > > > > > > >> This looks good. Just one question I had is whether we may
> as
> > > well
> > > > > > cover
> > > > > > > >> the log dir fetchers as well.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Jason
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Thu, Apr 25, 2019 at 7:46 AM Viktor Somogyi-Vass <
> > > > > > > >> viktorsomogyi@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Folks,
> > > > > > > >> >
> > > > > > > >> > This thread sunk a bit but I'd like to bump it hoping to
> get
> > > some
> > > > > > > >> feedback
> > > > > > > >> > and/or votes.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> > Viktor
> > > > > > > >> >
> > > > > > > >> > On Thu, Mar 28, 2019 at 8:47 PM Viktor Somogyi-Vass <
> > > > > > > >> > viktorsomogyi@gmail.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Sorry, the end of the message cut off.
> > > > > > > >> > >
> > > > > > > >> > > So I tried to be consistent with the convention in
> > > LogManager,
> > > > > > hence
> > > > > > > >> the
> > > > > > > >> > > hyphens and in AbstractFetcherManager, hence the camel
> > > case. It
> > > > > > > would
> > > > > > > >> be
> > > > > > > >> > > nice though to decide with one convention across the
> whole
> > > > > > project,
> > > > > > > >> > however
> > > > > > > >> > > it requires a major refactor (especially for the
> components
> > > that
> > > > > > > >> leverage
> > > > > > > >> > > metrics for monitoring).
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Viktor
> > > > > > > >> > >
> > > > > > > >> > > On Thu, Mar 28, 2019 at 8:44 PM Viktor Somogyi-Vass <
> > > > > > > >> > > viktorsomogyi@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > >> Hi Dhruvil,
> > > > > > > >> > >>
> > > > > > > >> > >> Thanks for the feedback and the vote. I fixed the typo
> in
> > > the
> > > > > > KIP.
> > > > > > > >> > >> The naming is interesting though. Unfortunately kafka
> > > overall
> > > > > is
> > > > > > > not
> > > > > > > >> > >> consistent in metric naming but at least I tried to be
> > > > > consistent
> > > > > > > >> among
> > > > > > > >> > the
> > > > > > > >> > >> other metrics used in LogManager
> > > > > > > >> > >>
> > > > > > > >> > >> On Thu, Mar 28, 2019 at 7:32 PM Dhruvil Shah <
> > > > > > dhruvil@confluent.io
> > > > > > > >
> > > > > > > >> > >> wrote:
> > > > > > > >> > >>
> > > > > > > >> > >>> Thanks for the KIP, Viktor! This is a useful
> addition. +1
> > > > > > overall.
> > > > > > > >> > >>>
> > > > > > > >> > >>> Minor nits:
> > > > > > > >> > >>> > I propose to add three gauge: DeadFetcherThreadCount
> > > for the
> > > > > > > >> fetcher
> > > > > > > >> > >>> threads, log-cleaner-dead-thread-count for the log
> > > cleaner.
> > > > > > > >> > >>> I think you meant two instead of three.
> > > > > > > >> > >>>
> > > > > > > >> > >>> Also, would it make sense to name these metrics
> > > consistency,
> > > > > > > >> something
> > > > > > > >> > >>> like
> > > > > > > >> > >>> `log-cleaner-dead-thread-count` and
> > > > > > > >> > `replica-fetcher-dead-thread-count`?
> > > > > > > >> > >>>
> > > > > > > >> > >>> Thanks,
> > > > > > > >> > >>> Dhruvil
> > > > > > > >> > >>>
> > > > > > > >> > >>> On Thu, Mar 28, 2019 at 11:27 AM Viktor Somogyi-Vass <
> > > > > > > >> > >>> viktorsomogyi@gmail.com> wrote:
> > > > > > > >> > >>>
> > > > > > > >> > >>> > Hi All,
> > > > > > > >> > >>> >
> > > > > > > >> > >>> > I'd like to start a vote on KIP-434.
> > > > > > > >> > >>> > This basically would add a metrics to count dead
> > > threads in
> > > > > > > >> > >>> > ReplicaFetcherManager and LogCleaner to allow
> monitoring
> > > > > > systems
> > > > > > > >> to
> > > > > > > >> > >>> alert
> > > > > > > >> > >>> > based on this.
> > > > > > > >> > >>> >
> > > > > > > >> > >>> > The KIP link:
> > > > > > > >> > >>> >
> > > > > > > >> > >>> >
> > > > > > > >> > >>>
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics
> > > > > > > >> > >>> > The
> > > > > > > >> > >>> > PR: https://github.com/apache/kafka/pull/6514
> > > > > > > >> > >>> >
> > > > > > > >> > >>> > I'd be happy to receive any votes or additional
> > > > > > feedback/reviews
> > > > > > > >> too.
> > > > > > > >> > >>> >
> > > > > > > >> > >>> > Thanks,
> > > > > > > >> > >>> > Viktor
> > > > > > > >> > >>> >
> > > > > > > >> > >>>
> > > > > > > >> > >>
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] KIP-434: Dead replica fetcher and log cleaner metrics

Posted by Colin McCabe <cm...@apache.org>.
+1 (binding)

best,
Colin


On Wed, Jun 5, 2019, at 03:38, Viktor Somogyi-Vass wrote:
> Hi Folks,
> 
> This vote sunk a bit, I'd like to draw some attention to this again in the
> hope I get some feedback or votes.
> 
> Thanks,
> Viktor
> 
> On Tue, May 7, 2019 at 4:28 PM Harsha <ka...@harsha.io> wrote:
> 
> > Thanks for the kip. LGTM +1.
> >
> > -Harsha
> >
> > On Mon, Apr 29, 2019, at 8:14 AM, Viktor Somogyi-Vass wrote:
> > > Hi Jason,
> > >
> > > I too agree this is more of a problem in older versions and therefore we
> > > could backport it. Were you thinking of any specific versions? I guess
> > the
> > > 2.x and 1.x versions are definitely targets here but I was thinking that
> > we
> > > might not want to further.
> > >
> > > Viktor
> > >
> > > On Mon, Apr 29, 2019 at 12:55 AM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Thanks for the work done, Viktor! +1 (non-binding)
> > > >
> > > > I strongly agree with Jason that this monitoring-focused KIP is worth
> > > > porting back to older versions. I am sure users will find it very
> > useful
> > > >
> > > > Best,
> > > > Stanislav
> > > >
> > > > On Fri, Apr 26, 2019 at 9:38 PM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Thanks, that works for me. +1
> > > > >
> > > > > By the way, we don't normally port KIPs to older releases, but I
> > wonder
> > > > if
> > > > > it's worth making an exception here. From recent experience, it
> > tends to
> > > > be
> > > > > the older versions that are more prone to fetcher failures. Thoughts?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Fri, Apr 26, 2019 at 5:18 AM Viktor Somogyi-Vass <
> > > > > viktorsomogyi@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Let me have a second thought, I'll just add the clientId instead to
> > > > > follow
> > > > > > the convention, so it'll change DeadFetcherThreadCount but with the
> > > > > > clientId tag.
> > > > > >
> > > > > > On Fri, Apr 26, 2019 at 11:29 AM Viktor Somogyi-Vass <
> > > > > > viktorsomogyi@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Jason,
> > > > > > >
> > > > > > > Yea I think it could make sense. In this case I would rename the
> > > > > > > DeadFetcherThreadCount to DeadReplicaFetcherThreadCount and
> > introduce
> > > > > the
> > > > > > > metric you're referring to as DeadLogDirFetcherThreadCount.
> > > > > > > I'll update the KIP to reflect this.
> > > > > > >
> > > > > > > Viktor
> > > > > > >
> > > > > > > On Thu, Apr 25, 2019 at 8:07 PM Jason Gustafson <
> > jason@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Viktor,
> > > > > > >>
> > > > > > >> This looks good. Just one question I had is whether we may as
> > well
> > > > > cover
> > > > > > >> the log dir fetchers as well.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Jason
> > > > > > >>
> > > > > > >>
> > > > > > >> On Thu, Apr 25, 2019 at 7:46 AM Viktor Somogyi-Vass <
> > > > > > >> viktorsomogyi@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi Folks,
> > > > > > >> >
> > > > > > >> > This thread sunk a bit but I'd like to bump it hoping to get
> > some
> > > > > > >> feedback
> > > > > > >> > and/or votes.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Viktor
> > > > > > >> >
> > > > > > >> > On Thu, Mar 28, 2019 at 8:47 PM Viktor Somogyi-Vass <
> > > > > > >> > viktorsomogyi@gmail.com>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Sorry, the end of the message cut off.
> > > > > > >> > >
> > > > > > >> > > So I tried to be consistent with the convention in
> > LogManager,
> > > > > hence
> > > > > > >> the
> > > > > > >> > > hyphens and in AbstractFetcherManager, hence the camel
> > case. It
> > > > > > would
> > > > > > >> be
> > > > > > >> > > nice though to decide with one convention across the whole
> > > > > project,
> > > > > > >> > however
> > > > > > >> > > it requires a major refactor (especially for the components
> > that
> > > > > > >> leverage
> > > > > > >> > > metrics for monitoring).
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Viktor
> > > > > > >> > >
> > > > > > >> > > On Thu, Mar 28, 2019 at 8:44 PM Viktor Somogyi-Vass <
> > > > > > >> > > viktorsomogyi@gmail.com> wrote:
> > > > > > >> > >
> > > > > > >> > >> Hi Dhruvil,
> > > > > > >> > >>
> > > > > > >> > >> Thanks for the feedback and the vote. I fixed the typo in
> > the
> > > > > KIP.
> > > > > > >> > >> The naming is interesting though. Unfortunately kafka
> > overall
> > > > is
> > > > > > not
> > > > > > >> > >> consistent in metric naming but at least I tried to be
> > > > consistent
> > > > > > >> among
> > > > > > >> > the
> > > > > > >> > >> other metrics used in LogManager
> > > > > > >> > >>
> > > > > > >> > >> On Thu, Mar 28, 2019 at 7:32 PM Dhruvil Shah <
> > > > > dhruvil@confluent.io
> > > > > > >
> > > > > > >> > >> wrote:
> > > > > > >> > >>
> > > > > > >> > >>> Thanks for the KIP, Viktor! This is a useful addition. +1
> > > > > overall.
> > > > > > >> > >>>
> > > > > > >> > >>> Minor nits:
> > > > > > >> > >>> > I propose to add three gauge: DeadFetcherThreadCount
> > for the
> > > > > > >> fetcher
> > > > > > >> > >>> threads, log-cleaner-dead-thread-count for the log
> > cleaner.
> > > > > > >> > >>> I think you meant two instead of three.
> > > > > > >> > >>>
> > > > > > >> > >>> Also, would it make sense to name these metrics
> > consistency,
> > > > > > >> something
> > > > > > >> > >>> like
> > > > > > >> > >>> `log-cleaner-dead-thread-count` and
> > > > > > >> > `replica-fetcher-dead-thread-count`?
> > > > > > >> > >>>
> > > > > > >> > >>> Thanks,
> > > > > > >> > >>> Dhruvil
> > > > > > >> > >>>
> > > > > > >> > >>> On Thu, Mar 28, 2019 at 11:27 AM Viktor Somogyi-Vass <
> > > > > > >> > >>> viktorsomogyi@gmail.com> wrote:
> > > > > > >> > >>>
> > > > > > >> > >>> > Hi All,
> > > > > > >> > >>> >
> > > > > > >> > >>> > I'd like to start a vote on KIP-434.
> > > > > > >> > >>> > This basically would add a metrics to count dead
> > threads in
> > > > > > >> > >>> > ReplicaFetcherManager and LogCleaner to allow monitoring
> > > > > systems
> > > > > > >> to
> > > > > > >> > >>> alert
> > > > > > >> > >>> > based on this.
> > > > > > >> > >>> >
> > > > > > >> > >>> > The KIP link:
> > > > > > >> > >>> >
> > > > > > >> > >>> >
> > > > > > >> > >>>
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-434%3A+Add+Replica+Fetcher+and+Log+Cleaner+Count+Metrics
> > > > > > >> > >>> > The
> > > > > > >> > >>> > PR: https://github.com/apache/kafka/pull/6514
> > > > > > >> > >>> >
> > > > > > >> > >>> > I'd be happy to receive any votes or additional
> > > > > feedback/reviews
> > > > > > >> too.
> > > > > > >> > >>> >
> > > > > > >> > >>> > Thanks,
> > > > > > >> > >>> > Viktor
> > > > > > >> > >>> >
> > > > > > >> > >>>
> > > > > > >> > >>
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
>