Posted to dev@kafka.apache.org by José Armando García Sancio <js...@confluent.io.INVALID> on 2022/05/16 15:59:09 UTC

[VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Hi all,

I would like to start a vote for KIP-835:
https://cwiki.apache.org/confluence/x/0xShD

Thank you,
-José

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by José Armando García Sancio <js...@confluent.io.INVALID>.

Thanks everyone for your feedback and help. KIP-835 was approved with
3 binding votes from Guozhang, Luke and David.

On Thu, May 19, 2022 at 10:21 AM Guozhang Wang <wa...@gmail.com> wrote:
>
> That makes sense. Thanks!
>
> +1 (binding).
>
> On Thu, May 19, 2022 at 8:46 AM José Armando García Sancio
> <js...@confluent.io.invalid> wrote:
>
> > Guozhang Wang wrote:
> > >
> > > Thanks José! For 1/2 above, just checking if we would record the
> > > corresponding sensors only during broker bootstrap time, or whenever there
> > > are new metadata records being committed by the controller quorum (since
> > > there is always a short period of time between when the records are
> > > committed and when the records get fetched by that broker)?
> >
> > It measures the time spent by the broker keeping up with the log or
> > processing the log. In practice, when the broker starts, that would
> > be the time spent loading the snapshot and processing the committed
> > records in the log after the snapshot. After startup that would be the
> > time spent reading the log to keep up with the local high-watermark. I
> > changed the name of the metric to "pending-record-processing-time-us".
> > I think the word "load" in the previous metric name was awkward and
> > misleading.
> >
> > --
> > -José
> >
>
>
> --
> -- Guozhang



-- 
-José

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Guozhang Wang <wa...@gmail.com>.

That makes sense. Thanks!

+1 (binding).

On Thu, May 19, 2022 at 8:46 AM José Armando García Sancio
<js...@confluent.io.invalid> wrote:

> Guozhang Wang wrote:
> >
> > Thanks José! For 1/2 above, just checking if we would record the
> > corresponding sensors only during broker bootstrap time, or whenever there
> > are new metadata records being committed by the controller quorum (since
> > there is always a short period of time between when the records are
> > committed and when the records get fetched by that broker)?
>
> It measures the time spent by the broker keeping up with the log or
> processing the log. In practice, when the broker starts, that would
> be the time spent loading the snapshot and processing the committed
> records in the log after the snapshot. After startup that would be the
> time spent reading the log to keep up with the local high-watermark. I
> changed the name of the metric to "pending-record-processing-time-us".
> I think the word "load" in the previous metric name was awkward and
> misleading.
>
> --
> -José
>


-- 
-- Guozhang

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by José Armando García Sancio <js...@confluent.io.INVALID>.

Guozhang Wang wrote:
>
> Thanks José! For 1/2 above, just checking if we would record the
> corresponding sensors only during broker bootstrap time, or whenever there
> are new metadata records being committed by the controller quorum (since
> there is always a short period of time between when the records are
> committed and when the records get fetched by that broker)?

It measures the time spent by the broker keeping up with the log or
processing the log. In practice, when the broker starts, that would
be the time spent loading the snapshot and processing the committed
records in the log after the snapshot. After startup that would be the
time spent reading the log to keep up with the local high-watermark. I
changed the name of the metric to "pending-record-processing-time-us".
I think the word "load" in the previous metric name was awkward and
misleading.

-- 
-José
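
As a rough illustration of the description above, the following Python
sketch times each pass over a batch of pending records in microseconds
and tracks the average and maximum. It is not Kafka's implementation:
`ProcessingTimeStats`, `process_pending` and the `apply` callback are
hypothetical names, and the aggregates here cover the whole lifetime of
the object rather than a sampling window.

```python
import time


class ProcessingTimeStats:
    """Running average and maximum of processing times, in microseconds."""

    def __init__(self):
        self.count = 0
        self.total_us = 0
        self.max_us = 0

    def record(self, elapsed_us):
        # Fold one measurement into the running aggregates.
        self.count += 1
        self.total_us += elapsed_us
        self.max_us = max(self.max_us, elapsed_us)

    @property
    def avg_us(self):
        # Report 0.0 until at least one measurement has been recorded.
        return self.total_us / self.count if self.count else 0.0


def process_pending(records, stats, apply=lambda record: None):
    """Apply every pending record and record how long the pass took.

    Nothing is recorded when there are no pending records, mirroring the
    "when there are pending records" wording of the metric description.
    """
    if not records:
        return
    start_ns = time.monotonic_ns()
    for record in records:
        apply(record)
    elapsed_us = (time.monotonic_ns() - start_ns) // 1000
    stats.record(elapsed_us)
```

Under this sketch, the first call after startup would cover the snapshot
plus the committed records after it, and later calls would cover only the
records needed to catch up with the local high-watermark.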

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Guozhang Wang <wa...@gmail.com>.

Thanks José! For 1/2 above, just checking if we would record the
corresponding sensors only during broker bootstrap time, or whenever there
are new metadata records being committed by the controller quorum (since
there is always a short period of time between when the records are
committed and when the records get fetched by that broker)?

On Wed, May 18, 2022 at 7:53 PM Luke Chen <sh...@gmail.com> wrote:

> Hi José,
>
> Thanks for the KIP!
> +1 (binding) from me.
>
> Luke
>
> On Thu, May 19, 2022 at 2:05 AM José Armando García Sancio
> <js...@confluent.io.invalid> wrote:
>
> > Hi Guozhang, thanks for the feedback.
> >
> >
> > Guozhang wrote:
> > > Could you elaborate a bit on what "load-processing-time-us" measures? I
> > > looked through the discussion thread and the KIP / JIRA but cannot find
> > > its definition.
> >
> > Yes. I updated the KIP. This is what I documented:
> > 1.
> > kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-avg
> > Reports the average amount of time it took for the broker to process
> > all pending records when there are pending records in the cluster
> > metadata partition. The time unit for this metric is microseconds.
> > 2.
> > kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-max
> > Reports the maximum amount of time it took for the broker to process
> > all pending records when there are pending records in the cluster
> > metadata partition. The time unit for this metric is microseconds.
> > 3.
> > kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-avg
> > Reports the average byte size of the record batches in the cluster
> > metadata partition.
> > 4.
> > kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-max
> > Reports the maximum byte size of the record batches in the cluster
> > metadata partition.
> >
> > -José
> >
>


-- 
-- Guozhang

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Luke Chen <sh...@gmail.com>.

Hi José,

Thanks for the KIP!
+1 (binding) from me.

Luke

On Thu, May 19, 2022 at 2:05 AM José Armando García Sancio
<js...@confluent.io.invalid> wrote:

> Hi Guozhang, thanks for the feedback.
>
>
> Guozhang wrote:
> > Could you elaborate a bit on what "load-processing-time-us" measures? I
> > looked through the discussion thread and the KIP / JIRA but cannot find
> > its definition.
>
> Yes. I updated the KIP. This is what I documented:
> 1.
> kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-avg
> Reports the average amount of time it took for the broker to process
> all pending records when there are pending records in the cluster
> metadata partition. The time unit for this metric is microseconds.
> 2.
> kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-max
> Reports the maximum amount of time it took for the broker to process
> all pending records when there are pending records in the cluster
> metadata partition. The time unit for this metric is microseconds.
> 3.
> kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-avg
> Reports the average byte size of the record batches in the cluster
> metadata partition.
> 4.
> kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-max
> Reports the maximum byte size of the record batches in the cluster
> metadata partition.
>
> -José
>

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by José Armando García Sancio <js...@confluent.io.INVALID>.

Hi Guozhang, thanks for the feedback.

Guozhang wrote:
> Could you elaborate a bit on what "load-processing-time-us" measures? I
> looked through the discussion thread and the KIP / JIRA but cannot find its
> definition.

Yes. I updated the KIP. This is what I documented:
1. kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-avg
Reports the average amount of time it took for the broker to process
all pending records when there are pending records in the cluster
metadata partition. The time unit for this metric is microseconds.
2. kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-max
Reports the maximum amount of time it took for the broker to process
all pending records when there are pending records in the cluster
metadata partition. The time unit for this metric is microseconds.
3. kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-avg
Reports the average byte size of the record batches in the cluster
metadata partition.
4. kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-max
Reports the maximum byte size of the record batches in the cluster
metadata partition.

-José
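
For readers who want to work with the four names listed above, here is a
minimal Python sketch that splits each one into its domain and key
properties, following the usual `domain:key=value,key=value` layout of a
JMX object name. The helper `parse_mbean_name` is hypothetical, and the
strings are copied exactly as written in this message; how a running
broker actually exposes them is not something this sketch verifies.

```python
def parse_mbean_name(name):
    """Split 'domain:key=value,key=value' into a domain and a property dict."""
    domain, _, props = name.partition(":")
    properties = dict(pair.split("=", 1) for pair in props.split(","))
    return {"domain": domain, "properties": properties}


# Metric names exactly as listed in the message above.
METRIC_NAMES = [
    "kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-avg",
    "kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-max",
    "kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-avg",
    "kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-max",
]

parsed = [parse_mbean_name(name) for name in METRIC_NAMES]

for entry in parsed:
    # Each line shows the domain, the metric group, and the metric name.
    props = entry["properties"]
    print(entry["domain"], props["type"], props["name"])
```

All four names share the `kafka.server` domain and the
`broker-metadata-metrics` type; only the `name` property differs.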

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Guozhang Wang <wa...@gmail.com>.

Hello José,

Could you elaborate a bit on what "load-processing-time-us" measures? I
looked through the discussion thread and the KIP / JIRA but cannot find its
definition.

Guozhang

On Mon, May 16, 2022 at 1:30 PM David Arthur <mu...@gmail.com> wrote:

> Thanks José, the KIP looks good to me!
>
> +1 binding
>
> -David
>
> On Mon, May 16, 2022 at 11:59 AM José Armando García Sancio
> <js...@confluent.io.invalid> wrote:
>
> > Hi all,
> >
> > I would like to start a vote for KIP-835:
> > https://cwiki.apache.org/confluence/x/0xShD
> >
> > Thank you,
> > -José
> >
>
>
> --
> David Arthur
>

-- 
-- Guozhang

Re: [VOTE] KIP-835: Monitor KRaft Controller Quorum Health

Posted by David Arthur <mu...@gmail.com>.

Thanks José, the KIP looks good to me!

+1 binding

-David

On Mon, May 16, 2022 at 11:59 AM José Armando García Sancio
<js...@confluent.io.invalid> wrote:

> Hi all,
>
> I would like to start a vote for KIP-835:
> https://cwiki.apache.org/confluence/x/0xShD
>
> Thank you,
> -José
>

-- 
David Arthur