Posted to dev@kafka.apache.org by José Armando García Sancio <js...@confluent.io.INVALID> on 2022/05/06 19:02:02 UTC

[DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Hi all,

I created a KIP for adding a mechanism to monitor the health of the
KRaft Controller quorum through metrics. See KIP-835:
https://cwiki.apache.org/confluence/x/0xShD

Thanks for your feedback,
-José

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Jose,

Thanks for the reply.

20. I see the differences now. The metrics in KafkaController use Yammer
metric and follow the camel case naming. The metrics in Raft use the client
side Metrics package and follow the dash notation. So the naming in the KIP
sounds good to me.

21. Sounds good.

Jun



On Wed, May 18, 2022 at 2:11 PM José Armando García Sancio
<js...@confluent.io.invalid> wrote:

> Hi Jun,
>
> Jun wrote:
> > 20. For the metric type and name, we use the camel names in some cases
> and
> > dashed lower names in some other cases. Should we make them consistent?
>
> For the metrics group `type=KafkaController`, I am using camel names
> like `MetadataLastAppliedRecordOffset` because it matches the naming
> strategy for the metrics already in that group.
>
> For the metrics group `type=broker-metadata-metrics`, I model the
> naming scheme after the metrics in KIP-595 or the `raft-metrics`
> group. I made the assumption that we wanted to start naming metrics
> and groups using that scheme but maybe that is not correct.
>
> What do you think?
>
> > 21. Could you document the meaning of load-processing-time?
>
> I updated KIP-835 to include this information. Here is a summary:
>
> 1.
> kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-avg
> Reports the average amount of time it took for the broker to process
> all pending records when there are pending records in the cluster
> metadata partition. The time unit for this metric is microseconds.
> 2.
> kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-max
> Reports the maximum amount of time it took for the broker to process
> all pending records when there are pending records in the cluster
> metadata partition. The time unit for this metric is microseconds.
> 3.
> kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-avg
> Reports the average byte size of the record batches in the cluster
> metadata partition.
> 4.
> kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-max
> Reports the maximum byte size of the record batches in the cluster
> metadata partition.
>
> -José
>

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by José Armando García Sancio <js...@confluent.io.INVALID>.
Hi Jun,

Jun wrote:
> 20. For the metric type and name, we use the camel names in some cases and
> dashed lower names in some other cases. Should we make them consistent?

For the metrics group `type=KafkaController`, I am using camel names
like `MetadataLastAppliedRecordOffset` because it matches the naming
strategy for the metrics already in that group.

For the metrics group `type=broker-metadata-metrics`, I model the
naming scheme after the metrics in KIP-595 or the `raft-metrics`
group. I made the assumption that we wanted to start naming metrics
and groups using that scheme but maybe that is not correct.

What do you think?
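
For what it's worth, here is a rough Java sketch of the two registration
styles; the class, field, and metric names below are only for
illustration and are not the KIP's implementation:

    import com.yammer.metrics.Metrics;
    import com.yammer.metrics.core.Gauge;
    import com.yammer.metrics.core.MetricName;
    import org.apache.kafka.common.metrics.Measurable;

    public final class NamingStylesExample {
        private static volatile long lastAppliedOffset = 0L;

        public static void main(String[] args) {
            // Yammer style: camel case names under type=KafkaController.
            Metrics.newGauge(
                new MetricName("kafka.controller", "KafkaController",
                               "MetadataLastAppliedRecordOffset"),
                new Gauge<Long>() {
                    @Override
                    public Long value() {
                        return lastAppliedOffset;
                    }
                });

            // Client Metrics style: dashed lower case names and groups.
            org.apache.kafka.common.metrics.Metrics metrics =
                new org.apache.kafka.common.metrics.Metrics();
            metrics.addMetric(
                metrics.metricName("last-applied-record-offset",
                                   "broker-metadata-metrics"),
                (Measurable) (config, now) -> lastAppliedOffset);
        }
    }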

> 21. Could you document the meaning of load-processing-time?

I updated KIP-835 to include this information. Here is a summary:

1. kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-avg
Reports the average amount of time it took for the broker to process
all pending records when there are pending records in the cluster
metadata partition. The time unit for this metric is microseconds.
2. kafka.server:type=broker-metadata-metrics,name=load-processing-time-us-max
Reports the maximum amount of time it took for the broker to process
all pending records when there are pending records in the cluster
metadata partition. The time unit for this metric is microseconds.
3. kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-avg
Reports the average byte size of the record batches in the cluster
metadata partition.
4. kafka.server:type=broker-metadata-metrics,name=record-batch-size-byte-max
Reports the maximum byte size of the record batches in the cluster
metadata partition.
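
In case it helps, here is a minimal sketch of how these four metrics
could be registered and recorded with the client Metrics package;
everything other than the metric and group names is made up for
illustration:

    import org.apache.kafka.common.metrics.Metrics;
    import org.apache.kafka.common.metrics.Sensor;
    import org.apache.kafka.common.metrics.stats.Avg;
    import org.apache.kafka.common.metrics.stats.Max;

    public final class BrokerMetadataMetricsSketch {
        public static void main(String[] args) {
            String group = "broker-metadata-metrics";
            Metrics metrics = new Metrics();

            Sensor loadTime = metrics.sensor("load-processing-time");
            loadTime.add(metrics.metricName("load-processing-time-us-avg", group), new Avg());
            loadTime.add(metrics.metricName("load-processing-time-us-max", group), new Max());

            Sensor batchSize = metrics.sensor("record-batch-size");
            batchSize.add(metrics.metricName("record-batch-size-byte-avg", group), new Avg());
            batchSize.add(metrics.metricName("record-batch-size-byte-max", group), new Max());

            // Record the elapsed time, in microseconds, spent applying a group
            // of pending records, and the byte size of each batch applied.
            loadTime.record(1_250.0);  // 1.25 ms expressed as microseconds
            batchSize.record(4_096.0);
        }
    }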

-José

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Jun Rao <ju...@confluent.io.INVALID>.
Hi, Jose,

Thanks for the KIP. Just a couple of minor comments.

20. For the metric type and name, we use the camel names in some cases and
dashed lower names in some other cases. Should we make them consistent?

21. Could you document the meaning of load-processing-time?

Thanks,

Jun

On Mon, May 16, 2022 at 9:01 AM José Armando García Sancio
<js...@confluent.io.invalid> wrote:

> Hi all,
>
> Thanks for your feedback. I started a voting thread here:
> https://lists.apache.org/thread/x1cy5otpf7mj9ytghnktr5hog27hdf7k
>

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by José Armando García Sancio <js...@confluent.io.INVALID>.
Hi all,

Thanks for your feedback. I started a voting thread here:
https://lists.apache.org/thread/x1cy5otpf7mj9ytghnktr5hog27hdf7k

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by José Armando García Sancio <js...@confluent.io.INVALID>.
Thanks for all of the feedback. Some comments below:

Luke wrote:
> 1. Jason has asked but you didn't answer: What is the default value for `
> metadata.monitor.write.interval.ms`?

Thanks for asking again. Looks like I missed this in my previous
reply. In the implementation I am currently working on, I have it at 1
second. Colin is suggesting 500ms. I should point out that the fetch
timeout default is currently 2 seconds. I'll go with Colin's
suggestion and make it 500ms.

> 2. The `noopRecord` API key is `TBD`. Why can't we put the "currently used
> API Key nums + 1" into it? Any concern?

Yes. It will be the highest currently used API key for metadata
records plus one. I didn't specify it yet since the code hasn't been
committed. I usually update the KIP once the code is committed to
trunk.

> 3. typo: name=MetadataWriteOffse[t]s

Fixed.

> 4. I don't understand the difference between MetadataLastCommittedOffset
> and MetadataWriteOffsets metrics. I think the 2 values will always be
> identical, unless the controller metadata write failed, is that correct?

The active controller tracks two offsets, the writeOffset and the
lastCommittedOffset. The writeOffset is the largest offset that was
sent to the KRaft layer to get committed. The lastCommittedOffset is
the largest offset that the controller knows got committed. The
controller increases the lastCommittedOffset as the KRaft layer
replicates records, commits records and finally the controller handles
the committed record. Colin suggested calling these the
`last-applied-offset-*` metrics. I agree with this rename as it more accurately
represents the state being reported.

I updated the description of those metrics. Please take a look.
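
To make the relationship concrete, here is an illustration with made up
names; it is not the controller's actual code:

    // Offsets tracked by the active controller in this sketch.
    final class ControllerOffsetsSketch {
        // Largest offset of records handed to the KRaft layer for replication.
        private long writeOffset = -1L;
        // Largest offset whose records are known to be committed and have
        // already been applied (replayed) by the controller.
        private long lastCommittedOffset = -1L;

        void onRecordsWritten(long endOffset) {
            writeOffset = endOffset;
        }

        void onRecordsCommittedAndApplied(long endOffset) {
            lastCommittedOffset = endOffset;
        }

        // Zero when nothing is in flight; grows while records are waiting to
        // be replicated and committed by the quorum.
        long uncommittedOffsetDelta() {
            return writeOffset - lastCommittedOffset;
        }
    }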

David wrote:
> 1. Based on the config name "metadata.monitor.write.interval.ms" I'm
> guessing the intention is to have a regularly scheduled write. If the
> quorum is busy with lots of the writes, we wouldn't need this NoopRecord
> right? Maybe we can instead write a NoopRecord only after some amount of
> idle time.

Correct. I suspect that the first implementation will always write
this record at the configured interval. A future implementation can
optimize this and only write the NoOpRecord if the active controller
didn't already write a record in this time interval.
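
Here is a rough sketch of that future optimization, with made up names
and a plain clock instead of the controller's event queue:

    // Decide whether the active controller should append a NoOpRecord.
    final class NoOpRecordScheduler {
        private final long maxIdleIntervalMs;
        private long lastAppendTimeMs;

        NoOpRecordScheduler(long maxIdleIntervalMs, long nowMs) {
            this.maxIdleIntervalMs = maxIdleIntervalMs;
            this.lastAppendTimeMs = nowMs;
        }

        // Called whenever the active controller appends any metadata record.
        void onAppend(long nowMs) {
            lastAppendTimeMs = nowMs;
        }

        // True only when the metadata log has been idle for the configured
        // interval, so a NoOpRecord is needed to show the quorum is healthy.
        boolean shouldAppendNoOpRecord(long nowMs) {
            return nowMs - lastAppendTimeMs >= maxIdleIntervalMs;
        }
    }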

> 2. A small naming suggestion, what about "NoOpRecord"?

Sounds good to me. I changed the record to NoOpRecord.

> 4. We should consider omitting these records from the log dump tool, or at
> least adding an option to skip over them.

Thanks for pointing this out. DumpLogSegments uses MetadataRecordSerde
so it should just work as long as it is the same software version as
the Kafka node. The issue is if the user runs an old DumpLogSegments
against a metadata log that contains this new record type.

I updated the "Compatibility, Deprecation, and Migration Plan" section
to document this.

> 5. If (somehow) one of these NoopRecord appeared in the snapshot, it sounds
> like the broker/controller would skip over them. We should specify this
> behavior in the KIP

Excluding bugs, they should not be part of the metadata snapshot
because the controllers will not store this information in the
"timeline" data types and the brokers will not include this
information in the "metadata image". In other words, both the
controllers and brokers will skip this record when replaying the log
and snapshot. Both `handleCommit` and `handleSnapshot` eventually
execute the same "replay" code.
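
As an illustration of that shared path, with entirely made up type
names (the real code dispatches on the metadata record types):

    // Both the log replay path and the snapshot replay path end up calling
    // replay(), which simply ignores no-op records.
    final class MetadataReplaySketch {
        enum RecordKind { TOPIC, PARTITION, CONFIG, NO_OP }

        void replay(RecordKind kind, Object record) {
            if (kind == RecordKind.NO_OP) {
                // Nothing to apply: NoOpRecords never reach the controller's
                // timeline data structures or the broker's metadata image.
                return;
            }
            apply(record); // update in-memory state for every other record
        }

        private void apply(Object record) {
            // Elided: mutate the in-memory metadata state.
        }
    }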

Colin wrote:
> I had the same question as Luke and Jason: what's the default here for the NoOpRecord time? :) We should add a value here even if we think we'll adjust it later, just to give a feeling for how much traffic this would create. Perhaps 500 ms?

I agree. I updated the KIP to document this as 500ms.

> Also how about naming the time configuration something like "metadata.max.idle.interval.ms", to make it clear that this is the maximum time we can go between writes to the metadata log. I don't get that meaning out of "monitor interval"...

I like this suggestion. I think this name is less
implementation-specific and allows us to skip writing NoOpRecords if
the active controller already wrote a record in the configured time interval.
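
For concreteness, here is a sketch of how the renamed property might be
declared with the 500ms default; the importance level and class name
are assumptions, and the real definition would live with the other
KafkaConfig entries:

    import org.apache.kafka.common.config.ConfigDef;

    public final class MetadataIdleConfigSketch {
        public static final String METADATA_MAX_IDLE_INTERVAL_MS_CONFIG =
            "metadata.max.idle.interval.ms";

        public static final ConfigDef CONFIG_DEF = new ConfigDef()
            .define(METADATA_MAX_IDLE_INTERVAL_MS_CONFIG,
                    ConfigDef.Type.INT,
                    500,
                    ConfigDef.Importance.LOW,
                    "Maximum time in milliseconds that the active controller will " +
                    "let pass between writes to the metadata log before appending " +
                    "a NoOpRecord.");
    }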

> I'd suggest renaming this as "last-applied-offset-*" to make it clear that the offset we're measuring is the last one that the broker applied. The last committed offset is something else, a more log-layer-centric concept.

I agree. I changed the KIP to prefix these metrics with "last applied
record ...".

> (It would also be good to have a separate metric which purely reflected the last metadata offset we've fetched and not applied, so we can see if the delta between that and last-applied-offset increases...)

In KIP-595 we have this metric which reports this information:
1. log-end-offset, type=raft-manager, dynamic gauge
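
For monitoring, one could compare that gauge against the new last
applied offset metric. Here is a rough JMX sketch, assuming JmxReporter
is enabled; the object and attribute names below are assumptions, so
adjust them to the final names in the KIP:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public final class MetadataApplyLagCheck {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            long logEnd = ((Number) server.getAttribute(
                new ObjectName("kafka.server:type=raft-metrics"),
                "log-end-offset")).longValue();
            long lastApplied = ((Number) server.getAttribute(
                new ObjectName("kafka.server:type=broker-metadata-metrics"),
                "last-applied-record-offset")).longValue();
            // A growing delta means the broker is fetching metadata records
            // faster than it can apply them.
            System.out.println("apply lag = " + (logEnd - lastApplied));
        }
    }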

Here are the changes to the KIP:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=6&selectedPageVersions=5

Thanks for the feedback everyone. Hopefully this time I didn't miss
any question or suggestion,
-José

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Colin McCabe <cm...@apache.org>.
Hi José,

Thanks for the KIP! I think this will be a nice improvement.

I had the same question as Luke and Jason: what's the default here for the NoOpRecord time? :) We should add a value here even if we think we'll adjust it later, just to give a feeling for how much traffic this would create. Perhaps 500 ms?

Also how about naming the time configuration something like "metadata.max.idle.interval.ms", to make it clear that this is the maximum time we can go between writes to the metadata log. I don't get that meaning out of "monitor interval"...

> Broker
> kafka.server:type=broker-metadata-metrics,name=last-committed-offset 
> kafka.server:type=broker-metadata-metrics,name=last-committed-timestamp 
> kafka.server:type=broker-metadata-metrics,name=last-committed-lag-ms

I'd suggest renaming this as "last-applied-offset-*" to make it clear that the offset we're measuring is the last one that the broker applied. The last committed offset is something else, a more log-layer-centric concept.

(It would also be good to have a separate metric which purely reflected the last metadata offset we've fetched and not applied, so we can see if the delta between that and last-applied-offset increases...)

regards,
Colin


On Wed, May 11, 2022, at 12:27, David Arthur wrote:
> José, thanks for the KIP! I think this is a good approach for proving
> the liveness of the quorum when metadata is not changing.
>
> 1. Based on the config name "metadata.monitor.write.interval.ms" I'm
> guessing the intention is to have a regularly scheduled write. If the
> quorum is busy with lots of the writes, we wouldn't need this NoopRecord
> right? Maybe we can instead write a NoopRecord only after some amount of
> idle time.
>
> 2. A small naming suggestion, what about "NoOpRecord"?
>
> 3. Typo in one of the metric names: "MetadataWriteOffses"
>
> 4. We should consider omitting these records from the log dump tool, or at
> least adding an option to skip over them.
>
> 5. If (somehow) one of these NoopRecord appeared in the snapshot, it sounds
> like the broker/controller would skip over them. We should specify this
> behavior in the KIP
>
> Cheers,
> David
>
> On Wed, May 11, 2022 at 2:37 AM Luke Chen <sh...@gmail.com> wrote:
>
>> Hi José,
>>
>> Thanks for the KIP!
>>
>> Some questions:
>> 1. Jason has asked but you didn't answer: What is the default value for `
>> metadata.monitor.write.interval.ms`?
>> 2. The `noopRecord` API key is `TBD`. Why can't we put the "currently used
>> API Key nums + 1" into it? Any concern?
>> 3. typo: name=MetadataWriteOffse[t]s
>> 4. I don't understand the difference between MetadataLastCommittedOffset
>> and MetadataWriteOffsets metrics. I think the 2 values will always be
>> identical, unless the controller metadata write failed, is that correct?
>>
>>
>> Thank you.
>> Luke
>>
>> On Wed, May 11, 2022 at 5:58 AM José Armando García Sancio
>> <js...@confluent.io.invalid> wrote:
>>
>> > Thanks for your feedback Jason, much appreciated.
>> >
>> > Here are the changes to the KIP:
>> >
>> >
>> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=5&selectedPageVersions=4
>> >
>> > On Tue, May 10, 2022 at 1:34 PM Jason Gustafson
>> > <ja...@confluent.io.invalid> wrote:
>> > > The approach sounds reasonable. By the way, I think one of the gaps we
>> > have
>> > > today is when the leader gets partitioned from the remaining voters. I
>> > > believe it continues acting as a leader indefinitely. I was considering
>> > > whether this periodic write can address the issue. Basically it can be
>> > used
>> > > to force a leader to prove it is still the leader by committing some
>> > data.
>> > > Say, for example, that the leader fails to commit the record after the
>> > > fetch timeout expires, then perhaps it could start a new election. What
>> > do
>> > > you think?
>> >
>> > We have an issue for this at
>> > https://issues.apache.org/jira/browse/KAFKA-13621. I updated the issue
>> > with your feedback and included some of my thoughts. Do you mind if we
>> > move this conversation to that issue?
>> >
>> > > A couple additional questions:
>> > >
>> > > - What is the default value for `metadata.monitor.write.interval.ms`?
>> > Also,
>> > > I'm wondering if `controller` would be a more suitable prefix?
>> >
>> > Yeah. I am not sure. Looking at the current configuration we have both
>> > prefixes. For example, with the `controller` prefix we have
>> > `controller.quorum.voters`, `controller.listener.names`,
>> > `controller.quota.window.num`, etc. For the `metadata` prefix we have
>> > `metadata.log.dir`, `metadata.log.*` and `metadata.max.retention.ms`,
>> > etc.
>> > I get the impression that we use `metadata` for things that are kinda
>> > log/disk related and `controller` for things that are not. I am
>> > thinking that the `metadata` prefix is more consistent with the
>> > current situation. What do you think Jason?
>> >
>> > > - Could we avoid letting BrokerMetadataPublisher escape into the metric
>> > > name? Letting the classnames leak into the metrics tends to cause
>> > > compatibility issues over time.
>> >
>> > Good point. For Raft we use `kafka.server:type=raft-metrics,name=...`.
>> > I'll change it to
>> > `kafka.server:type=broker-metadata-metrics,name=...`.
>> >
>> > Thanks,
>> > -José
>> >
>>
>
>
> -- 
> David Arthur

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by David Arthur <mu...@gmail.com>.
José, thanks for the KIP! I think this is a good approach for proving
the liveness of the quorum when metadata is not changing.

1. Based on the config name "metadata.monitor.write.interval.ms" I'm
guessing the intention is to have a regularly scheduled write. If the
quorum is busy with lots of the writes, we wouldn't need this NoopRecord
right? Maybe we can instead write a NoopRecord only after some amount of
idle time.

2. A small naming suggestion, what about "NoOpRecord"?

3. Typo in one of the metric names: "MetadataWriteOffses"

4. We should consider omitting these records from the log dump tool, or at
least adding an option to skip over them.

5. If (somehow) one of these NoopRecord appeared in the snapshot, it sounds
like the broker/controller would skip over them. We should specify this
behavior in the KIP

Cheers,
David

On Wed, May 11, 2022 at 2:37 AM Luke Chen <sh...@gmail.com> wrote:

> Hi José,
>
> Thanks for the KIP!
>
> Some questions:
> 1. Jason has asked but you didn't answer: What is the default value for `
> metadata.monitor.write.interval.ms`?
> 2. The `noopRecord` API key is `TBD`. Why can't we put the "currently used
> API Key nums + 1" into it? Any concern?
> 3. typo: name=MetadataWriteOffse[t]s
> 4. I don't understand the difference between MetadataLastCommittedOffset
> and MetadataWriteOffsets metrics. I think the 2 values will always be
> identical, unless the controller metadata write failed, is that correct?
>
>
> Thank you.
> Luke
>
> On Wed, May 11, 2022 at 5:58 AM José Armando García Sancio
> <js...@confluent.io.invalid> wrote:
>
> > Thanks for your feedback Jason, much appreciated.
> >
> > Here are the changes to the KIP:
> >
> >
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=5&selectedPageVersions=4
> >
> > On Tue, May 10, 2022 at 1:34 PM Jason Gustafson
> > <ja...@confluent.io.invalid> wrote:
> > > The approach sounds reasonable. By the way, I think one of the gaps we
> > have
> > > today is when the leader gets partitioned from the remaining voters. I
> > > believe it continues acting as a leader indefinitely. I was considering
> > > whether this periodic write can address the issue. Basically it can be
> > used
> > > to force a leader to prove it is still the leader by committing some
> > data.
> > > Say, for example, that the leader fails to commit the record after the
> > > fetch timeout expires, then perhaps it could start a new election. What
> > do
> > > you think?
> >
> > We have an issue for this at
> > https://issues.apache.org/jira/browse/KAFKA-13621. I updated the issue
> > with your feedback and included some of my thoughts. Do you mind if we
> > move this conversation to that issue?
> >
> > > A couple additional questions:
> > >
> > > - What is the default value for `metadata.monitor.write.interval.ms`?
> > Also,
> > > I'm wondering if `controller` would be a more suitable prefix?
> >
> > Yeah. I am not sure. Looking at the current configuration we have both
> > prefixes. For example, with the `controller` prefix we have
> > `controller.quorum.voters`, `controller.listener.names`,
> > `controller.quota.window.num`, etc. For the `metadata` prefix we have
> > `metadata.log.dir`, `metadata.log.*` and `metadata.max.retention.ms`,
> > etc.
> > I get the impression that we use `metadata` for things that are kinda
> > log/disk related and `controller` for things that are not. I am
> > thinking that the `metadata` prefix is more consistent with the
> > current situation. What do you think Jason?
> >
> > > - Could we avoid letting BrokerMetadataPublisher escape into the metric
> > > name? Letting the classnames leak into the metrics tends to cause
> > > compatibility issues over time.
> >
> > Good point. For Raft we use `kafka.server:type=raft-metrics,name=...`.
> > I'll change it to
> > `kafka.server:type=broker-metadata-metrics,name=...`.
> >
> > Thanks,
> > -José
> >
>


-- 
David Arthur

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Luke Chen <sh...@gmail.com>.
Hi José,

Thanks for the KIP!

Some questions:
1. Jason has asked but you didn't answer: What is the default value for `
metadata.monitor.write.interval.ms`?
2. The `noopRecord` API key is `TBD`. Why can't we put the "currently used
API Key nums + 1" into it? Any concern?
3. typo: name=MetadataWriteOffse[t]s
4. I don't understand the difference between MetadataLastCommittedOffset
and MetadataWriteOffsets metrics. I think the 2 values will always be
identical, unless the controller metadata write failed, is that correct?


Thank you.
Luke

On Wed, May 11, 2022 at 5:58 AM José Armando García Sancio
<js...@confluent.io.invalid> wrote:

> Thanks for your feedback Jason, much appreciated.
>
> Here are the changes to the KIP:
>
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=5&selectedPageVersions=4
>
> On Tue, May 10, 2022 at 1:34 PM Jason Gustafson
> <ja...@confluent.io.invalid> wrote:
> > The approach sounds reasonable. By the way, I think one of the gaps we
> have
> > today is when the leader gets partitioned from the remaining voters. I
> > believe it continues acting as a leader indefinitely. I was considering
> > whether this periodic write can address the issue. Basically it can be
> used
> > to force a leader to prove it is still the leader by committing some
> data.
> > Say, for example, that the leader fails to commit the record after the
> > fetch timeout expires, then perhaps it could start a new election. What
> do
> > you think?
>
> We have an issue for this at
> https://issues.apache.org/jira/browse/KAFKA-13621. I updated the issue
> with your feedback and included some of my thoughts. Do you mind if we
> move this conversation to that issue?
>
> > A couple additional questions:
> >
> > - What is the default value for `metadata.monitor.write.interval.ms`?
> Also,
> > I'm wondering if `controller` would be a more suitable prefix?
>
> Yeah. I am not sure. Looking at the current configuration we have both
> prefixes. For example, with the `controller` prefix we have
> `controller.quorum.voters`, `controller.listener.names`,
> `controller.quota.window.num`, etc. For the `metadata` prefix we have
> `metadata.log.dir`, `metadata.log.*` and `metadata.max.retention.ms`,
> etc.
> I get the impression that we use `metadata` for things that are kinda
> log/disk related and `controller` for things that are not. I am
> thinking that the `metadata` prefix is more consistent with the
> current situation. What do you think Jason?
>
> > - Could we avoid letting BrokerMetadataPublisher escape into the metric
> > name? Letting the classnames leak into the metrics tends to cause
> > compatibility issues over time.
>
> Good point. For Raft we use `kafka.server:type=raft-metrics,name=...`.
> I'll change it to
> `kafka.server:type=broker-metadata-metrics,name=...`.
>
> Thanks,
> -José
>

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by José Armando García Sancio <js...@confluent.io.INVALID>.
Thanks for your feedback Jason, much appreciated.

Here are the changes to the KIP:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=5&selectedPageVersions=4

On Tue, May 10, 2022 at 1:34 PM Jason Gustafson
<ja...@confluent.io.invalid> wrote:
> The approach sounds reasonable. By the way, I think one of the gaps we have
> today is when the leader gets partitioned from the remaining voters. I
> believe it continues acting as a leader indefinitely. I was considering
> whether this periodic write can address the issue. Basically it can be used
> to force a leader to prove it is still the leader by committing some data.
> Say, for example, that the leader fails to commit the record after the
> fetch timeout expires, then perhaps it could start a new election. What do
> you think?

We have an issue for this at
https://issues.apache.org/jira/browse/KAFKA-13621. I updated the issue
with your feedback and included some of my thoughts. Do you mind if we
move this conversation to that issue?

> A couple additional questions:
>
> - What is the default value for `metadata.monitor.write.interval.ms`? Also,
> I'm wondering if `controller` would be a more suitable prefix?

Yeah. I am not sure. Looking at the current configuration we have both
prefixes. For example, with the `controller` prefix we have
`controller.quorum.voters`, `controller.listener.names`,
`controller.quota.window.num`, etc. For the `metadata` prefix we have
`metadata.log.dir`, `metadata.log.*` and `metadata.max.retention.ms`,
etc.
I get the impression that we use `metadata` for things that are kinda
log/disk related and `controller` for things that are not. I am
thinking that the `metadata` prefix is more consistent with the
current situation. What do you think Jason?

> - Could we avoid letting BrokerMetadataPublisher escape into the metric
> name? Letting the classnames leak into the metrics tends to cause
> compatibility issues over time.

Good point. For Raft we use `kafka.server:type=raft-metrics,name=...`.
I'll change it to
`kafka.server:type=broker-metadata-metrics,name=...`.

Thanks,
-José

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Posted by Jason Gustafson <ja...@confluent.io.INVALID>.
Hi Jose,

Thanks for the KIP.

The approach sounds reasonable. By the way, I think one of the gaps we have
today is when the leader gets partitioned from the remaining voters. I
believe it continues acting as a leader indefinitely. I was considering
whether this periodic write can address the issue. Basically it can be used
to force a leader to prove it is still the leader by committing some data.
Say, for example, that the leader fails to commit the record after the
fetch timeout expires, then perhaps it could start a new election. What do
you think?

A couple additional questions:

- What is the default value for `metadata.monitor.write.interval.ms`? Also,
I'm wondering if `controller` would be a more suitable prefix?
- Could we avoid letting BrokerMetadataPublisher escape into the metric
name? Letting the classnames leak into the metrics tends to cause
compatibility issues over time.

Best,
Jason




On Fri, May 6, 2022 at 12:02 PM José Armando García Sancio
<js...@confluent.io.invalid> wrote:

> Hi all,
>
> I created a KIP for adding a mechanism to monitor the health of the
> KRaft Controller quorum through metrics. See KIP-835:
> https://cwiki.apache.org/confluence/x/0xShD
>
> Thanks for your feedback,
> -José
>