Posted to jira@kafka.apache.org by "Kevin Lu (JIRA)" <ji...@apache.org> on 2018/10/12 23:15:00 UTC

[jira] [Updated] (KAFKA-7236) Add --under-min-isr option to describe topics command

     [ https://issues.apache.org/jira/browse/KAFKA-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Lu updated KAFKA-7236:
----------------------------
    Description: 
The "min.insync.replicas" configuration specifies the minimum number of in-sync replicas required for a partition to accept messages from the producer. If the in-sync replica count of a partition falls below the specified "min.insync.replicas", the broker rejects messages from producers using acks=all. These producers suffer unavailability because they see a NotEnoughReplicas or NotEnoughReplicasAfterAppend exception.

We currently have an UnderMinIsrPartitionCount metric, which is useful for identifying when partitions fall under "min.insync.replicas"; however, it is still difficult to identify which topic partitions are affected and need fixing.

We can leverage the describe topics command in TopicCommand to add an option "--under-minisr-partitions" to list out exactly which topic partitions are below "min.insync.replicas".
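
The selection such an option would perform can be sketched roughly as follows. This is only an illustration of the filter, not the TopicCommand implementation: the PartitionInfo class, the under_min_isr function, the topic name "orders", and the sample broker ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical stand-in for the per-partition data shown by describe."""
    topic: str
    partition: int
    replicas: List[int]  # broker ids assigned to the partition
    isr: List[int]       # broker ids currently in sync


def under_min_isr(partitions: List[PartitionInfo],
                  min_isr_by_topic: Dict[str, int]) -> List[PartitionInfo]:
    """Return partitions whose ISR count is below the topic's min.insync.replicas."""
    return [p for p in partitions if len(p.isr) < min_isr_by_topic[p.topic]]


# Sample data (made up): replication factor 3, min.insync.replicas = 2.
partitions = [
    PartitionInfo("orders", 0, [1, 2, 3], [1, 2, 3]),
    PartitionInfo("orders", 1, [1, 2, 3], [2]),
    PartitionInfo("orders", 2, [1, 2, 3], [1, 3]),
]
affected = under_min_isr(partitions, {"orders": 2})

for p in affected:
    print(f"Topic: {p.topic}  Partition: {p.partition}  Isr: {p.isr}")
```

With this sample data only partition 1 is reported: partition 2 is under-replicated but still meets "min.insync.replicas", so producers using acks=all can still write to it.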

  was:
[KIP-351|https://cwiki.apache.org/confluence/display/KAFKA/KIP-351%3A+Add+--critical-partitions+option+to+describe+topics+command]

A topic partition can be in one of four states (assuming a replication factor of 3):

(ISR = in-sync replica)

3/3 ISRs: OK

2/3 ISRs: WARNING (under-replicated partition)

1/3 ISRs: CRITICAL (under-replicated partition)

0/3 ISRs: FATAL (offline/unavailable partition)
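
As a rough sketch, the four states above could be expressed as a small classification function. The function name classify_partition is hypothetical, and generalizing beyond a replication factor of 3 (any count between 1 and the full replica set is treated as WARNING) is an assumption, not something the description specifies.

```python
def classify_partition(isr_count: int, replication_factor: int) -> str:
    """Map an ISR count to one of the four states listed above."""
    if replication_factor < 1 or not 0 <= isr_count <= replication_factor:
        raise ValueError("isr_count must be between 0 and replication_factor")
    if isr_count == 0:
        return "FATAL"      # offline/unavailable partition
    if isr_count == replication_factor:
        return "OK"         # fully replicated
    if isr_count == 1:
        return "CRITICAL"   # under-replicated, only one ISR remaining
    return "WARNING"        # under-replicated, more than one ISR remaining


# Reproduce the table for a replication factor of 3.
states = [classify_partition(n, 3) for n in (3, 2, 1, 0)]
for n, state in zip((3, 2, 1, 0), states):
    print(f"{n}/3 ISRs: {state}")
```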

 

TopicCommand already has the --under-replicated-partitions and --unavailable-partitions flags, but it would be beneficial to include an additional --critical-partitions option that specifically lists partitions in the CRITICAL state (only one ISR remaining).

With this new option, Kafka users can identify the exact topic partitions that are critical and need immediate repartitioning. Kafka users can also set up critical alerts that trigger when the output of this command contains partitions.

A couple of cases where identifying this CRITICAL state is useful in alerting:
 * Users that have a large number of topics in a single cluster, making it very hard to manually repartition all topics that have under-replicated partitions, so they only take action when a partition hits the CRITICAL state
 * Users with a high replication factor that can tolerate some broker failures and only take action when a partition hits the CRITICAL state

        Summary: Add --under-min-isr option to describe topics command  (was: Add --critical-partitions option to describe topics command)

> Add --under-min-isr option to describe topics command
> -----------------------------------------------------
>
>                 Key: KAFKA-7236
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7236
>             Project: Kafka
>          Issue Type: Improvement
>          Components: tools
>            Reporter: Kevin Lu
>            Assignee: Kevin Lu
>            Priority: Minor
>
> The "min.insync.replicas" configuration specifies the minimum number of in-sync replicas required for a partition to accept messages from the producer. If the in-sync replica count of a partition falls below the specified "min.insync.replicas", the broker rejects messages from producers using acks=all. These producers suffer unavailability because they see a NotEnoughReplicas or NotEnoughReplicasAfterAppend exception.
> We currently have an UnderMinIsrPartitionCount metric, which is useful for identifying when partitions fall under "min.insync.replicas"; however, it is still difficult to identify which topic partitions are affected and need fixing.
> We can leverage the describe topics command in TopicCommand to add an option "--under-minisr-partitions" to list out exactly which topic partitions are below "min.insync.replicas".


