Posted to dev@kafka.apache.org by "Edoardo Comar (JIRA)" <ji...@apache.org> on 2017/01/06 16:16:58 UTC

[jira] [Comment Edited] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

    [ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804874#comment-15804874 ] 

Edoardo Comar edited comment on KAFKA-4441 at 1/6/17 4:16 PM:
--------------------------------------------------------------

For the {{UnderReplicatedPartitions}} metric, the Gauge defined inside {{ReplicaManager}} needs to be able to make a check such as {{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}}.

The current startup ordering inside {{KafkaServer}} has the {{ReplicaManager}} start before the {{KafkaController}}.
Could the order be reversed?
If not, the {{ReplicaManager}} could be assigned a {{DeletionChecker}} function after the {{KafkaController}} has started. This would be minimally disruptive to the current code.
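
To make the second option concrete, here is a minimal, self-contained sketch of the idea. It is not the actual Kafka code: {{ReplicaManagerSketch}}, {{Partition}}, {{setDeletionChecker}} and the {{DeletionChecker}} type alias are all hypothetical names used for illustration, and the count method only stands in for the value the Gauge would report. The assumption is that until the controller has started, the checker simply reports that no topic is being deleted.

```scala
// Hypothetical sketch only: these are stand-ins, not the real Kafka classes.
object DeletionCheckerSketch {

  // Hypothetical alias: returns true if the topic is queued up for deletion.
  type DeletionChecker = String => Boolean

  // Simplified stand-in for a partition and its replication state.
  final case class Partition(topic: String, id: Int, isUnderReplicated: Boolean)

  class ReplicaManagerSketch(partitions: Seq[Partition]) {
    // Default used before the controller has started: nothing is being deleted.
    @volatile private var deletionChecker: DeletionChecker = _ => false

    // Called once the controller is up, so startup order need not change.
    def setDeletionChecker(checker: DeletionChecker): Unit =
      deletionChecker = checker

    // Stand-in for the value the UnderReplicatedPartitions Gauge would report:
    // under-replicated partitions of topics queued for deletion are skipped.
    def underReplicatedPartitionCount: Int =
      partitions.count(p => p.isUnderReplicated && !deletionChecker(p.topic))
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Partition("orders", 0, isUnderReplicated = true),
      Partition("tmp-topic", 0, isUnderReplicated = true),
      Partition("payments", 0, isUnderReplicated = false)
    )
    val replicaManager = new ReplicaManagerSketch(partitions)

    // Before a checker is assigned, both under-replicated partitions count.
    println(replicaManager.underReplicatedPartitionCount) // prints 2

    // Later, after the (hypothetical) controller has started:
    val queuedForDeletion = Set("tmp-topic")
    replicaManager.setDeletionChecker(topic => queuedForDeletion.contains(topic))
    println(replicaManager.underReplicatedPartitionCount) // prints 1
  }
}
```

The volatile field is there because, in a setup like this, the checker would be assigned from the startup thread while the metrics reporter reads it from another thread.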

[~ijuma] [~junrao] any preferences?


was (Author: ecomar):
For the {{UnderReplicatedPartitions}} metrics, the Gauge defined inside {{ReplicaManager}} needs to be able to make a check {{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}}.
We will follow up with another PR.

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-4441
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4441
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Assignee: Edoardo Comar
>
> Kafka reports several metrics based on the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics fire when topics are rapidly created and deleted in a tight loop, even though the only cause is topics that are undergoing creation or deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous state machine: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185 
> I believe the fix to this is relatively simple: we need to make the metrics aware that a topic is currently undergoing deletion or creation, and only include topics that are "stable".


