Posted to dev@kafka.apache.org by "Jiao Zhang (Jira)" <ji...@apache.org> on 2020/02/27 07:12:00 UTC

[jira] [Created] (KAFKA-9616) Add new metrics to get total response time with throttle time subtracted

Jiao Zhang created KAFKA-9616:
---------------------------------

             Summary: Add new metrics to get total response time with throttle time subtracted
                 Key: KAFKA-9616
                 URL: https://issues.apache.org/jira/browse/KAFKA-9616
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.1.0
            Reporter: Jiao Zhang


We use these RequestMetrics for cluster monitoring: [https://github.com/apache/kafka/blob/fb5bd9eb7cdfdae8ed1ea8f68e9be5687f610b28/core/src/main/scala/kafka/network/RequestChannel.scala#L364]

We configured our AlertManager to fire an alert if the 99th percentile of 'TotalTimeMs' exceeds a threshold. This alert is very important because it notifies cluster administrators of bad situations, for example when a server drops out of the cluster or loses leadership.

However, we sometimes get false alerts. We set quotas such as 'producer_byte_rate' for some clients, so when requests from these clients are throttled, 'ThrottleTimeMs' is long, and sometimes the throttling alone pushes 'TotalTimeMs' over the threshold and triggers the alert. As a result, we have to spend time investigating false alerts as well.

So this ticket proposes adding a new metric, 'ProcessTimeMs', whose value is the total response time with throttle time subtracted. This metric is more accurate and would help us notice only the genuinely unexpected situations.
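
To illustrate the idea, here is a minimal Python sketch (not broker code) of how a per-request 'ProcessTimeMs' would behave under an alert on the 99th percentile. The sample values, the threshold, the clamping at zero, and the helper names `percentile` and `process_time_ms` are all assumptions made up for this example.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def process_time_ms(total_ms, throttle_ms):
    """Hypothetical ProcessTimeMs: total response time minus throttle time, clamped at zero."""
    return max(total_ms - throttle_ms, 0.0)

# Made-up (total_ms, throttle_ms) samples: 95 unthrottled requests and
# 5 requests delayed 1500 ms by a quota such as 'producer_byte_rate'.
samples = [(20.0, 0.0)] * 95 + [(1520.0, 1500.0)] * 5

ALERT_THRESHOLD_MS = 500.0  # illustrative alert threshold, not from the ticket

p99_total = percentile([total for total, _ in samples], 99)
p99_process = percentile([process_time_ms(t, th) for t, th in samples], 99)

print("alert on TotalTimeMs:  ", p99_total > ALERT_THRESHOLD_MS)
print("alert on ProcessTimeMs:", p99_process > ALERT_THRESHOLD_MS)
```

With these samples the alert on 'TotalTimeMs' fires purely because of throttling, while the alert on the subtracted value stays quiet.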

By the way, we tried to achieve this with PromQL against the existing metrics, e.g. Total - Throttle. It does not work because the two metrics appear to be inconsistent in time. So it would be better to expose a new metric from the broker side.
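
Apart from the timing inconsistency described above, there is a general caveat with subtracting the two exported metrics: percentiles do not subtract, so p99(Total) - p99(Throttle) is generally not p99(Total - Throttle). The Python sketch below uses made-up samples and a hypothetical `percentile` helper to show the gap. It is an illustration only and does not claim to be the cause observed in our cluster.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up (total_ms, throttle_ms) samples:
#   90 normal requests, 5 throttled requests, 5 genuinely slow requests.
samples = (
    [(20.0, 0.0)] * 90
    + [(1020.0, 1000.0)] * 5
    + [(800.0, 0.0)] * 5
)

totals = [total for total, _ in samples]
throttles = [throttle for _, throttle in samples]

# What "Total - Throttle" on two separately exported percentiles computes.
diff_of_percentiles = percentile(totals, 99) - percentile(throttles, 99)

# What a broker-side metric would report: subtract per request, then aggregate.
percentile_of_diffs = percentile([t - th for t, th in samples], 99)

print("p99(Total) - p99(Throttle):", diff_of_percentiles)
print("p99(Total - Throttle):     ", percentile_of_diffs)
```

In this example the query-side subtraction hides the genuinely slow requests, whereas subtracting per request before aggregating keeps them visible.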



--
This message was sent by Atlassian Jira
(v8.3.4#803005)