Posted to issues@zookeeper.apache.org by "Mathieu Gaudin (Jira)" <ji...@apache.org> on 2023/02/08 09:21:00 UTC
[jira] [Resolved] (ZOOKEEPER-4358) Latency metrics showing surprising results for a Kerberos-enabled cluster

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Gaudin resolved ZOOKEEPER-4358.
---------------------------------------
    Resolution: Not A Problem

> Latency metrics showing surprising results for a Kerberos-enabled cluster
> -------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4358
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4358
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: metric system
>    Affects Versions: 3.6.2
>            Reporter: Mathieu Gaudin
>            Priority: Minor
>         Attachments: image-2021-08-27-16-10-28-783.png, image-2021-08-27-16-37-50-112.png
>
>
> Hi,
> I'm trying to understand why the min/avg/max latency values show surprising results. The graph below shows the max latency value of a particular node for the last 7 days. The value increases gradually over time and only ever decreases when the node is restarted, as if the metric value gets reset.
> [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L226]
> !image-2021-08-27-16-10-28-783.png|width=984,height=204!
>  * 3 nodes
>  * Kerberos enabled
>  * TGT ticket cache enabled
> I believe the min/avg/max latency values should show more realistic variation. It seems very unlikely that the max latency value is expected to always increase while the node is running.
> [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerStats.java#L142]
>  _public void updateLatency(Request request, long currentTime) {_
>  _long latency = currentTime - request.createTime;_
>  _if (latency < 0) {_
>  _return;_
>  _}_
>  _*{color:#FF0000}requestLatency.addDataPoint(latency);{color}*_
>  _if (request.getHdr() != null) {_
>  _// Only quorum request should have header_
>  _ServerMetrics.getMetrics().UPDATE_LATENCY.add(latency);_
>  _} else {_
>  _// All read request should goes here_
>  _ServerMetrics.getMetrics().READ_LATENCY.add(latency);_
>  _}_
>  _}_
> The method it calls leads me to think that the max latency metric only gets updated when the currently stored value is lower than the new one.
> [https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/metric/AvgMinMaxCounter.java#L51]
>  _private void setMax(long value) {_
>  *{color:#FF0000}_long current;_{color}*
>  *{color:#FF0000}_while (value > (current = max.get()) && !max.compareAndSet(current, value)) {_{color}*
>  _// no op_
>  _}_
>  _}_
> Below is a graph of a particular node from a totally different cluster for the last 2 days. The node has not been restarted and all the data is from the same process. We can see more realistic variations of the max latency metric, as one would normally expect.
> !image-2021-08-27-16-37-50-112.png|width=1084,height=222!
> Thanks in advance for your time,
> Math
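
For readers of the quoted report, the compare-and-set loop in setMax can be illustrated with a minimal, self-contained sketch. RunningMaxSketch and RunningMax are hypothetical names invented for this illustration, and the reset() method is an assumption about how such a counter would be cleared; none of this is ZooKeeper's actual AvgMinMaxCounter. It only shows that a maximum maintained this way can never decrease between resets, which would be consistent with the rising graph described above.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RunningMaxSketch {

    // Hypothetical stand-in for the max-tracking part of a counter (not ZooKeeper code).
    static final class RunningMax {
        private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

        void add(long value) {
            long current;
            // Same loop shape as the quoted setMax: the stored value is replaced
            // only when the new value is strictly larger.
            while (value > (current = max.get()) && !max.compareAndSet(current, value)) {
                // retry: another thread changed max between get() and compareAndSet()
            }
        }

        long get() {
            return max.get();
        }

        // Assumed reset hook; a process restart has the same effect on the stored value.
        void reset() {
            max.set(Long.MIN_VALUE);
        }
    }

    public static void main(String[] args) {
        RunningMax m = new RunningMax();
        long[] latencies = {12, 40, 7, 3, 95, 20};
        for (long latency : latencies) {
            m.add(latency);
            // The reported max never goes down, even when samples do.
            System.out.println("sample=" + latency + " max=" + m.get());
        }
        m.reset();
        m.add(5);
        System.out.println("after reset: max=" + m.get());
    }
}
```

Under these assumptions, a max that only ever rises describes the largest latency seen since the last reset rather than the latency over a recent window.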



--
This message was sent by Atlassian Jira
(v8.20.10#820010)