Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/10/24 13:13:22 UTC

[GitHub] [ozone] xBis7 opened a new pull request, #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

xBis7 opened a new pull request, #3878:
URL: https://github.com/apache/ozone/pull/3878

   ## What changes were proposed in this pull request?
   
   On the Prometheus endpoint for the OM, the DecayRpcScheduler summary for users exposes the username in the metric name. This makes it almost impossible to monitor these values, because every time a new user shows up a new metric name has to be registered.
   
   For a user with the username `hadoop`, the metric name changes from `org_apache_hadoop_ipc_decay_rpc_scheduler_call_volume` to `org_apache_hadoop_ipc_decay_rpc_scheduler_caller_hadoop_volume`.
   
   The proposed solution is to remove the username from the metric name and add it as a `username` tag.
   
   This metric comes from `hadoop-common-3.3.4.jar/DecayRpcScheduler` and more specifically
   
   ```
   Metrics2Util.NameValuePair entry = (Metrics2Util.NameValuePair)topNCallers.poll();
   String topCaller = "Caller(" + entry.getName() + ")";
   String topCallerVolume = topCaller + ".Volume";
   String topCallerPriority = topCaller + ".Priority";
   rb.addCounter(Interns.info(topCallerVolume, topCallerVolume), entry.getValue());
   ``` 
   The name is in the format `Caller(username).MetricType`, e.g. `Caller(hadoop).Volume`. The cleanest way to deal with this seems to be to filter the metric in `PrometheusMetricsSink`.
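   
   As a rough illustration of the string handling involved, the sketch below splits a name of the form `Caller(username).MetricType` into its metric type and username. The class `CallerMetricSketch` and its method names are hypothetical and are not the helper added by this PR; it only assumes the two metric types shown above (`Volume` and `Priority`).
   
   ```java
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   /** Hypothetical helper for illustration only; not the class from this PR. */
   final class CallerMetricSketch {

     // Matches "Caller(<username>).<MetricType>", e.g. "Caller(hadoop).Volume".
     // Assumption: only the Volume and Priority metric types need handling.
     private static final Pattern CALLER_METRIC =
         Pattern.compile("^Caller\\((.*)\\)\\.(Volume|Priority)$");

     private CallerMetricSketch() {
     }

     /** Returns "Volume" or "Priority" for caller metrics, else the name unchanged. */
     static String metricType(String metricName) {
       Matcher m = CALLER_METRIC.matcher(metricName);
       return m.matches() ? m.group(2) : metricName;
     }

     /** Returns the username for caller metrics, or null for any other metric. */
     static String username(String metricName) {
       Matcher m = CALLER_METRIC.matcher(metricName);
       return m.matches() ? m.group(1) : null;
     }

     public static void main(String[] args) {
       String name = "Caller(hadoop).Volume";
       // prints "Volume hadoop"
       System.out.println(metricType(name) + " " + username(name));
     }
   }
   ```
   
   Names that do not match the pattern are returned unchanged, so other metrics would pass through such a filter untouched.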
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7394
   
   ## How was this patch tested?
   
   This patch was tested manually, with a docker cluster and the OM `/prom` endpoint.
   
   To test it:
   
   In `compose/ozone`, add the following to `docker-config`:
   ```
   CORE-SITE.XML_ipc.9862.callqueue.impl=org.apache.hadoop.ipc.FairCallQueue
   CORE-SITE.XML_ipc.9862.scheduler.impl=org.apache.hadoop.ipc.DecayRpcScheduler
   CORE-SITE.XML_ipc.9862.scheduler.priority.levels=2
   CORE-SITE.XML_ipc.9862.backoff.enable=true
   CORE-SITE.XML_ipc.9862.faircallqueue.multiplexer.weights=99,1
   CORE-SITE.XML_ipc.9862.decay-scheduler.thresholds=90
   OZONE-SITE.XML_ozone.om.address=0.0.0.0:9862
   ```
   Then run:
   
   ```
   $ export COMPOSE_FILE=docker-compose.yaml:monitoring.yaml
   $ docker-compose up --scale datanode=3 -d
   $ docker exec -it ozone_s3g_1 bash
   bash-4.2$ export AWS_ACCESS_KEY=test AWS_SECRET_KEY=pass
   bash-4.2$ ozone freon s3bg -t 1 -n 10
   ```
   In your browser, go to `http://localhost:9874/prom` and you should see:
   
   ```
   # TYPE org_apache_hadoop_ipc_decay_rpc_scheduler_priority counter
   org_apache_hadoop_ipc_decay_rpc_scheduler_priority{context="ipc.9862",hostname="e32f2e3bddb9",username="hadoop"} 1
   ...
   ...
   ...
   # TYPE org_apache_hadoop_ipc_decay_rpc_scheduler_volume counter
   org_apache_hadoop_ipc_decay_rpc_scheduler_volume{context="ipc.9862",hostname="e32f2e3bddb9",username="hadoop"} 21
   ```
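   
   To make the expected output concrete, here is a minimal sketch of how a sample carrying a `username` label could be rendered into the line format shown above. `PromLineSketch`, `escape`, and `formatSample` are hypothetical names chosen for illustration and do not reflect how `PrometheusMetricsSink` is actually implemented; the escaping follows the Prometheus text exposition rules for label values (backslash, double quote, newline).
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.StringJoiner;

   /** Hypothetical formatter for illustration only; not Ozone's PrometheusMetricsSink. */
   final class PromLineSketch {

     private PromLineSketch() {
     }

     /** Escapes backslash, double quote and newline inside a label value. */
     static String escape(String value) {
       return value.replace("\\", "\\\\")
           .replace("\"", "\\\"")
           .replace("\n", "\\n");
     }

     /** Renders one sample as: name{key="value",...} value */
     static String formatSample(String metric, Map<String, String> labels,
         long value) {
       StringJoiner joined = new StringJoiner(",", "{", "}");
       for (Map.Entry<String, String> e : labels.entrySet()) {
         joined.add(e.getKey() + "=\"" + escape(e.getValue()) + "\"");
       }
       return metric + joined + " " + value;
     }

     public static void main(String[] args) {
       // LinkedHashMap keeps the labels in insertion order.
       Map<String, String> labels = new LinkedHashMap<>();
       labels.put("context", "ipc.9862");
       labels.put("hostname", "e32f2e3bddb9");
       labels.put("username", "hadoop");
       System.out.println(formatSample(
           "org_apache_hadoop_ipc_decay_rpc_scheduler_volume", labels, 21));
     }
   }
   ```
   
   Because the username lives in a label, every caller shares one metric name and a single query can select or aggregate across users.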


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] duongkame commented on pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
duongkame commented on PR #3878:
URL: https://github.com/apache/ozone/pull/3878#issuecomment-1302813143

   Thanks @xBis7 for the patch.
   
   > The name is in the format Caller(username).MetricType eg. Caller(hadoop).Volume. The username might exist in the metric name for a purpose. We don't want to change the way the metric works but the way it's presented. The cleanest way to deal with this seems to filter the metric in PrometheusMetricsSink.
   
   I still think this should rather be a change in `hadoop-common`. To keep it backward compatible, we may just introduce a new monitorable metric without changing the existing one. Capturing and modifying metrics in Ozone `PrometheusMetricsSink` looks pretty hacky, but if we have to do so, we'd rather copy the metrics to a new one without changing the original metric.




[GitHub] [ozone] neils-dev commented on pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
neils-dev commented on PR #3878:
URL: https://github.com/apache/ozone/pull/3878#issuecomment-1304018055

   > Is this for ofs use cases or s3 based access?
   
   This applies to both, @kerneltime. The Prometheus metric `org_apache_hadoop_ipc_decay_rpc_scheduler_caller_*` includes the user in the metric name for all RPC metric events.




[GitHub] [ozone] xBis7 commented on a diff in pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
xBis7 commented on code in PR #3878:
URL: https://github.com/apache/ozone/pull/3878#discussion_r1029202440


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/DecayRpcSchedulerUtil.java:
##########
@@ -0,0 +1,116 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.utils;
+
+import com.google.common.base.Strings;
+import org.apache.hadoop.metrics2.MetricsInfo;
+import org.apache.hadoop.metrics2.MetricsRecord;
+import org.apache.hadoop.metrics2.MetricsTag;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Helper functions for DecayRpcScheduler
+ * metrics for Prometheus.
+ */
+public final class DecayRpcSchedulerUtil {
+
+  private DecayRpcSchedulerUtil() {
+  }
+
+  private static final MetricsInfo USERNAME_INFO = new MetricsInfo() {
+    @Override
+    public String name() {
+      return "username";
+    }
+
+    @Override
+    public String description() {
+      return "caller username";
+    }
+  };
+
+  /**
+   * For Decay_Rpc_Scheduler, the metric name is in format
+   * "Caller(<callers_username>).Volume"
+   * or
+   * "Caller(<callers_username>).Priority"
+   * Split it and return the metric.
+   * @param recordName
+   * @param metricName "Caller(xyz).Volume" or "Caller(xyz).Priority"
+   * @return "Volume" or "Priority"
+   */
+  public static String splitMetricNameIfNeeded(String recordName,

Review Comment:
   @kerneltime I added the unit tests you requested.





[GitHub] [ozone] kerneltime merged pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
kerneltime merged PR #3878:
URL: https://github.com/apache/ozone/pull/3878




[GitHub] [ozone] kerneltime commented on pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
kerneltime commented on PR #3878:
URL: https://github.com/apache/ozone/pull/3878#issuecomment-1289273881

   CC @duongkame 




[GitHub] [ozone] neils-dev commented on pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
neils-dev commented on PR #3878:
URL: https://github.com/apache/ozone/pull/3878#issuecomment-1333033169

   Thanks @xBis7 for this important patch.  Thanks @duongkame and @kerneltime for reviewing this PR and for your comments.




[GitHub] [ozone] kerneltime commented on pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
kerneltime commented on PR #3878:
URL: https://github.com/apache/ozone/pull/3878#issuecomment-1303003858

   @xBis7 I understand the need, but I think there was intent in the original metric, and this comes from being able to diagnose which application/user is seeing what metric. 
   From: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FairCallQueue.html
   
   ```
   The implementation of RpcScheduler used with FairCallQueue by default is DecayRpcScheduler, which maintains a count of requests received for each user.
   ```
   
   Is this for `ofs` use cases or `s3` based access?
   




[GitHub] [ozone] kerneltime commented on a diff in pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3878:
URL: https://github.com/apache/ozone/pull/3878#discussion_r1028773602


##########
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/DecayRpcSchedulerUtil.java:
##########
@@ -0,0 +1,116 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.hdds.utils;
+
+import com.google.common.base.Strings;
+import org.apache.hadoop.metrics2.MetricsInfo;
+import org.apache.hadoop.metrics2.MetricsRecord;
+import org.apache.hadoop.metrics2.MetricsTag;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Helper functions for DecayRpcScheduler
+ * metrics for Prometheus.
+ */
+public final class DecayRpcSchedulerUtil {
+
+  private DecayRpcSchedulerUtil() {
+  }
+
+  private static final MetricsInfo USERNAME_INFO = new MetricsInfo() {
+    @Override
+    public String name() {
+      return "username";
+    }
+
+    @Override
+    public String description() {
+      return "caller username";
+    }
+  };
+
+  /**
+   * For Decay_Rpc_Scheduler, the metric name is in format
+   * "Caller(<callers_username>).Volume"
+   * or
+   * "Caller(<callers_username>).Priority"
+   * Split it and return the metric.
+   * @param recordName
+   * @param metricName "Caller(xyz).Volume" or "Caller(xyz).Priority"
+   * @return "Volume" or "Priority"
+   */
+  public static String splitMetricNameIfNeeded(String recordName,

Review Comment:
   Please add unit tests for these methods to make sure the string processing is what we expect.





[GitHub] [ozone] xBis7 commented on pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
xBis7 commented on PR #3878:
URL: https://github.com/apache/ozone/pull/3878#issuecomment-1301009510

   @kerneltime any updates on this?




[GitHub] [ozone] xBis7 commented on pull request #3878: HDDS-7394. OM RPC FairCallQueue decay decision metrics list caller username in the metric

Posted by GitBox <gi...@apache.org>.
xBis7 commented on PR #3878:
URL: https://github.com/apache/ozone/pull/3878#issuecomment-1303119481

   @kerneltime @duongkame Thanks for taking a look at the patch.
   
    
   
   > I think there was intent in the original metric, and this comes from being able to diagnose which application/user is seeing what metric.
   
   I agree, that's why there are no functional changes. The metric that gets passed back and forth is exactly the same and we are only filtering how the endpoint presents it, like [here](https://github.com/xBis7/ozone/blob/master/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/http/PrometheusMetricsSink.java#L117-L120) with the RocksDb metrics.
   
   > Is this for ofs use cases or s3 based access?
   
   It was used with `S3` but it could be for both use cases.
   
   > I still think this should rather be a change in hadoop-common. 
   
   That was my first thought, but it seems too complex for what we need. We don't want to change the metric or reduce the number of times it gets registered. We need the name to be consistent, so that it's easier to track with a wildcard search or something similar, while still being able to see the user for every entry.
   If you think that's the best approach, I'm happy to move forward with it. We don't have to make the change in hadoop-common; instead we could have a class in Ozone that overrides the one from the `jar`.
   
   

