You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/06/09 19:22:00 UTC

[jira] [Commented] (HDFS-17042) Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode

    [ https://issues.apache.org/jira/browse/HDFS-17042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731075#comment-17731075 ] 

ASF GitHub Bot commented on HDFS-17042:
---------------------------------------

xinglin opened a new pull request, #5730:
URL: https://github.com/apache/hadoop/pull/5730

   ### Description of PR
   Add two new types of metrics to the existing NN RpcMetrics/RpcDetailedMetrics. These two metrics can then be used as part of SLA/SLO for the HDFS service.
   
   - RpcCallSuccesses: it measures the number of RPC requests where they are successfully processed by a NN (e.g., with a response with an RpcStatus RpcStatusProto.SUCCESS). Then, together with RpcQueueNumOps (which refers the total number of RPC requests), we can derive the RpcErrorRate for our NN, as (RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps. 
   
   - OverallRpcProcessingTime for each RPC method: this metric measures the overall RPC processing time for each RPC method at the NN. It covers the time from when a request arrives at the NN to when a response is sent back. We are already emitting processingTime for each RPC method today in RpcDetailedMetrics. We want to extend it to emit overallRpcProcessingTime for each RPC method, which includes enqueueTime, queueTime, processingTime, responseTime, and handlerTime.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest="TestRPC#testOverallRpcProcessingTimeMetric"
   [INFO] -------------------------------------------------------
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.ipc.TestRPC
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.014 s - in org.apache.hadoop.ipc.TestRPC
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
   
   mvn test -Dtest="TestRPC#testRpcCallSuccessesMetric"
   [INFO] -------------------------------------------------------
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.ipc.TestRPC
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.004 s - in org.apache.hadoop.ipc.TestRPC
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
   ```
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-17042
>                 URL: https://issues.apache.org/jira/browse/HDFS-17042
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0, 3.3.9
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>
> We'd like to add two new types of metrics to the existing NN RpcMetrics/RpcDetailedMetrics. These two metrics can then be used as part of SLA/SLO for the HDFS service.
>  * {_}RpcCallSuccesses{_}: it measures the number of RPC requests where they are successfully processed by a NN (e.g., with a response with an RpcStatus {_}RpcStatusProto.SUCCESS){_}{_}.{_} Then, together with {_}RpcQueueNumOps ({_}which refers the total number of RPC requests{_}){_}, we can derive the RpcErrorRate for our NN, as (RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps. 
>  * OverallRpcProcessingTime for each RPC method: this metric measures the overall RPC processing time for each RPC method at the NN. It covers the time from when a request arrives at the NN to when a response is sent back. We are already emitting processingTime for each RPC method today in RpcDetailedMetrics. We want to extend it to emit overallRpcProcessingTime for each RPC method, which includes enqueueTime, queueTime, processingTime, responseTime, and handlerTime.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org