You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/06/06 06:01:01 UTC

[GitHub] [hadoop-ozone] runzhiwang opened a new pull request #1031: HDDS-3745. Improve OM and SCM performance with 50% by avoid call getServiceInfo in s3g

runzhiwang opened a new pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031


   
   
   ## What changes were proposed in this pull request?
   
   **What's the problem ?**
   I start a ozone cluster with 1000 datanodes and 10 s3gateway, and run two weeks with heavy workload, and perf om and scm.
   1. From om perf, getServiceList cost 63.75% cpu.
   ![image](https://user-images.githubusercontent.com/51938049/83937289-ed6b0600-a7fd-11ea-84a9-25f267abc759.png)
   
   2.From scm perf, queryNode come from om::getServiceList cost 33.20% cpu
   ![image](https://user-images.githubusercontent.com/51938049/83937299-fe1b7c00-a7fd-11ea-9fbf-768b269ef7a1.png)
   
   What's the reason ?
   Now s3g create a client for each request. when create each RpcClient, s3g will call `ozoneManagerClient.getServiceInfo()`, `getServiceInfo` will call `getServiceList`. Then om and scm are very busy with getServiceList.
   
   But s3g does not use the List<ServiceInfo> which got from getServiceList at all.
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3745
   
   ## How was this patch tested?
   
   Existed tests.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] codecov-commenter commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-643863733


   # [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1031?src=pr&el=h1) Report
   > Merging [#1031](https://codecov.io/gh/apache/hadoop-ozone/pull/1031?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hadoop-ozone/commit/23034fbda62d7ee5abe0f664b58248cba6e1468e&el=desc) will **decrease** coverage by `0.05%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/graphs/tree.svg?width=650&height=150&src=pr&token=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/1031?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1031      +/-   ##
   ============================================
   - Coverage     70.38%   70.33%   -0.06%     
   + Complexity     9234     9227       -7     
   ============================================
     Files           961      961              
     Lines         48130    48115      -15     
     Branches       4676     4674       -2     
   ============================================
   - Hits          33877    33842      -35     
   - Misses        12001    12020      +19     
   - Partials       2252     2253       +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hadoop-ozone/pull/1031?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../java/org/apache/hadoop/ozone/om/OzoneManager.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9Pem9uZU1hbmFnZXIuamF2YQ==) | `63.94% <ø> (+0.04%)` | `184.00 <0.00> (-1.00)` | :arrow_up: |
   | [.../apache/hadoop/hdds/scm/node/StaleNodeHandler.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvc2VydmVyLXNjbS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL2hkZHMvc2NtL25vZGUvU3RhbGVOb2RlSGFuZGxlci5qYXZh) | `88.88% <0.00%> (-11.12%)` | `4.00% <0.00%> (ø%)` | |
   | [...er/common/transport/server/GrpcXceiverService.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3RyYW5zcG9ydC9zZXJ2ZXIvR3JwY1hjZWl2ZXJTZXJ2aWNlLmphdmE=) | `70.00% <0.00%> (-10.00%)` | `3.00% <0.00%> (ø%)` | |
   | [...ntainerLocationProtocolServerSideTranslatorPB.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvc2VydmVyLXNjbS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL2hkZHMvc2NtL3Byb3RvY29sL1N0b3JhZ2VDb250YWluZXJMb2NhdGlvblByb3RvY29sU2VydmVyU2lkZVRyYW5zbGF0b3JQQi5qYXZh) | `38.31% <0.00%> (-5.15%)` | `17.00% <0.00%> (-2.00%)` | |
   | [...ntainerLocationProtocolClientSideTranslatorPB.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvZnJhbWV3b3JrL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvaGRkcy9zY20vcHJvdG9jb2xQQi9TdG9yYWdlQ29udGFpbmVyTG9jYXRpb25Qcm90b2NvbENsaWVudFNpZGVUcmFuc2xhdG9yUEIuamF2YQ==) | `41.17% <0.00%> (-3.93%)` | `20.00% <0.00%> (-2.00%)` | |
   | [...adoop/hdds/scm/server/SCMClientProtocolServer.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvc2VydmVyLXNjbS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL2hkZHMvc2NtL3NlcnZlci9TQ01DbGllbnRQcm90b2NvbFNlcnZlci5qYXZh) | `47.65% <0.00%> (-3.52%)` | `23.00% <0.00%> (-4.00%)` | |
   | [...g/apache/hadoop/hdds/protocol/DatanodeDetails.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvaGRkcy9wcm90b2NvbC9EYXRhbm9kZURldGFpbHMuamF2YQ==) | `88.23% <0.00%> (-1.69%)` | `29.00% <0.00%> (-1.00%)` | |
   | [.../common/states/endpoint/HeartbeatEndpointTask.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3N0YXRlcy9lbmRwb2ludC9IZWFydGJlYXRFbmRwb2ludFRhc2suamF2YQ==) | `69.81% <0.00%> (-1.26%)` | `25.00% <0.00%> (-1.00%)` | |
   | [...p/ozone/container/keyvalue/helpers/ChunkUtils.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIva2V5dmFsdWUvaGVscGVycy9DaHVua1V0aWxzLmphdmE=) | `85.45% <0.00%> (-0.91%)` | `30.00% <0.00%> (-1.00%)` | |
   | [...iner/common/statemachine/DatanodeStateMachine.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3N0YXRlbWFjaGluZS9EYXRhbm9kZVN0YXRlTWFjaGluZS5qYXZh) | `82.87% <0.00%> (-0.56%)` | `29.00% <0.00%> (-1.00%)` | |
   | ... and [11 more](https://codecov.io/gh/apache/hadoop-ozone/pull/1031/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1031?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1031?src=pr&el=footer). Last update [23034fb...13fb742](https://codecov.io/gh/apache/hadoop-ozone/pull/1031?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] nandakumar131 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
nandakumar131 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642755354


   **Some background**
   `gerServiceInfo` call was introduced as part of `Service Discovery API` (HDFS-12868).
   The idea behind `Service Discovery API` was to not worry about setting multiple properties at client to connect to Ozone.
   Point the client to any one of the OzoneManager (even in case of HA), the client should be able to get all the required configuration and be able to connect to Ozone.
   This will make the client configuration simple.
   
   When `gerServiceInfo` was introduced, Ozone client just needed OM's IP and port to connect, nothing else. Over time we made lot of code change and never revisited `Service Discovery API`.
   After OM HA, it is no longer true that we can discover other services and required configurations just with single OM's IP/Port. We need `ozone-site.xml` at client and set of properties that will allow us to interact with OM HA.
   
   Now I'm not even sure where and why `gerServiceInfo` is getting used.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-641665483


   @elek Could you help review this patch ? Thank you very much.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang commented on a change in pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on a change in pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#discussion_r436444819



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
##########
@@ -170,10 +169,10 @@ public RpcClient(ConfigurationSource conf, String omServiceId)
         OzoneManagerProtocol.class, conf
     );
     dtService = omTransport.getDelegationTokenService();
-    ServiceInfoEx serviceInfoEx = ozoneManagerClient.getServiceInfo();
+    String certificate = ozoneManagerClient.getCaCertificate();
     String caCertPem = null;
     if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
-      caCertPem = serviceInfoEx.getCaCertificate();
+      caCertPem = certificate;

Review comment:
       CI does not passed if we do this, maybe CI want ozoneManagerClient.getCaCertificate throw exception even if security is not enabled. I will find the reason.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642816946


   > @bharatviswa504 I understand the reason behind it.
   > 
   > We should revisit `Service Discovery API` and make a decision on whether we can make it work or should we remove it completely.
   
   >After OM HA, it is no longer true that we can discover other services and required >configurations just with single OM's IP/Port. We need ozone-site.xml at client and set of .properties that will allow us to interact with OM HA.
   
   Agreed. I just want to give you a context for the above statement mentioned.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] elek commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-647458478


   So what's next here? I propose to merge this as is (clean simple patch, just remove the datanode part). 
   
   If `getServiceInfo()` is not required anymore: we can remove it fully in a separated patch. (Not just the usage, but we can deprecate and/or remove the server side parts. 
   
   @bharatviswa504, @xiaoyuyao  Are you ok with this approach?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-644922194


   bq. I have not understood, what is the problem mentioned here, as from code we can see the value from getServiceInfo is only used when security is enabled.
   
   getServiceInfo is the service discovery API for OM, which is not for security only. If we go with the other route, i.e., a dedicated API for getCaCertificate(), I'm OK with including check for security enabled.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-643893676


   Thank You for the update.
   Can we also make the change in RpcClient.java, calling getServiceInfo only when security is enabled?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang edited a comment on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642985694


   @nandakumar131 @bharatviswa504  Hi, the problem of getServiceInfo is that it contains all the datanodes' information which cost too much if the cluster is big. If it only contain OM address, there is no need to replace it with `getCaCertificate`. 
   ![image](https://user-images.githubusercontent.com/51938049/84449860-2cc3a780-ac81-11ea-93c3-90c642fe51fb.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 merged pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642999728


   > @elek @nandakumar131 @bharatviswa504 Hi, Thanks for review. The problem of getServiceInfo is that it contains all the datanodes' information which cost too much if the cluster is big. If it only contain OM address, there is no need to replace it with `getCaCertificate`.
   > ![image](https://user-images.githubusercontent.com/51938049/84449860-2cc3a780-ac81-11ea-93c3-90c642fe51fb.png)
   
   Thank You @runzhiwang for the explanation of the issue. I like @xiaoyuyao proposal, if DN information is no more required, we can clean that up, and use getServiceInfo instead of a new RPC call.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 edited a comment on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642797538


   > **Some background**
   > `gerServiceInfo` call was introduced as part of `Service Discovery API` (HDFS-12868).
   > The idea behind `Service Discovery API` was to not worry about setting multiple properties at client to connect to Ozone.
   > Point the client to any one of the OzoneManager (even in case of HA), the client should be able to get all the required configuration and be able to connect to Ozone.
   > This will make the client configuration simple.
   > 
   > When `gerServiceInfo` was introduced, Ozone client just needed OM's IP and port to connect, nothing else. Over time we made lot of code change and never revisited `Service Discovery API`.
   > After OM HA, it is no longer true that we can discover other services and required configurations just with single OM's IP/Port. We need `ozone-site.xml` at client and set of properties that will allow us to interact with OM HA.
   > 
   > Now I'm not even sure where and why `gerServiceInfo` is getting used.
   
   For OM HA, the reason for not using the getServiceInfo was if client knows one of the OzoneManager addresses, and if that is down, client will not be able to talk to the ozone cluster.
   Even right now getServiceInfo, will now return quorum of OM address with each role, but this information cannot be used for the above reason, so the client still needs a quorum of OM address.
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] elek commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-644133382


   >  Can we also make the change in RpcClient.java, calling getServiceInfo only when security is enabled?
   
   I have some fear if the secre/unsecure cluster would have such difference. For example, I should run all the performance tests on secure cluster. We should be careful and execute all the tests on both secure / unsecure cluster. Seems to be safer to keep it as is. But I can be convinced.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] nandakumar131 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
nandakumar131 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642811847


   @bharatviswa504 I understand the reason behind it.
   
   We should revisit `Service Discovery API` and make a decision on whether we can make it work or should we remove it completely.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642985694


   @nandakumar131 @bharatviswa504  Hi, the problem of getServiceInfo is that it include all the datanodes' information.
   ![image](https://user-images.githubusercontent.com/51938049/84449860-2cc3a780-ac81-11ea-93c3-90c642fe51fb.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang edited a comment on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642985694






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642815995


   From the code point of I see the usage of getServiceInfo used to get CaCert and used by XceiverClientRatis and Grpc.
   
   ```
       ServiceInfoEx serviceInfoEx = ozoneManagerClient.getServiceInfo();
       String caCertPem = null;
       if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
         caCertPem = serviceInfoEx.getCaCertificate();
       }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642997309


   @xiaoyuyao Thanks for explanation, I agree and I will clean that up.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-648492855


   Fine with me. In this Jira let's focus on removing datanode part.
   
   +1 LGTM.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 edited a comment on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-645524303


   But right now getServiceInfo is used only to getCaCertificate, other than that in current code there is no usage of it. As now the client does not need an SCM address, for OM, we get a quorum of address from config, and also we don't need DN address which this patch has eliminated.
   
   If we can skip getServiceInfo call in a non-secure cluster where we can save one RPC Call. I am fine with reusing this API or with totally a new API.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] GlenGeng commented on a change in pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#discussion_r436443945



##########
File path: hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
##########
@@ -170,10 +169,10 @@ public RpcClient(ConfigurationSource conf, String omServiceId)
         OzoneManagerProtocol.class, conf
     );
     dtService = omTransport.getDelegationTokenService();
-    ServiceInfoEx serviceInfoEx = ozoneManagerClient.getServiceInfo();
+    String certificate = ozoneManagerClient.getCaCertificate();
     String caCertPem = null;
     if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
-      caCertPem = serviceInfoEx.getCaCertificate();
+      caCertPem = certificate;

Review comment:
       how about 
   ```
   caCertPem = ozoneManagerClient.getCaCertificate();
   ```
   
   only invoke when security is enabled.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 edited a comment on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-648492855


   Fine with me. In this Jira let's focus on removing datanode part.
   
   I will open a new Jira for removing the usage of getServiceInfo.
   
   +1 LGTM.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-643868208


   @xiaoyuyao @bharatviswa504 Thanks for review. I have updated the patch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 edited a comment on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642815995


   From the code point of I see the usage of getServiceInfo used to get CaCert and used by XceiverClientRatis and Grpc.
   
   ```
       ServiceInfoEx serviceInfoEx = ozoneManagerClient.getServiceInfo();
       String caCertPem = null;
       if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
         caCertPem = serviceInfoEx.getCaCertificate();
       }
   ```
   Looking into the PR code now it has eliminated using this getServiceInfo, and added a new API for getcaCert when security is enabled.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-644906452


   The getServiceInfo could be used for non-secure configuration from the server in future. I agree with @elek to leave it as-is.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642797538


   > **Some background**
   > `gerServiceInfo` call was introduced as part of `Service Discovery API` (HDFS-12868).
   > The idea behind `Service Discovery API` was to not worry about setting multiple properties at client to connect to Ozone.
   > Point the client to any one of the OzoneManager (even in case of HA), the client should be able to get all the required configuration and be able to connect to Ozone.
   > This will make the client configuration simple.
   > 
   > When `gerServiceInfo` was introduced, Ozone client just needed OM's IP and port to connect, nothing else. Over time we made lot of code change and never revisited `Service Discovery API`.
   > After OM HA, it is no longer true that we can discover other services and required configurations just with single OM's IP/Port. We need `ozone-site.xml` at client and set of properties that will allow us to interact with OM HA.
   > 
   > Now I'm not even sure where and why `gerServiceInfo` is getting used.
   
   For OM HA, the reason for not using the getServiceInfo was if client knows one of the OzoneManager addresses, and if that is down, client will not be able to talk to the ozone cluster.
   Even right now getServiceInfo, will now return OM HA address with role, but this information cannot be used for the above reason, so the client still need a quorum of OM address.
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-644914115


   > > Can we also make the change in RpcClient.java, calling getServiceInfo only when security is enabled?
   > 
   > I have some fear if the secre/unsecure cluster would have such difference. For example, I should run all the performance tests on secure cluster. We should be careful and execute all the tests on both secure / unsecure cluster. Seems to be safer to keep it as is. But I can be convinced.
   
   From code, I see that we use getServiceInfo value only when security is enabled. So, from S3G perspective, for each request, we can save one Rpc Call.
   
   
   ```
       ServiceInfoEx serviceInfoEx = ozoneManagerClient.getServiceInfo();
       String caCertPem = null;
       if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
         caCertPem = serviceInfoEx.getCaCertificate();
       }
   ```
   
   I have not understood, what is the problem mentioned here, as from code we can see the value from getServiceInfo is only used when security is enabled.
   
   If someone has mistakenly used some of the information from getServiceInfo mistakenly with out initializing it, it will be caught immediately as it will throw NPE. And also we are not initializing serviceinfo, to any class parameter, it is locally used in the RpcClient constructor only.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao edited a comment on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
xiaoyuyao edited a comment on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-644922194


   > I have not understood, what is the problem mentioned here, as from code we can see the value from getServiceInfo is only used when security is enabled.
   
   getServiceInfo is the service discovery API for OM, which is not for security only. If we go with the other route, i.e., a dedicated API for getCaCertificate(), I'm OK with including check for security enabled.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-645524303


   But right now getServiceInfo is used only to getCaCertificate, other than that in current code there is no usage of it. As now the client does not need an SCM address, for OM, we get a quorum of address from config.
   
   If we can skip getServiceInfo call in a non-secure cluster where we can save one RPC Call. I am fine with reusing this API or with totally a new API.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid collect datanode information to s3g

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-648494388


   Thank You @runzhiwang for the contribution and @elek and @xiaoyuyao for the review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] elek commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642010606


   Thanks the patch @runzhiwang 
   
   It looks correct me, but it's also a question about the long-term usage of `getServiceInfo`. Originally it was introduced (as far as I remember) to get the address of the SCM. But over the time the client is improved to avoid all the direct calls to the SCM.
   
   I agree, that long-term we should use proxy users for S3 and pooling the connections.
   
   Short-term this patch looks good to me, but why don't we use the `getServiceInfo` in case of secure clusters? Do we need to replace it with something simple.
   
   I would be interested about the opinion of @nandakumar131. As far as I remember he worked on the original implementation.
   
   Personally I think a generic `getServiceClient` can be useful. For example active `storage-class`-es can be downloaded from the server at the beginning of the connection. But that's a long term plan and this patch can help short term.
   
   Let's wait for more opinions. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on pull request #1031: HDDS-3745. Improve OM and SCM performance with 64% by avoid call getServiceInfo in s3g

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on pull request #1031:
URL: https://github.com/apache/hadoop-ozone/pull/1031#issuecomment-642991229


   @runzhiwang the reason the serviceinfo request retrieve all the DNs is for the DN REST service address. After the Ozone REST support was removed a while back, those leftover are not cleaned up completely. Maybe a time to clean that up to make it lighter without introducing a new RPC?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org