Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/02/27 07:22:24 UTC

[GitHub] [hadoop-ozone] runzhiwang opened a new pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

runzhiwang opened a new pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611
 
 
   
   ## What changes were proposed in this pull request?
   
   1. s3g creates a new client for each request at https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/OzoneClientProducer.java#L66, but does not release the client's connection resources when the client is destroyed, so memory leaks. The details of the leak are as follows.
   
   2. When the physical memory of s3g reached 10 GB, jmap showed that the leak is on the JVM heap.
   ![image](https://user-images.githubusercontent.com/51938049/75419844-b15c1580-5971-11ea-93f4-10ec5af65521.png)
   
   3. After restarting s3g, I dumped the heap when physical memory usage reached 3 GB. The image shows 262144 `InternalSubchannel` objects in `subchannels`. In grpc-java each `InternalSubchannel` represents a connection, so this many of them means the connection resources were not released when clients were destroyed.
   ![image](https://user-images.githubusercontent.com/51938049/75420056-1fa0d800-5972-11ea-8bb5-54c9f2f6546c.png)
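A minimal, self-contained sketch of the leak pattern in item 1 and of the general fix (releasing a per-request client's connection when the request is done). `PerRequestClient` and `RequestHandlers` are hypothetical stand-ins, not Ozone or grpc-java classes; a counter plays the role of the accumulating `InternalSubchannel` objects. In a CDI-managed producer such as `OzoneClientProducer`, the equivalent release point would usually be a disposer (`@Disposes`) method or a `@PreDestroy` callback; this sketch does not claim which one the patch uses.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-request client that opens one connection
// (think: one gRPC subchannel) when it is created. Not the Ozone API.
class PerRequestClient implements AutoCloseable {
  // Process-wide count of connections opened but not yet released.
  static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

  private boolean closed;

  PerRequestClient() {
    OPEN_CONNECTIONS.incrementAndGet(); // "connect"
  }

  @Override
  public void close() {
    if (!closed) { // idempotent: a second close() releases nothing
      closed = true;
      OPEN_CONNECTIONS.decrementAndGet(); // "release the connection resource"
    }
  }
}

class RequestHandlers {
  // Leaky pattern: the client is dropped without close(), so its connection stays counted.
  static void handleLeaky() {
    PerRequestClient client = new PerRequestClient();
    // ... serve the request with client ...
  }

  // Fixed pattern: try-with-resources releases the connection when the request ends.
  static void handleAndRelease() {
    try (PerRequestClient client = new PerRequestClient()) {
      // ... serve the request with client ...
    }
  }
}
```

Calling `handleLeaky()` N times leaves N connections counted, mirroring the ever-growing subchannel count in the heap dump, while `handleAndRelease()` leaves the count unchanged.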
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3041
   
   The JIRA was created by JieWang, which is also my username. I forgot the password for the runzhiwang JIRA account and have not been able to recover it, sorry for that.
   
   ## How was this patch tested?
   
   I ran a stress test on s3g: 40 threads sent a total of 4 million requests. The physical memory of s3g stayed stable at 1.5 GB.
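A load like the one above can be generated with a small harness such as the following. This is a sketch under assumptions: `StressTest` is a hypothetical helper, `sendRequest` is a placeholder for one S3 request against s3g (for example a PUT or GET issued by an S3 client), and the harness only counts completed calls; memory has to be watched externally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical load generator: spreads totalRequests across `threads` workers.
class StressTest {
  // Returns how many requests completed. If the pool does not finish within
  // the await timeout, the returned count is only partial.
  static long run(int threads, long totalRequests, Runnable sendRequest)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    LongAdder completed = new LongAdder();
    long perThread = totalRequests / threads;
    long remainder = totalRequests % threads;
    for (int t = 0; t < threads; t++) {
      // The first `remainder` workers take one extra request so the total is exact.
      long quota = perThread + (t < remainder ? 1 : 0);
      pool.execute(() -> {
        for (long i = 0; i < quota; i++) {
          sendRequest.run();
          completed.increment();
        }
      });
    }
    pool.shutdown(); // no new tasks; already submitted ones keep running
    pool.awaitTermination(1, TimeUnit.HOURS);
    return completed.sum();
  }
}
```

For the setup described here, the call would be `StressTest.run(40, 4_000_000L, sendRequest)`.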
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you will be checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. I also found that both the 1-3 second delay and the 2381% CPU usage are related to the code at https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L271: if I comment out this code, the delay and the high CPU usage disappear. Even when I close all the clients in a separate thread, the CPU usage is still about 2400%. I will keep looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)
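The comment mentions closing the clients in a separate thread. A minimal sketch of that idea, using a hypothetical `AsyncCloser` helper (not part of Ozone): the request thread hands each client to one background thread and returns immediately. This only moves the latency of `close()` off the request path; it cannot lower total CPU usage, which is consistent with the roughly 2400% reported above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: closes clients on a single background thread.
class AsyncCloser implements AutoCloseable {
  private final ExecutorService closer = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "client-closer");
    t.setDaemon(true); // do not keep the JVM alive just for pending closes
    return t;
  });

  // Queues client.close(); the caller does not wait for it.
  void closeLater(AutoCloseable client) {
    closer.execute(() -> {
      try {
        client.close();
      } catch (Exception e) {
        // A real service would log this; the sketch just reports it.
        System.err.println("close failed: " + e);
      }
    });
  }

  // Finishes the already-queued closes (up to 30 s) and accepts no new ones.
  @Override
  public void close() throws InterruptedException {
    closer.shutdown();
    closer.awaitTermination(30, TimeUnit.SECONDS);
  }
}
```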
   



[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593254961
 
 
   @elek Hi, I have found why close takes 1-3 seconds: the CPU is saturated, which is caused by a JDK 8 bug, https://bugs.openjdk.java.net/browse/JDK-8129861. You can find the details in the PR description. Closing a client now takes only 1 ms.



[GitHub] [hadoop-ozone] runzhiwang removed a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang removed a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you will be checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. Using jstack, I found that many threads are RUNNABLE at ScheduledThreadPoolExecutor.java:809, which is a Java 8 bug: https://bugs.openjdk.java.net/browse/JDK-8129861. When I compiled and ran Ozone with Java 9, the CPU usage was normal. I will keep looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)
   ![image](https://user-images.githubusercontent.com/51938049/75640844-4a8c7400-5c71-11ea-9ced-1c35404dfba5.png)
   ![image](https://user-images.githubusercontent.com/51938049/75640827-41030c00-5c71-11ea-8f43-53552cbb881f.png)
   



[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592246411
 
 
   @bharatviswa504 Yes, you are right. I'm also trying to use a single shared `RpcClient` rather than creating a new `RpcClient` for each request.
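A sketch of the shared-client approach discussed here: one client is created lazily, reused by every request, and closed once at shutdown instead of after each request. `SharedClientHolder` is a hypothetical helper, and the sketch assumes the underlying client (an `RpcClient` in this thread) is safe to use from concurrent requests; it does not verify that.

```java
import java.util.function.Supplier;

// Hypothetical holder that shares one client across all requests.
class SharedClientHolder<C extends AutoCloseable> implements AutoCloseable {
  private final Supplier<C> factory; // builds the real client; assumed, not an Ozone API
  private C client; // guarded by this

  SharedClientHolder(Supplier<C> factory) {
    this.factory = factory;
  }

  // Creates the client on first use; later callers get the same instance.
  synchronized C get() {
    if (client == null) {
      client = factory.get();
    }
    return client;
  }

  // Called once at gateway shutdown, not after every request.
  @Override
  public synchronized void close() throws Exception {
    if (client != null) {
      client.close();
      client = null;
    }
  }
}
```

The design trade-off is that connection setup and teardown happen once per process rather than once per request, so nothing accumulates between requests; the cost is that the shared client must tolerate concurrent use.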



[GitHub] [hadoop-ozone] lokeshj1703 commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-594114255
 
 
   @bharatviswa504 @runzhiwang I have created the snapshot 0.6.0-a320ae0-SNAPSHOT. You can use this version in Ozone.



[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-591920428
 
 
   `client.close()` takes 1-3 seconds, which hurts s3g performance. I will look for another fix.
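A measurement like this one can be reproduced with a tiny timing helper. `CloseTimer` is hypothetical, not something from the patch; it measures wall-clock time spent inside `close()` on the calling thread, so it captures both slow teardown and time spent waiting on contended locks.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: measures how long a single close() call blocks its caller.
class CloseTimer {
  // Returns the wall-clock milliseconds spent inside client.close().
  // Any exception from close() propagates to the caller unchanged.
  static long timeCloseMillis(AutoCloseable client) throws Exception {
    long start = System.nanoTime(); // monotonic clock, suitable for durations
    client.close();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }
}
```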



[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592246411
 
 
   @bharatviswa504 Yes, I'm also trying to use a single shared `RpcClient` rather than creating a new `RpcClient` for each request.



[GitHub] [hadoop-ozone] xiaoyuyao commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593657846
 
 
   @runzhiwang Why was the RATIS JIRA closed? My understanding is that the fix proposed here depends on that one.



[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you will be checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. Using jstack, I found that many threads are in the "waiting to lock" state. I also found that both the 1-3 second delay and the 2381% CPU usage are related to the code at https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L271: if I comment out this code, the delay and the high CPU usage disappear. Even when I close all the clients in a separate thread, the CPU usage is still about 2400%. I will keep looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)
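The same kind of jstack observation can be summarized from inside the JVM with the standard `java.lang.management` API. The sketch below (the class name `ThreadStateCensus` is hypothetical) counts live threads by state; threads that jstack prints with "waiting to lock" are reported as `BLOCKED`. A surge of `BLOCKED` threads during a stress test would point to lock contention like the one described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical in-process alternative to eyeballing jstack output.
class ThreadStateCensus {
  // Counts live threads per Thread.State (BLOCKED ~ jstack "waiting to lock").
  static Map<Thread.State, Integer> countByState() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
    // (false, false): skip locked-monitor and synchronizer details; only states are needed.
    for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
      if (info != null) {
        counts.merge(info.getThreadState(), 1, Integer::sum);
      }
    }
    return counts;
  }
}
```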



[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you will be checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. I also found that both the 1-3 second delay and the 2381% CPU usage are related to the code at https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L271: if I comment out this code, the delay and the high CPU usage disappear. Even when I close all the clients in a separate thread, the CPU usage is still about 2400%. I will keep looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)
   



[GitHub] [hadoop-ozone] ChenSammi merged pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611
 
 
   



[GitHub] [hadoop-ozone] runzhiwang closed pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang closed pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611
 
 
   


[GitHub] [hadoop-ozone] ChenSammi commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
ChenSammi commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-594990085
 
 
   Merged. Thanks @runzhiwang for the contribution, and @bharatviswa504, @elek, @xiaoyuyao, @lokeshj1703 and @timmylicheng for the code review and help.


[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592115908
 
 
   Just a question: why can't we create one RpcClient instance and use it throughout?



[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-591838810
 
 
   @elek There are some problems; please wait for me.



[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593677500
 
 
   We can update to a snapshot version: @mukul1987 / @lokeshj1703 can help release a Ratis snapshot version, and we can use it to update ratis.version in master.



[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you will be checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. I also found that both the 1-3 second delay and the 2381% CPU usage are related to the code at https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L271: if I comment out this code, the delay and the high CPU usage disappear. Even when I close all the clients in a separate thread, the CPU usage is still about 2400%. I will keep looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)
   



[GitHub] [hadoop-ozone] elek commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
elek commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592491608
 
 
   /pending "There are some problem, please wait for me."



[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you will be checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. Using jstack, I found that many threads are in the "waiting to lock" state. I will keep looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)



[GitHub] [hadoop-ozone] runzhiwang opened a new pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang opened a new pull request #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611
 
 
   ## What changes were proposed in this pull request?
   
   1. s3g creates a new client for each request at https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/OzoneClientProducer.java#L66, but does not release the client's connection resources when the client is destroyed, so memory leaks. The details of the leak are as follows.
   
   2. When the physical memory of s3g reached 10 GB, jmap showed that the leak is on the JVM heap.
   ![image](https://user-images.githubusercontent.com/51938049/75419844-b15c1580-5971-11ea-93f4-10ec5af65521.png)
   
   3. After restarting s3g, I dumped the heap when physical memory usage reached 3 GB. The image shows 262144 `InternalSubchannel` objects in `subchannels`. In grpc-java each `InternalSubchannel` represents a connection, so this many of them means the connection resources were not released when clients were destroyed.
   ![image](https://user-images.githubusercontent.com/51938049/75420056-1fa0d800-5972-11ea-8bb5-54c9f2f6546c.png)
   
   4. However, with this PR, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores.
   ![image](https://user-images.githubusercontent.com/51938049/75649455-183d3f80-5c8e-11ea-91b6-953113880d5a.png)
   
   5. Using jstack, I found many threads consuming 100% CPU that are RUNNABLE at ScheduledThreadPoolExecutor.java:809, which is a Java 8 bug: https://bugs.openjdk.java.net/browse/JDK-8129861. When I compiled and ran Ozone with Java 9, the CPU usage was normal, so I am fairly sure this bug is the cause.
   ![image](https://user-images.githubusercontent.com/51938049/75649507-4458c080-5c8e-11ea-9770-02b06b17a122.png)
   
   6. The code that triggers JDK-8129861 is in Ratis: https://github.com/apache/incubator-ratis/blob/master/ratis-common/src/main/java/org/apache/ratis/util/TimeoutScheduler.java#L88. It initializes a thread pool with a core size of zero, and `client.close` in Ozone then triggers the bug. The triggering stack is shown in the image.
   ![image](https://user-images.githubusercontent.com/51938049/75652511-373fcf80-5c96-11ea-9572-36339e59b1ea.png)
   
   7.  With this PR together with the Ratis PR https://github.com/apache/incubator-ratis/pull/56, both the memory and the CPU usage of s3g are normal.
   
   8. So fixing the memory leak of s3g takes three steps:
      1. Merge the Ratis PR https://github.com/apache/incubator-ratis/pull/56 and release a new Ratis version. (A duplicate JIRA, RATIS-814, containing the same fix was merged on Mar 2, 2020, so I closed my Ratis JIRA.)
      2. Upgrade the Ratis version in Ozone.
      3. Merge this PR.
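
Items 1 to 3 boil down to a per-request resource that is created but never closed. The following is a minimal, self-contained Java sketch of that pattern and of the fix, not the code in this PR: `PerRequestClient` is a hypothetical stand-in for `OzoneClient`, and its static counter stands in for the live connections (`InternalSubchannel` objects) seen in the heap dump.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client that owns one network connection
// (the role OzoneClient plays in s3g); not a real Ozone class.
class PerRequestClient implements AutoCloseable {
  // Connections that were opened but not yet released.
  static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

  PerRequestClient() {
    OPEN_CONNECTIONS.incrementAndGet(); // "connect"
  }

  String handle(String request) {
    return "ok:" + request;
  }

  @Override
  public void close() {
    OPEN_CONNECTIONS.decrementAndGet(); // release the connection
  }
}

class RequestHandlers {
  // Leaky variant: a new client per request that is never closed.
  static String leaky(String request) {
    PerRequestClient client = new PerRequestClient();
    return client.handle(request);
  }

  // Fixed variant: try-with-resources closes the client even if handle() throws.
  static String fixed(String request) {
    try (PerRequestClient client = new PerRequestClient()) {
      return client.handle(request);
    }
  }
}
```

Calling `leaky` 1000 times from a fresh counter leaves it at 1000, while `fixed` leaves it at 0; with a real client, each unreleased unit would be an open gRPC connection rather than a counter increment.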
   
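The Ratis side of the problem (items 5 and 6) is a `ScheduledThreadPoolExecutor` created with zero core threads, which JDK-8129861 reports can keep a worker spinning on Java 8. A minimal Java sketch of a common workaround, assuming a core pool size of one is acceptable; whether RATIS-814 or Ratis PR #56 changes the code in exactly this way is an assumption, and `TimeoutSchedulerSketch` is a hypothetical class, not the Ratis `TimeoutScheduler` API.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not the Ratis TimeoutScheduler API.
class TimeoutSchedulerSketch {
  // Daemon threads so an idle scheduler never keeps the JVM alive.
  private static final ThreadFactory DAEMON = r -> {
    Thread t = new Thread(r, "timeout-scheduler");
    t.setDaemon(true);
    return t;
  };

  // Core size 1 instead of 0: on Java 8 a zero-core pool can leave its
  // single worker busy-polling the delay queue (JDK-8129861).
  static ScheduledThreadPoolExecutor newScheduler() {
    ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1, DAEMON);
    // Remove cancelled timeouts from the queue immediately instead of at their deadline.
    executor.setRemoveOnCancelPolicy(true);
    return executor;
  }

  // Schedules a timeout action; callers cancel the returned future on success.
  static ScheduledFuture<?> onTimeout(ScheduledThreadPoolExecutor executor,
      Runnable action, long delay, TimeUnit unit) {
    return executor.schedule(action, delay, unit);
  }
}
```

Keeping one long-lived daemon thread costs a little memory but avoids the spin; `setRemoveOnCancelPolicy(true)` is an optional extra so that timeouts cancelled on the normal path do not stay queued until their deadline.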
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3041
   
   The JIRA was created by JieWang, which is also my username. I forgot the password for the username runzhiwang in JIRA and have not recovered it, sorry for that.
   
   ## How was this patch tested?
   1. Compile ratis-0.5.0-rc0 with the PR https://github.com/apache/incubator-ratis/pull/56.
   
   2. Compile ozone with this PR.
   
   3. Replace ozone-0.5.0-SNAPSHOT/share/ozone/lib/ratis-common-0.5.0.jar with ratis-common/target/ratis-common-0.5.0.jar.
   
   4. Run a stress test on s3g: send a total of 4 million requests to s3g from 40 threads. The physical memory of s3g stays stable at 1.5GB.
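
The stress test in step 4 can be approximated with a fixed-size worker pool. A minimal Java sketch, assuming a hypothetical `sendRequest` callback in place of a real S3 call (the description does not say which load tool was used); the thread count and request total are parameters, so the test above corresponds to `run(40, 4_000_000L, ...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

class StressTestSketch {
  // Sends totalRequests requests spread across `threads` workers and returns
  // how many completed. sendRequest is a hypothetical hook for one S3 call.
  static long run(int threads, long totalRequests, LongConsumer sendRequest)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicLong next = new AtomicLong(); // next request id to hand out
    AtomicLong done = new AtomicLong(); // completed requests
    List<Future<?>> workers = new ArrayList<>();
    try {
      for (int t = 0; t < threads; t++) {
        workers.add(pool.submit(() -> {
          long id;
          // Each worker claims ids until the total is exhausted.
          while ((id = next.getAndIncrement()) < totalRequests) {
            sendRequest.accept(id);
            done.incrementAndGet();
          }
        }));
      }
      for (Future<?> w : workers) {
        w.get(); // propagate any failure from a worker
      }
    } finally {
      pool.shutdown();
    }
    return done.get();
  }
}
```

Memory has to be observed from outside the process while this runs, for example with top or jmap as in the description; the sketch only drives the load.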
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] elek commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
elek commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592491480
 
 
   Correct me if I am wrong, but one client uses one RPC connection to the OM endpoint. As far as I know, the user identity is established during the initialization of the Hadoop RPC session. In this case: one client -> one connection -> one user identity...
   
   To reuse the same connection for multiple HTTP clients, we would need some kind of impersonation and strong authentication on the S3 side. (S3g would connect as an admin user but send the authentication information, which is validated on the S3 side.)
   
   Currently this is not the case (and this is the reason for having separate clients): s3g doesn't do any authentication; it just forwards the requests together with the authentication information.
   
   1-3 seconds for close seems very high; there may be a bug in the code. As a very quick fix, I would check whether it can be reduced and then commit this code. (Or you could do the actual close from a separate thread...)
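
The suggestion to do the close from a separate thread could look like the following minimal Java sketch, in which a hypothetical `AsyncCloser` hands `close()` to a single background thread so the request thread returns immediately. As reported later in this thread, closing the clients in a separate thread did not reduce the CPU usage (still about 2400%), because the underlying cause was JDK-8129861; this only moves the latency off the request path.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: closes resources off the request thread.
class AsyncCloser {
  private final ExecutorService closer = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "async-closer");
    t.setDaemon(true); // do not block JVM shutdown
    return t;
  });

  // Returns immediately; the possibly slow close() runs on the background thread.
  Future<?> closeLater(AutoCloseable resource) {
    return closer.submit(() -> {
      try {
        resource.close();
      } catch (Exception e) {
        // A real gateway would log this; the request has already completed.
        System.err.println("close failed: " + e);
      }
    });
  }

  void shutdown() {
    closer.shutdown();
  }
}
```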


[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593254961
 
 
   @elek Hi, I have found the reason close takes 1-3 seconds: the CPU is fully utilized, which is related to the JDK 8 bug https://bugs.openjdk.java.net/browse/JDK-8129861. You can find the details in this PR description.


[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593676516
 
 
   @bharatviswa504 RATIS-814 was merged yesterday, but Ratis 0.6.0 has not been released yet. Can I change the Ratis version in Ozone from 0.5.0 to 0.6.0 directly now?


[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593676516
 
 
   @bharatviswa504 RATIS-814 was merged yesterday, but Ratis 0.6.0 has not been released yet. Can I update the Ratis version in Ozone from 0.5.0 to 0.6.0 directly now?


[GitHub] [hadoop-ozone] runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang edited a comment on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you mentioned checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. Using jstack, I found many threads in state RUNNABLE at ScheduledThreadPoolExecutor.java:809, which matches a Java 8 bug: https://bugs.openjdk.java.net/browse/JDK-8129861. After compiling and running Ozone with Java 9, the CPU usage is fine. I will continue looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)
   ![image](https://user-images.githubusercontent.com/51938049/75640844-4a8c7400-5c71-11ea-9ced-1c35404dfba5.png)
   ![image](https://user-images.githubusercontent.com/51938049/75640827-41030c00-5c71-11ea-8f43-53552cbb881f.png)
   


[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592508704
 
 
   @elek Hi, thank you for your reply. Since you mentioned checking the code, I want to share some information I have found. With this patch, s3g occupies all the CPU: as the image shows, the CPU usage of s3g rises to 2381% on a machine with 24 cores, which is horrible. I also found that both the 1-3 second delay and the 2381% CPU usage are related to the code at https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientManager.java#L271. Even when I close all the clients in a separate thread, the CPU usage is still about 2400%. I will continue looking for the root cause.
   ![image](https://user-images.githubusercontent.com/51938049/75551773-dfc01a80-5a6f-11ea-9be5-29f8c9acc64b.png)
   


[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593676516
 
 
   @bharatviswa504 RATIS-814 was merged yesterday, but Ratis 0.6.0 has not been released yet. Can I change the Ratis version in Ozone from 0.5.0 to 0.6.0 directly now?


[GitHub] [hadoop-ozone] timmylicheng commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-591916627
 
 
   +1 Good catch and clean fix!


[GitHub] [hadoop-ozone] bharatviswa504 commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on issue #611: [WIP]HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-592254537
 
 
   > @bharatviswa504 Yes, I'm also trying to use the same `RpcClient` rather than creating a new `RpcClient` for each request.
   
   But on thinking more about it later, this might be a problem in the OM HA case: if we use a single RpcClient, the entire S3Gateway will have only ozone.client.failover.max.attempts (default 15) retries.
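
The concern is that the failover retry budget belongs to a client instance. A simplified Java model, assuming a hypothetical per-instance counter capped at `maxAttempts` (standing in for `ozone.client.failover.max.attempts`, default 15); the real OM failover proxy logic is more involved and is not reproduced here. With one shared client, every request draws on the same attempts; with one client per request, each request gets its own.

```java
// Hypothetical model of a client whose failover budget is per instance.
class FailoverBudgetClient {
  private final int maxAttempts;
  private int failoversUsed = 0;

  FailoverBudgetClient(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  // Called when a request hits a non-leader OM; returns false once this
  // client instance has used up its budget.
  synchronized boolean tryFailover() {
    if (failoversUsed >= maxAttempts) {
      return false;
    }
    failoversUsed++;
    return true;
  }
}

class BudgetDemo {
  // Requests (each needing one failover) that succeed when all requests
  // share a single client.
  static int sharedClient(int requests, int maxAttempts) {
    FailoverBudgetClient shared = new FailoverBudgetClient(maxAttempts);
    int ok = 0;
    for (int i = 0; i < requests; i++) {
      if (shared.tryFailover()) ok++;
    }
    return ok;
  }

  // Same workload with a fresh client per request, as s3g does today.
  static int clientPerRequest(int requests, int maxAttempts) {
    int ok = 0;
    for (int i = 0; i < requests; i++) {
      if (new FailoverBudgetClient(maxAttempts).tryFailover()) ok++;
    }
    return ok;
  }
}
```

For 100 requests that each need one failover, `sharedClient(100, 15)` lets only 15 succeed, while `clientPerRequest(100, 15)` lets all 100 succeed.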


[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-591833028
 
 
   @elek Could you help review this patch? Thank you very much.


[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593671036
 
 
   @xiaoyuyao Yes, you are right. A duplicate JIRA, RATIS-814, with the same fix was merged on Mar 2, 2020, so I closed my Ratis JIRA.


[GitHub] [hadoop-ozone] runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593255369
 
 
   @ChenSammi @xiaoyuyao @mukul1987 Could you help review this patch too? Thank you very much.


[GitHub] [hadoop-ozone] bharatviswa504 edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on issue #611: HDDS-3041. Fix the memory leak of s3g by releasing the connection resource
URL: https://github.com/apache/hadoop-ozone/pull/611#issuecomment-593677500
 
 
   We can update to a snapshot version: @mukul1987 / @lokeshj1703 can help release a Ratis snapshot version, and we can use that to update ratis.version in master.
   
   @mukul1987 / @lokeshj1703 please help push a Ratis version to Maven that we can use in Ozone.
