Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/07/26 10:00:22 UTC

[GitHub] [incubator-uniffle] xianjingfeng opened a new issue, #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

xianjingfeng opened a new issue, #76:
URL: https://github.com/apache/incubator-uniffle/issues/76

   We found that a shuffle server under high load easily hits `java.lang.OutOfMemoryError: Java heap space`, even when we allocate more JVM heap memory and lower `rss.server.buffer.capacity`.
   
   The steps for the exception above:
   1. When the shuffle server is under high load, `requireBufferId` easily expires, and the shuffle server releases the corresponding `usedMemory`.
   2. A client calls `sendShuffleData` with the expired `requireBufferId`.
   3. The shuffle server receives the shuffle data and stores it _in the RPC queue_ (this memory is not added to `usedMemory`).
   4. Other clients' `requireBuffer` calls succeed because `usedMemory` still shows enough free space, so actual heap usage exceeds what the server accounts for.
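
The steps above can be sketched as a toy accounting model. This is only an illustration under stated assumptions, not Uniffle code: the class `ExpiredBufferGap`, its fields, and its method signatures are hypothetical, and sizes are plain byte counts.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the accounting gap. Every name here is hypothetical, not a Uniffle class. */
public class ExpiredBufferGap {
    private final long capacity;
    private long usedMemory = 0;        // what the server believes is in use
    private long untrackedRpcBytes = 0; // held in the RPC queue but never counted
    private long nextId = 0;
    private final Map<Long, Long> reservations = new HashMap<>();

    public ExpiredBufferGap(long capacity) {
        this.capacity = capacity;
    }

    /** Reserves memory and returns a requireBufferId, or -1 if usedMemory leaves no room. */
    public long requireBuffer(long size) {
        if (usedMemory + size > capacity) {
            return -1;
        }
        usedMemory += size;
        reservations.put(nextId, size);
        return nextId++;
    }

    /** Step 1: the reservation expires and its memory is released from usedMemory. */
    public void expire(long requireBufferId) {
        Long size = reservations.remove(requireBufferId);
        if (size != null) {
            usedMemory -= size;
        }
    }

    /** Steps 2-3: data is accepted whether or not the id is still known. */
    public void sendShuffleData(long requireBufferId, long size) {
        if (reservations.remove(requireBufferId) == null) {
            untrackedRpcBytes += size; // on the heap, but invisible to usedMemory
        }
        // With a live reservation the bytes were already counted in requireBuffer.
    }

    public long usedMemory() {
        return usedMemory;
    }

    public long actualHeapBytes() {
        return usedMemory + untrackedRpcBytes;
    }
}
```

In this model, with a capacity of 100, a reservation of 80 that expires before its data arrives, followed by a second reservation of 80, leaves `usedMemory()` at 80 while `actualHeapBytes()` reaches 160.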


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-uniffle] jerqi commented on issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #76:
URL: https://github.com/apache/incubator-uniffle/issues/76#issuecomment-1222299981

   closed by #157 



[GitHub] [incubator-uniffle] jerqi closed issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
jerqi closed issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired
URL: https://github.com/apache/incubator-uniffle/issues/76



[GitHub] [incubator-uniffle] xianjingfeng commented on issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on issue #76:
URL: https://github.com/apache/incubator-uniffle/issues/76#issuecomment-1199288424

   > Could you share your solution? We can discuss first.
   
   1. On the server side, if the `requireBufferId` is not found when data is sent, throw an exception.
   2. On the client side, if sending the data fails, require a buffer again.
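
A minimal sketch of these two points, assuming a simplified in-memory server. The names `RequireBufferRetry`, `Server`, and `sendWithRetry`, and the use of `IllegalStateException` as the rejection signal, are all hypothetical; this is not the change that was eventually merged.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: reject unknown ids on the server, re-require on the client. */
public class RequireBufferRetry {

    /** Hypothetical stand-in for the shuffle server's buffer accounting. */
    static class Server {
        private final long capacity;
        private long usedMemory = 0;
        private long nextId = 0;
        private final Map<Long, Long> reservations = new HashMap<>();

        Server(long capacity) {
            this.capacity = capacity;
        }

        /** Returns a requireBufferId, or -1 if there is no room. */
        long requireBuffer(long size) {
            if (usedMemory + size > capacity) {
                return -1;
            }
            usedMemory += size;
            reservations.put(nextId, size);
            return nextId++;
        }

        /** Simulates expiry: the reservation is dropped and its memory released. */
        void expire(long requireBufferId) {
            Long size = reservations.remove(requireBufferId);
            if (size != null) {
                usedMemory -= size;
            }
        }

        /** Point 1: data with an unknown or expired id is rejected. */
        void sendShuffleData(long requireBufferId, long size) {
            if (reservations.remove(requireBufferId) == null) {
                throw new IllegalStateException(
                    "requireBufferId " + requireBufferId + " not found or expired");
            }
            // Accepted: these bytes were already counted when the buffer was required.
        }

        long usedMemory() {
            return usedMemory;
        }
    }

    /** Point 2: on rejection, require a buffer again and resend. */
    static boolean sendWithRetry(Server server, long requireBufferId, long size, int maxRetries) {
        long id = requireBufferId;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (id < 0) {
                id = server.requireBuffer(size);
            }
            if (id < 0) {
                continue; // server is full; a real client would back off here
            }
            try {
                server.sendShuffleData(id, size);
                return true;
            } catch (IllegalStateException expired) {
                id = -1; // forget the stale id and require a new one
            }
        }
        return false;
    }
}
```

With this shape, every accepted byte is backed by a live reservation, so `usedMemory` cannot fall below what the server actually holds for shuffle data.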



[GitHub] [incubator-uniffle] jerqi commented on issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #76:
URL: https://github.com/apache/incubator-uniffle/issues/76#issuecomment-1196225665

   Do you have a solution to the problem?



[GitHub] [incubator-uniffle] jerqi commented on issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #76:
URL: https://github.com/apache/incubator-uniffle/issues/76#issuecomment-1200436396

   cc @colinmjj. There don't seem to be such cases in our production environment, but I think the analysis is correct. What do you think?



[GitHub] [incubator-uniffle] xianjingfeng commented on issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on issue #76:
URL: https://github.com/apache/incubator-uniffle/issues/76#issuecomment-1196662502

   Yes, it is being tested in our production environment. I will watch it for a while. If it's OK, I will create a PR.



[GitHub] [incubator-uniffle] jerqi commented on issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #76:
URL: https://github.com/apache/incubator-uniffle/issues/76#issuecomment-1197623471

   Could you share your solution?  We can discuss first.



[GitHub] [incubator-uniffle] colinmjj commented on issue #76: [Improvement] Disallow sendShuffleData if requireBufferId expired

Posted by GitBox <gi...@apache.org>.
colinmjj commented on issue #76:
URL: https://github.com/apache/incubator-uniffle/issues/76#issuecomment-1200622415

   I think @xianjingfeng is right. With the current implementation, OOM will happen if the `requireBufferId` has already expired in the Shuffle Server; this may be caused by GC, network problems, high workload in the shuffle server, etc.
   It's better to accept data only with a valid `requireBufferId` to avoid this problem.

