You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@celeborn.apache.org by GitBox <gi...@apache.org> on 2023/01/03 08:43:59 UTC

[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive ple times, not only twice

AngersZhuuuu opened a new pull request, #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136

   ### What changes were proposed in this pull request?
   I noticed that push data can revive many times but push merged data only revive twice. if some machine is not healthy in the cluster, too easy to cause job failed. We should make doPushMergedData support revive more than twice too.
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1376682917

   > needs to update style
   
   Fixed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice

Posted by GitBox <gi...@apache.org>.
waitinfuture commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1376675106

   needs to update style


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive multiple times, not only twice

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1369511148

   ping @nafiyAix @waitinfuture 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive multiple times, not only twice

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1369514133

   # [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1136?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1136](https://codecov.io/gh/apache/incubator-celeborn/pull/1136?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a54bf22) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/c9dcf312f86bd0869667223dc33e340ccefb549a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c9dcf31) will **increase** coverage by `0.03%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##               main    #1136      +/-   ##
   ============================================
   + Coverage     26.94%   26.96%   +0.03%     
   - Complexity      746      748       +2     
   ============================================
     Files           196      196              
     Lines         16832    16829       -3     
     Branches       1820     1818       -2     
   ============================================
   + Hits           4534     4537       +3     
   + Misses        12003    11998       -5     
   + Partials        295      294       -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-celeborn/pull/1136?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../org/apache/celeborn/client/ShuffleClientImpl.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1136/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9jbGllbnQvU2h1ZmZsZUNsaWVudEltcGwuamF2YQ==) | `19.51% <0.00%> (+0.08%)` | :arrow_up: |
   | [...ice/deploy/master/clustermeta/ha/HARaftServer.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1136/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9zZXJ2aWNlL2RlcGxveS9tYXN0ZXIvY2x1c3Rlcm1ldGEvaGEvSEFSYWZ0U2VydmVyLmphdmE=) | `77.93% <0.00%> (+1.36%)` | :arrow_up: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture merged pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice

Posted by GitBox <gi...@apache.org>.
waitinfuture merged PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive multiple times, not only twice

Posted by GitBox <gi...@apache.org>.
waitinfuture commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1373399287

   When I test this pr I found that currently in main branch, both pushData and pushMergedData can at most revive once. For pushData, the first time of onFailure, it passes the original callback instead the wrappedCallback, so if the revived pushdata fails, the pushData will fail.
   ```
                 if (!mapperEnded(shuffleId, mapId, attemptId)) {
                   pushDataRetryPool.submit(
                       () ->
                           submitRetryPushData(
                               applicationId,
                               shuffleId,
                               mapId,
                               attemptId,
                               body,
                               nextBatchId,
                               loc,
                               callback,
                               pushState,
                               getPushDataFailCause(e.getMessage())));
   ```
   
   After discussing with @AngersZhuuuu , we can rework this pr by adding reviveTimes for each pushdata and pushmergeddata


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice

Posted by GitBox <gi...@apache.org>.
waitinfuture commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1375220263

   LGTM, could you plz add a test case?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org