You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@celeborn.apache.org by GitBox <gi...@apache.org> on 2023/01/03 08:43:59 UTC
[GitHub] [incubator-celeborn] AngersZhuuuu opened a new pull request, #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive ple times, not only twice
AngersZhuuuu opened a new pull request, #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136
### What changes were proposed in this pull request?
I noticed that push data can revive many times but push merged data only revive twice. if some machine is not healthy in the cluster, too easy to cause job failed. We should make doPushMergedData support revive more than twice too.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1376682917
> needs to update style
Fixed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice
Posted by GitBox <gi...@apache.org>.
waitinfuture commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1376675106
needs to update style
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] AngersZhuuuu commented on pull request #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive multiple times, not only twice
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1369511148
ping @nafiyAix @waitinfuture
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] codecov[bot] commented on pull request #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive multiple times, not only twice
Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1369514133
# [Codecov](https://codecov.io/gh/apache/incubator-celeborn/pull/1136?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#1136](https://codecov.io/gh/apache/incubator-celeborn/pull/1136?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a54bf22) into [main](https://codecov.io/gh/apache/incubator-celeborn/commit/c9dcf312f86bd0869667223dc33e340ccefb549a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c9dcf31) will **increase** coverage by `0.03%`.
> The diff coverage is `0.00%`.
```diff
@@ Coverage Diff @@
## main #1136 +/- ##
============================================
+ Coverage 26.94% 26.96% +0.03%
- Complexity 746 748 +2
============================================
Files 196 196
Lines 16832 16829 -3
Branches 1820 1818 -2
============================================
+ Hits 4534 4537 +3
+ Misses 12003 11998 -5
+ Partials 295 294 -1
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-celeborn/pull/1136?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../org/apache/celeborn/client/ShuffleClientImpl.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1136/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9jbGllbnQvU2h1ZmZsZUNsaWVudEltcGwuamF2YQ==) | `19.51% <0.00%> (+0.08%)` | :arrow_up: |
| [...ice/deploy/master/clustermeta/ha/HARaftServer.java](https://codecov.io/gh/apache/incubator-celeborn/pull/1136/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFzdGVyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9jZWxlYm9ybi9zZXJ2aWNlL2RlcGxveS9tYXN0ZXIvY2x1c3Rlcm1ldGEvaGEvSEFSYWZ0U2VydmVyLmphdmE=) | `77.93% <0.00%> (+1.36%)` | :arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] waitinfuture merged pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice
Posted by GitBox <gi...@apache.org>.
waitinfuture merged PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1136: [CELEBORN-190][BUG] doPushMergedData should also support revive multiple times, not only twice
Posted by GitBox <gi...@apache.org>.
waitinfuture commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1373399287
When I test this pr I found that currently in main branch, both pushData and pushMergedData can at most revive once. For pushData, the first time of onFailure, it passes the original callback instead the wrappedCallback, so if the revived pushdata fails, the pushData will fail.
```
if (!mapperEnded(shuffleId, mapId, attemptId)) {
pushDataRetryPool.submit(
() ->
submitRetryPushData(
applicationId,
shuffleId,
mapId,
attemptId,
body,
nextBatchId,
loc,
callback,
pushState,
getPushDataFailCause(e.getMessage())));
```
After discussing with @AngersZhuuuu , we can rework this pr by adding reviveTimes for each pushdata and pushmergeddata
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-celeborn] waitinfuture commented on pull request #1136: [CELEBORN-190] doPushMergedData should also support revive multiple times, not only twice
Posted by GitBox <gi...@apache.org>.
waitinfuture commented on PR #1136:
URL: https://github.com/apache/incubator-celeborn/pull/1136#issuecomment-1375220263
LGTM, could you plz add a test case?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@celeborn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org