Posted to dev@celeborn.apache.org by GitBox <gi...@apache.org> on 2022/11/27 08:15:43 UTC

[GitHub] [incubator-celeborn] waitinfuture commented on pull request #990: [ISSUE-989][FEATURE] Support batch commit hard split partition before stage end

waitinfuture commented on PR #990:
URL: https://github.com/apache/incubator-celeborn/pull/990#issuecomment-1328193735

   In this PR, the commitFiles requests issued for hard-split partitions are guaranteed not to overlap: no two requests commit the same PartitionLocation, which is enforced by synchronizing on ShuffleCommittedInfo. However, there is no such guarantee between the commitFiles requests from handleStageEnd and those from hard split, so two requests can still commit the same PartitionLocation, which is error-prone.
   So we should wait for all hard-split commitFiles requests to finish before triggering commitFiles in handleStageEnd.
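
A minimal sketch of that ordering, assuming a hypothetical `CommitCoordinator` class (it is not Celeborn's `ShuffleCommittedInfo`, and the method names are made up for illustration). Hard-split commits claim their locations under one lock, and the stage-end commit blocks until no hard-split commit is in flight before claiming whatever is left:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical per-shuffle bookkeeping; names are illustrative, not Celeborn's API. */
final class CommitCoordinator {
    // Locations already claimed by some commit request (hard split or stage end).
    private final Set<String> claimed = new HashSet<>();
    // Number of hard-split commit requests that have started but not finished.
    private int inFlight = 0;

    /** Claims locations for a hard-split commit; returns only those not claimed before. */
    synchronized List<String> beginHardSplitCommit(List<String> locations) {
        List<String> mine = new ArrayList<>();
        for (String loc : locations) {
            if (claimed.add(loc)) {
                mine.add(loc);
            }
        }
        inFlight++;
        return mine;
    }

    /** Marks one hard-split commit as finished and wakes a waiting stage-end commit. */
    synchronized void endHardSplitCommit() {
        inFlight--;
        notifyAll();
    }

    /** Blocks until no hard-split commit is in flight, then claims the remaining locations. */
    synchronized List<String> beginStageEndCommit(List<String> allLocations) {
        try {
            while (inFlight > 0) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for hard-split commits", e);
        }
        List<String> rest = new ArrayList<>();
        for (String loc : allLocations) {
            if (claimed.add(loc)) {
                rest.add(loc);
            }
        }
        return rest;
    }
}
```

The sketch does not handle a hard-split commit that starts after the stage-end commit has begun; a real implementation would need to reject or redirect such requests.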
   
   Another issue is that we need a better policy for handling multiple commitFiles requests for a single shuffleId on the server side, one that stays consistent with retryCommitFiles. I think we can give each commitFiles request a unique epoch, ensure that no two epochs overlap, and have retryCommitFiles affect only its own epoch.
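
One possible reading of the epoch idea, as a rough sketch only. `EpochCommitRegistry` and its `commitFiles(epoch, locations)` signature are hypothetical and do not reflect the existing worker-side code. Each request carries an epoch, a location belongs to at most one epoch, and a retry under the same epoch touches only that epoch's state:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical server-side state for one shuffleId; illustrative only. */
final class EpochCommitRegistry {
    // epoch -> locations committed under that epoch
    private final Map<Long, Set<String>> byEpoch = new HashMap<>();
    // location -> epoch that owns it
    private final Map<String, Long> owner = new HashMap<>();

    /**
     * Commits (or retries committing) locations under the given epoch.
     * Throws if any location is already owned by a different epoch.
     */
    synchronized Set<String> commitFiles(long epoch, List<String> locations) {
        // Validate first so a rejected request leaves no partial state behind.
        for (String loc : locations) {
            Long prev = owner.get(loc);
            if (prev != null && prev != epoch) {
                throw new IllegalStateException(
                    loc + " already committed in epoch " + prev + ", not " + epoch);
            }
        }
        Set<String> done = byEpoch.computeIfAbsent(epoch, e -> new LinkedHashSet<>());
        for (String loc : locations) {
            owner.put(loc, epoch);
            done.add(loc); // re-adding on retry is a no-op
        }
        return new LinkedHashSet<>(done);
    }
}
```

With this shape a retry is idempotent within its epoch, and an overlapping request from another epoch fails fast instead of committing the same PartitionLocation twice.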

