Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/14 00:21:04 UTC

[GitHub] [spark] otterc commented on pull request #28618: [SPARK-31801][API][SHUFFLE] Register map output metadata

otterc commented on pull request #28618:
URL: https://github.com/apache/spark/pull/28618#issuecomment-691748290


   I looked at the changes proposed here to see whether we can use these interfaces for push-based shuffle ([SPIP](https://docs.google.com/document/d/1mYzKVZllA5Flw8AtoX7JUcXBOnNIDADWRbJ7GI6Y71Q/edit?ts=5f5d1718) and [code](https://github.com/linkedin/spark/tree/magnet-upstream)).
   In push-based shuffle, we have introduced [merge-statuses](https://github.com/linkedin/spark/blob/magnet-upstream/core/src/main/scala/org/apache/spark/scheduler/MergeStatus.scala), each of which represents the map outputs that were merged into a larger block. The driver collects these statuses from the shuffle services.
   
   I think we will be able to use the current `ShuffleOutputTracker` API. An implementation of this API could hold the triggers for finalizing the shuffle merge.
   
   I still have to wrap my head around how we can model `mergeStatus` as part of `MapOutputMetadata`. Multiple `mapStatus`es would point to a single `mergeStatus`, so this would introduce some complexity.
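
To make the many-to-one relationship concrete, here is a minimal, language-neutral sketch in Python. It is only an illustration under stated assumptions: the class names, fields, and methods below are hypothetical and are not taken from this PR, from `MapOutputMetadata`, or from the magnet-upstream branch. It assumes one merged block per reduce partition that records which map outputs it covers.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class MapStatus:
    """Hypothetical stand-in for the output of one map task."""
    map_id: int
    location: str  # where the unmerged map output lives


@dataclass(frozen=True)
class MergeStatus:
    """Hypothetical stand-in for one merged block reported by a shuffle service."""
    reduce_id: int
    location: str
    merged_map_ids: FrozenSet[int]  # the map outputs folded into this block


@dataclass
class ShuffleMetadata:
    """Hypothetical driver-side view of one shuffle (not a Spark API)."""
    map_statuses: Dict[int, MapStatus] = field(default_factory=dict)
    merge_statuses: Dict[int, MergeStatus] = field(default_factory=dict)

    def register_map_output(self, status: MapStatus) -> None:
        self.map_statuses[status.map_id] = status

    def register_merge_result(self, status: MergeStatus) -> None:
        # Keyed by reduce partition: many map outputs share one merged block.
        self.merge_statuses[status.reduce_id] = status

    def blocks_for_reducer(
        self, reduce_id: int
    ) -> Tuple[Optional[MergeStatus], List[MapStatus]]:
        """Return the merged block (if any) plus the map outputs it does not cover."""
        merged = self.merge_statuses.get(reduce_id)
        covered = merged.merged_map_ids if merged else frozenset()
        unmerged = [
            status
            for map_id, status in sorted(self.map_statuses.items())
            if map_id not in covered
        ]
        return merged, unmerged


meta = ShuffleMetadata()
for i in range(3):
    meta.register_map_output(MapStatus(map_id=i, location=f"exec-{i}"))
# Maps 0 and 1 were merged for reduce partition 0; map 2 was not.
meta.register_merge_result(
    MergeStatus(reduce_id=0, location="shuffle-svc-1", merged_map_ids=frozenset({0, 1}))
)
merged, unmerged = meta.blocks_for_reducer(0)
```

In this sketch the complexity shows up in `blocks_for_reducer`: a reducer has to combine the merged block with whichever map outputs were not merged, so the per-map metadata and the merge metadata cannot be looked up independently.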
   
   We may need to evolve these APIs to fit the push-based shuffle use case. As long as we are open to potentially making some backward-incompatible changes, they look good to me for now.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


