You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/23 01:02:22 UTC

[GitHub] [spark] mccheah commented on pull request #28618: [SPARK-31801][WIP][API][SHUFFLE] Register map output metadata

mccheah commented on pull request #28618:
URL: https://github.com/apache/spark/pull/28618#issuecomment-647844596


   > I was thinking a lot on `ShuffleDriverComponents` and I have an idea how to improve it.
   > 
   > The problem I believe this class tries to fulfil two very separate roles: be a **builder** and the **result of the building** in the same time.
   > 
   > This is why we need this kind of check:
   > https://github.com/apache/spark/blob/289050e4309b024146867b7b32974ab4746eb570/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleDriverComponents.java#L47-L49
   > 
   > If the building is cleanly separated from the result of the building then we can be sure the prerequisites are fulfilled before.
   > 
   > I would change it by transforming it to be the result of the building in the following way:
   > 
   > * the `initializeApplication` (I mean the process of the initialisation and not the returned Map) should be part of the building. The documentation of the `ShuffleDataIO#driver` method can be extended by mentioning this is the right place to initialize.
   > * the `ShuffleDriverComponents` could have a new method which gives back the "additional SparkConf settings necessary for initializing the executor components" we can call it like `additonalExecutorConfigs`. This new method would replace the old `initializeApplication`
   > 
   > One more idea / question:
   > 
   > * I do not see why the `ShuffleOutputTracker` is optional. Either we or the API user can provide an implementation where the methods are empty this way the API a bit simpler.
   > 
   > @mccheah what do you think?
   
   You bring up a good point. I can adjust the PR accordingly. It does seem like the components does both an initialization and a runtime mode, and it would be more ideal to separate the two. Thanks for critically thinking about this!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org