Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/08 15:34:35 UTC

[GitHub] [spark] attilapiros edited a comment on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

attilapiros edited a comment on pull request #30763:
URL: https://github.com/apache/spark/pull/30763#issuecomment-792821433


   > Then, `BlockManagerId` would be a native implementation for Spark and users could implement `Location` to support custom storage.
   
   To test the idea, I am trying to come up with hard cases, but this does not mean I am against it.
   
   So, if I understand correctly, `BlockManagerId` would extend the `Location` class, right?
   And `MapStatus#location` would then be a generic `Location`?
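
A minimal Java sketch of the hierarchy being asked about, assuming `Location` becomes a plain interface that the native block manager ID and any user-supplied storage location both implement. All names here (`Location`, `SimpleBlockManagerId`, `CustomStorageLocation`, `SimpleMapStatus`) are hypothetical stand-ins; Spark's real `BlockManagerId` and `MapStatus` are Scala classes with more fields and behavior.

```java
import java.util.Objects;

/** Hypothetical generic location of map output (stands in for the proposed `Location`). */
interface Location {
    /** Human-readable description; used only for logging in this sketch. */
    String describe();
}

/** Hypothetical stand-in for Spark's native BlockManagerId, now one kind of Location. */
final class SimpleBlockManagerId implements Location {
    final String executorId;
    final String host;
    final int port;

    SimpleBlockManagerId(String executorId, String host, int port) {
        this.executorId = executorId;
        this.host = host;
        this.port = port;
    }

    @Override
    public String describe() {
        return "BlockManagerId(" + executorId + ", " + host + ", " + port + ")";
    }
}

/** Hypothetical user-implemented location for custom (e.g. remote) shuffle storage. */
final class CustomStorageLocation implements Location {
    final String uri;

    CustomStorageLocation(String uri) {
        this.uri = Objects.requireNonNull(uri);
    }

    @Override
    public String describe() {
        return "CustomStorageLocation(" + uri + ")";
    }
}

/** Hypothetical simplified MapStatus whose location is only known as the generic type. */
final class SimpleMapStatus {
    final Location location;
    final long mapId;
    final long[] partitionSizes;

    SimpleMapStatus(Location location, long mapId, long[] partitionSizes) {
        this.location = location;
        this.mapId = mapId;
        this.partitionSizes = partitionSizes;
    }
}
```

With this shape, any code that reads `status.location` only sees a `Location`, which is exactly why every existing reference has to be revisited.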
   
   In that case, we should check the references to `MapStatus#location` and, based on them, decide where it is safe to cast `Location` to `BlockManagerId` and where we would instead pass the location along as a `Location` (or at least work out what else the generic location must contain to keep the existing functionality working).
   
   Since the current reader uses `MapOutputTracker#getMapSizesByExecutorId`, would you keep that method and throw an exception at runtime when it is called and the location is not a `BlockManagerId`? This is a central method for getting `blocksByAddress` for fetching in Spark shuffle.
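
To make the read-side concern concrete, here is a minimal sketch of what such a guard could look like, assuming the tracker keeps grouping map outputs by executor and simply rejects any non-native location. `SimpleMapOutputTracker.mapIdsByExecutorId` is a hypothetical, heavily simplified analogue of `getMapSizesByExecutorId` (it returns only map IDs, not block IDs and sizes), and the other types are hypothetical stand-ins as well.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical generic location of map output. */
interface Location {}

/** Hypothetical stand-in for the native BlockManagerId. */
final class SimpleBlockManagerId implements Location {
    final String executorId;
    SimpleBlockManagerId(String executorId) { this.executorId = executorId; }
}

/** Hypothetical user-implemented location for custom storage. */
final class CustomStorageLocation implements Location {
    final String uri;
    CustomStorageLocation(String uri) { this.uri = uri; }
}

/** Hypothetical simplified MapStatus carrying a generic Location. */
final class SimpleMapStatus {
    final Location location;
    final long mapId;
    SimpleMapStatus(Location location, long mapId) {
        this.location = location;
        this.mapId = mapId;
    }
}

final class SimpleMapOutputTracker {
    /**
     * Groups map IDs by the executor that serves them. This only makes sense for
     * native locations, so the cast is guarded and any other Location fails at runtime.
     */
    static Map<String, List<Long>> mapIdsByExecutorId(List<SimpleMapStatus> statuses) {
        Map<String, List<Long>> byExecutor = new LinkedHashMap<>();
        for (SimpleMapStatus status : statuses) {
            if (!(status.location instanceof SimpleBlockManagerId)) {
                throw new IllegalStateException("map " + status.mapId + " has location "
                        + status.location.getClass().getSimpleName() + ", not a BlockManagerId");
            }
            SimpleBlockManagerId bm = (SimpleBlockManagerId) status.location; // safe after the check
            byExecutor.computeIfAbsent(bm.executorId, k -> new ArrayList<>()).add(status.mapId);
        }
        return byExecutor;
    }
}
```

The check only fires when the method is actually called with a custom location, so a misuse surfaces as a failed fetch at runtime rather than as a compile error.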
   
   More generally, as I see it, `MapOutputTracker` is tailored to the current shuffle implementation, so it should also be checked against this idea.
   
   On the other hand, the write side might be easier, as there the `MapStatus` is filled with the ID of the current block manager, so a new writer implementation would simply use its own location.
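
A minimal sketch of why the write side needs no casts, assuming each writer knows the location it writes to and reports that location in the status it produces. `MapOutputWriterSketch`, `LocalDiskWriter`, and `RemoteStorageWriter` are hypothetical names and do not correspond to Spark's actual shuffle writer API.

```java
/** Hypothetical generic location of map output. */
interface Location {}

/** Hypothetical stand-in for the native BlockManagerId. */
final class SimpleBlockManagerId implements Location {
    final String executorId;
    SimpleBlockManagerId(String executorId) { this.executorId = executorId; }
}

/** Hypothetical user-implemented location for custom storage. */
final class CustomStorageLocation implements Location {
    final String uri;
    CustomStorageLocation(String uri) { this.uri = uri; }
}

/** Hypothetical simplified MapStatus carrying a generic Location. */
final class SimpleMapStatus {
    final Location location;
    final long mapId;
    final long[] partitionSizes;
    SimpleMapStatus(Location location, long mapId, long[] partitionSizes) {
        this.location = location;
        this.mapId = mapId;
        this.partitionSizes = partitionSizes;
    }
}

/** Hypothetical writer abstraction: each implementation knows where it writes. */
abstract class MapOutputWriterSketch {
    /** The location this writer stores map output at. */
    abstract Location location();

    /** Builds the status reported to the driver; the location is just the writer's own. */
    final SimpleMapStatus commit(long mapId, long[] partitionSizes) {
        return new SimpleMapStatus(location(), mapId, partitionSizes.clone());
    }
}

/** Native-style writer: reports the current block manager as its location. */
final class LocalDiskWriter extends MapOutputWriterSketch {
    private final SimpleBlockManagerId self;
    LocalDiskWriter(SimpleBlockManagerId self) { this.self = self; }
    @Override Location location() { return self; }
}

/** Custom-storage writer: reports its own storage location instead. */
final class RemoteStorageWriter extends MapOutputWriterSketch {
    private final CustomStorageLocation target;
    RemoteStorageWriter(CustomStorageLocation target) { this.target = target; }
    @Override Location location() { return target; }
}
```

Each writer only produces a `Location` and never has to inspect one, which is the asymmetry with the read side.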
   
   For the read side, however, my worry is that we would need runtime checks/asserts/guards to enforce what may be used when.
   
    

