You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/25 09:50:00 UTC

[GitHub] [spark] attilapiros commented on a change in pull request #31876: [WIP][SPARK-XXXX][API][CORE] Abstract Location in MapStatus to enable support for custom storage

attilapiros commented on a change in pull request #31876:
URL: https://github.com/apache/spark/pull/31876#discussion_r599413719



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -502,7 +502,7 @@ private[spark] class MapOutputTrackerMaster(
   }
 
   /** Unregister map output information of the given shuffle, mapper and block manager */
-  def unregisterMapOutput(shuffleId: Int, mapIndex: Int, bmAddress: BlockManagerId): Unit = {
+  def unregisterMapOutput(shuffleId: Int, mapIndex: Int, bmAddress: Location): Unit = {
     shuffleStatuses.get(shuffleId) match {
       case Some(shuffleStatus) =>
         shuffleStatus.removeMapOutput(mapIndex, bmAddress)

Review comment:
       ```suggestion
           shuffleStatus.removeMapOutput(mapIndex, loc)
   ```

##########
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##########
@@ -28,16 +28,23 @@ import org.apache.spark.internal.config
 import org.apache.spark.storage.BlockManagerId
 import org.apache.spark.util.Utils
 
+trait Location extends Externalizable {

Review comment:
       Using`Externalizable` here is totally fine.

##########
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##########
@@ -28,16 +28,23 @@ import org.apache.spark.internal.config
 import org.apache.spark.storage.BlockManagerId
 import org.apache.spark.util.Utils
 
+trait Location extends Externalizable {
+  def host: String
+  def port: Int
+  def hostPort: String
+  def executorId: String = "unknown"

Review comment:
       I think in `Location` to specify these methods violates the abstraction. I know this convenient and avoids casting but still we should avoid meaningless methods. In a disaggregated storage solution the `executorId` has no value and probably host and port is not enough and never needed as separate entities but an URL like construct will be more useful (or something else but it is the responsibility of the specific subclass).

##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -502,7 +502,7 @@ private[spark] class MapOutputTrackerMaster(
   }
 
   /** Unregister map output information of the given shuffle, mapper and block manager */
-  def unregisterMapOutput(shuffleId: Int, mapIndex: Int, bmAddress: BlockManagerId): Unit = {
+  def unregisterMapOutput(shuffleId: Int, mapIndex: Int, bmAddress: Location): Unit = {

Review comment:
       Nit: to be consistent with the other changes:
   
   ```suggestion
     def unregisterMapOutput(shuffleId: Int, mapIndex: Int, loc: Location): Unit = {
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org