You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/11 06:30:09 UTC

[GitHub] [spark] boy-uber opened a new pull request #30004: [Shuffle] [SPARK-33114] Add metadata in MapStatus to support custom shuffle manager

boy-uber opened a new pull request #30004:
URL: https://github.com/apache/spark/pull/30004


   ### What changes were proposed in this pull request?
   Add generic metadata in MapStatus class to support custom shuffle manager. Also add a new method to retrieve all map output statuses and their metadata. See Jira: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-33114
   
   ### Why are the changes needed?
   Current MapStatus class is tightly bound with local (sort merge) shuffle which uses BlockManagerId to store the shuffle data location. It could not support other custom shuffle manager implementation. 
   
   For example, when we implement Remote Shuffle Service, we want to put remote shuffle server information into MapStatus so reducer could fetch that information and figure out where to fetch data. The added MapStatus.metadata field could store such information.
   
   If people implement other shuffle manager, they could also store their related information into this metadata field.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added unit test
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711410376


   **[Test build #129962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129962/testReport)** for PR 30004 at commit [`fe693eb`](https://github.com/apache/spark/commit/fe693eb2d7c22d2be1344214e3d3f073be643eb4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-745694448


   > @hiboyang for helping @mccheah who is busy with other projects I cloned his PR as #30763.
   > 
   > I will keep this up-to-date with master and react to review comments.
   
   @attilapiros sounds good, thanks for working on that and keeping us updated!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-791892072


   You can fix this, please try the followings:
   
   ```
   $ git checkout <your-local branch>
   $ git reset --hard fe693eb2d7c22d2be1344214e3d3f073be643eb4
   $ git push -f
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-741947012


   Hmm, I still see some updates from @attilapiros and @mccheah in Oct on #28618.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588865035



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -171,6 +200,36 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     mapWorkerRpcEnv.shutdown()
   }
 
+  test("remote get all map output statuses with metadata") {

Review comment:
       fine for me
    




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-791889941


   Oops, I messed up my git a little bit. I deleted and recreated my previous Spark fork. Seems I could not update this PR anymore. @attilapiros I created another PR https://github.com/apache/spark/pull/31763 with changes addressing your comments. Sorry for the inconvenience to switch to that new PR.
   
   For the "The clone on the Array is not a deep copy" comment, want to wait and see how you like the idea to change the method to "getAllMapOutputStatusMetadata".


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711283205


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] boy-uber commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
boy-uber commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-716185714


   > @boy-uber I feel that #28618 and previously related works seems doing what you want here.
   
   Yes, mccheah was working on #28618 to extend map status metadata, and he stopped working on that any more. I am taking a slightly different approach, trying to find out the minimum changes we need to support remote shuffle.
   
   The code in this PR is much smaller than that PR, and could support remote shuffle as well. May we review this one?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30004: [Shuffle] [SPARK-33114] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-706658070


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588826812



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -827,6 +848,13 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
     }
   }
 
+  override def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    val statuses = getStatuses(shuffleId, conf)

Review comment:
       Please clear the `mapStatuses` in case of `MetadataFetchFailedException`!
   
   Reasoning:
   The `getStatuses` method before this PR was only used in `getMapSizesByExecutorId ` where the `MetadataFetchFailedException` (the case when missing output location was detected) handled by clearing of the `mapStatuses` cache as it is probably outdated.
   
   
   I am sure that clearing would not be missed if this cleaning would be done at the throwing of that exception.
   Could you please check whether it can be moved there?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588829730



##########
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##########
@@ -213,6 +253,12 @@ private[spark] class HighlyCompressedMapStatus private (
       out.writeByte(kv._2)
     }
     out.writeLong(_mapTaskId)
+    if (_metadata.isEmpty) {

Review comment:
       Nit:
   
   ```
   out.writeBoolean(_metadata.isDefined)
   _metadata.foreach(out.writeObject)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588829556



##########
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##########
@@ -139,11 +164,19 @@ private[spark] class CompressedMapStatus(
 
   override def mapId: Long = _mapTaskId
 
+  override def metadata: Option[Serializable] = _metadata
+
   override def writeExternal(out: ObjectOutput): Unit = Utils.tryOrIOException {
     loc.writeExternal(out)
     out.writeInt(compressedSizes.length)
     out.write(compressedSizes)
     out.writeLong(_mapTaskId)
+    if (_metadata.isEmpty) {

Review comment:
       Nit:
   ```
   out.writeBoolean(_metadata.isDefined)
   _metadata.foreach(out.writeObject)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588835243



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -171,6 +200,36 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     mapWorkerRpcEnv.shutdown()
   }
 
+  test("remote get all map output statuses with metadata") {

Review comment:
       I am thinking about extending the `remote fetch` test (the one before this) with an extra registered map output where metadata is given and then this test could be deleted.
    
   That way you will test the case when one of map status is given with and one is without a metadata.
   
   WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-742285489


   Hi @attilapiros, glad to see you would like to finish #28618. This PR is kind of a very simplified version of that one (#28618). My concern is #28618 may take very long time to get agreement from all the people (it is already there for 6 months). Could we switch to this PR if #28618 gets stuck again?
   
   > Yes, I would like to help in finishing #28618. Last time I checked it was just a few missing issues, now it might have some more merge conflicts...
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-716101289


   @boy-uber I feel that #28618 and previously related works seems doing what you want here. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-783112879


   Cool, thanks @attilapiros for the update!
   
   Hi @viirya @Ngone51, would you help to review the PR here?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711411058






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711334189






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros edited a comment on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros edited a comment on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792225606


   I assume branches are identified just by the name so if you are standing at the branch used for the other PR (https://github.com/apache/spark/pull/31763, lets say the name branch2) and then create a branch with the same name which is used for this PR (call it branch1) and force push it then the PR updates, so
   ```
    $ git checkout branch2
    $ git checkout -b branch1
    (if branch 1 exists you can delete it before locally with: git branch -d branch1
    $ git push -f
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588867352



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -774,6 +783,18 @@ private[spark] class MapOutputTrackerMaster(
     }
   }
 
+  def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    shuffleStatuses.get(shuffleId) match {
+      case Some(shuffleStatus) =>
+        shuffleStatus.withMapStatuses { statuses =>
+          MapOutputTracker.checkMapStatuses(statuses, shuffleId)
+          statuses.clone

Review comment:
       It could not be enough if the metadata can mutate. But as I see we could solve all the problems with immutable metadata easily. So to be on the safe side please document we require the metadata to be immutable and introduce an `updateMetadata(meta: Option[Serializable])` method in `MapStatus`. Then we will be safe and all the use cases are covered.
   
   (And you can use a case class for the Uber RSS's `MapTaskRssInfo`)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792225606


   I assume branches are identified just by the name so if you are standing at the branch used for the other PR (https://github.com/apache/spark/pull/31763, lets say the name branch2) and then create a branch with the same name which is used for this PR (call it branch1) and force push it then the PR updates, so
   ```
    $ git checkout brach2
    $ git checkout -b branch1
    (if branch 1 exists you can delete it before locally with: git branch -d branch1
    $ git push -f
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-791899993


   You should be able to set the HEAD with `git reset --hard` even to the commit where your other PR's HEAD is standing.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-791887368


   > Sorry for the many separate single comments: first my plan was to peek into a bit.
   
   No worry, your comments are great and very helpful :) Thank a lot for taking time to review!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588828826



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -774,6 +783,18 @@ private[spark] class MapOutputTrackerMaster(
     }
   }
 
+  def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    shuffleStatuses.get(shuffleId) match {
+      case Some(shuffleStatus) =>
+        shuffleStatus.withMapStatuses { statuses =>
+          MapOutputTracker.checkMapStatuses(statuses, shuffleId)
+          statuses.clone

Review comment:
       I see your intention by calling this `clone` here but I do not think this is enough.
   As the `MapStatus` is trait and not a case class in addition its implementations are mutable with a lot of `var` fields.
   
   The `clone` on the `Array` is not a deep copy.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-741941170


   cc @Ngone51 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792208875


   > You can fix this, please try the followings:
   > 
   > ```
   > $ git checkout <your-local branch>
   > $ git reset --hard fe693eb2d7c22d2be1344214e3d3f073be643eb4
   > $ git push -f
   > ```
   
   The commit fe693eb2d7c22d2be1344214e3d3f073be643eb4 was gone, and I could not git reset on it. That commit was created by my previous Spark fork which I deleted. I did not have that original fork/commit on my local git repo now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-782912775


   @attilapiros are you still working on this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792208216


   > You should be able to set the HEAD with `git reset --hard` even to the commit where your other PR's HEAD is standing.
   
   Previously I deleted my Spark fork. Then I forked again and re-created mapStatusMetadata2 branch under my new Spark fork (https://github.com/hiboyang/spark/tree/mapStatusMetadata2), but this PR does not update after I commit new changes in  mapStatusMetadata2 branch.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588826812



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -827,6 +848,13 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
     }
   }
 
+  override def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    val statuses = getStatuses(shuffleId, conf)

Review comment:
       Please clear the `mapStatuses` in case of `MetadataFetchFailedException`!
   
   Reasoning:
   The `getStatuses` method before this PR was only used in `getMapSizesByExecutorId ` where the `MetadataFetchFailedException` (the case when missing output location was detected) handled by clearing of the `mapStatuses` cache as it is probably outdated.
   
   
   ~~I am sure that clearing would not be missed if this cleaning would be done at the throwing of that exception.~~
   ~~Could you please check whether it can be moved there?~~
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588830385



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1011,4 +1039,15 @@ private[spark] object MapOutputTracker extends Logging {
 
     splitsByAddress.mapValues(_.toSeq).iterator
   }
+
+  def checkMapStatuses(statuses: Array[MapStatus], shuffleId: Int): Unit = {
+    assert (statuses != null)
+    for (status <- statuses) {
+      if (status == null) {

Review comment:
       I think you can extract this `if` into a new method and reuse the method in `convertMapStatuses`.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros edited a comment on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros edited a comment on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-742033452


   Yes, I would like to help in finishing https://github.com/apache/spark/pull/28618. Last time I checked it was just a few missing issues, now it might have some more merge conflicts...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711334172


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34568/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-706658070


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588832639



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -63,16 +63,37 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     assert(tracker.containsShuffle(10))
     val size1000 = MapStatus.decompressSize(MapStatus.compressSize(1000L))
     val size10000 = MapStatus.decompressSize(MapStatus.compressSize(10000L))
-    tracker.registerMapOutput(10, 0, MapStatus(BlockManagerId("a", "hostA", 1000),
-        Array(1000L, 10000L), 5))
-    tracker.registerMapOutput(10, 1, MapStatus(BlockManagerId("b", "hostB", 1000),
-        Array(10000L, 1000L), 6))
+    val mapStatus1 = MapStatus(BlockManagerId("a", "hostA", 1000), Array(1000L, 10000L), 5)
+    val mapStatus2 = MapStatus(BlockManagerId("b", "hostB", 1000), Array(10000L, 1000L), 6)
+    tracker.registerMapOutput(10, 0, mapStatus1)
+    tracker.registerMapOutput(10, 1, mapStatus2)
     val statuses = tracker.getMapSizesByExecutorId(10, 0)
     assert(statuses.toSet ===
       Seq((BlockManagerId("a", "hostA", 1000),
         ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0))),
           (BlockManagerId("b", "hostB", 1000),
             ArrayBuffer((ShuffleBlockId(10, 6, 0), size10000, 1)))).toSet)
+    val allStatuses = tracker.getAllMapOutputStatuses(10)
+    assert(allStatuses.toSet === Set(mapStatus1, mapStatus2))
+    assert(0 == tracker.getNumCachedSerializedBroadcast)
+    tracker.stop()
+    rpcEnv.shutdown()
+  }
+
+  test("master register shuffle with map status metadata") {
+    val rpcEnv = createRpcEnv("test")
+    val tracker = newTrackerMaster()
+    tracker.trackerEndpoint = rpcEnv.setupEndpoint(MapOutputTracker.ENDPOINT_NAME,
+      new MapOutputTrackerMasterEndpoint(rpcEnv, tracker, conf))
+    tracker.registerShuffle(10, 2)
+    val mapStatus1 = MapStatus(BlockManagerId("a", "hostA", 1000),
+      Array(1000L, 10000L), 5, Some("metadata1"))
+    val mapStatus2 = MapStatus(BlockManagerId("b", "hostB", 1000),
+      Array(10000L, 1000L), 6, Some(1001))
+    tracker.registerMapOutput(10, 0, mapStatus1)
+    tracker.registerMapOutput(10, 1, mapStatus2)
+    val allStatuses = tracker.getAllMapOutputStatuses(10)
+    assert(allStatuses.toSet === Set(mapStatus1, mapStatus2))

Review comment:
       detto




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711334189






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-716188127


   I was not aware that #28618  was abandoned. +CC @mccheah 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588832427



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -63,16 +63,37 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     assert(tracker.containsShuffle(10))
     val size1000 = MapStatus.decompressSize(MapStatus.compressSize(1000L))
     val size10000 = MapStatus.decompressSize(MapStatus.compressSize(10000L))
-    tracker.registerMapOutput(10, 0, MapStatus(BlockManagerId("a", "hostA", 1000),
-        Array(1000L, 10000L), 5))
-    tracker.registerMapOutput(10, 1, MapStatus(BlockManagerId("b", "hostB", 1000),
-        Array(10000L, 1000L), 6))
+    val mapStatus1 = MapStatus(BlockManagerId("a", "hostA", 1000), Array(1000L, 10000L), 5)
+    val mapStatus2 = MapStatus(BlockManagerId("b", "hostB", 1000), Array(10000L, 1000L), 6)
+    tracker.registerMapOutput(10, 0, mapStatus1)
+    tracker.registerMapOutput(10, 1, mapStatus2)
     val statuses = tracker.getMapSizesByExecutorId(10, 0)
     assert(statuses.toSet ===
       Seq((BlockManagerId("a", "hostA", 1000),
         ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0))),
           (BlockManagerId("b", "hostB", 1000),
             ArrayBuffer((ShuffleBlockId(10, 6, 0), size10000, 1)))).toSet)
+    val allStatuses = tracker.getAllMapOutputStatuses(10)
+    assert(allStatuses.toSet === Set(mapStatus1, mapStatus2))

Review comment:
       With the `toSet` you lose the ordering meanwhile ordering can be important (that is the mapIndex) so it should be tested.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588830523



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -827,6 +848,13 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
     }
   }
 
+  override def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    val statuses = getStatuses(shuffleId, conf)

Review comment:
       Now I see why you cannot move the clearing there! 
   Still the clearing itself is needed to be done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588833956



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -147,13 +171,18 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     intercept[FetchFailedException] { mapWorkerTracker.getMapSizesByExecutorId(10, 0) }
 
     val size1000 = MapStatus.decompressSize(MapStatus.compressSize(1000L))
-    masterTracker.registerMapOutput(10, 0, MapStatus(
-      BlockManagerId("a", "hostA", 1000), Array(1000L), 5))
+    val mapStatus = MapStatus(BlockManagerId("a", "hostA", 1000), Array(1000L), 5)
+    masterTracker.registerMapOutput(10, 0, mapStatus)
     mapWorkerTracker.updateEpoch(masterTracker.getEpoch)
     assert(mapWorkerTracker.getMapSizesByExecutorId(10, 0).toSeq ===
       Seq((BlockManagerId("a", "hostA", 1000),
         ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0)))))
-    assert(0 == masterTracker.getNumCachedSerializedBroadcast)
+    val allMapOutputStatuses = mapWorkerTracker.getAllMapOutputStatuses(10)
+    assert(allMapOutputStatuses.length === 1)
+    assert(allMapOutputStatuses(0).location === mapStatus.location)

Review comment:
       I would prefer to have an `equals` method on  `MapStatus` and use the equals here.
   Because in that case when `MapStatus` is extended with a new field this test will validate the serialization / deserialization of this new field too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792233708


   I see, let's continue at the other PR and close this one.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-742033452


   Yes, I would like to help in finishing https://github.com/apache/spark/pull/28618 last time I checked it was just a few missing issues, now it might have some more merge conflicts...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711411058






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792237175


   This PR is closed. Please review the other PR https://github.com/apache/spark/pull/31763. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588834001



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -327,6 +388,7 @@ class MapOutputTrackerSuite extends SparkFunSuite {
                 (ShuffleBlockId(10, 6, 2), size1000, 1)))
         )
     )
+    assert(tracker.getAllMapOutputStatuses(10).toSet === Set(mapStatus1, mapStatus2))

Review comment:
       order check




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30004: [Shuffle] [SPARK-33114] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-706657964


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30004: [Shuffle] [SPARK-33114] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-706657964


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792231213


   > I assume branches are identified just by the name so if you are standing at the branch used for the other PR (#31763, lets say the name branch2) and then create a branch with the same name which is used for this PR (call it branch1) and force push it then the PR updates, so
   > 
   > ```
   >  $ git checkout branch2
   >  $ git checkout -b branch1
   >  (if branch 1 exists you can delete it before locally with: git branch -d branch1
   >  $ git push -f
   > ```
   
   I tried this approach and it did not work. The branch I previously used for this PR is mapStatusMetadata2. I re-created this same name branch and pushed to GitHub, but this PR does not refresh with my new change.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588842660



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -774,6 +783,18 @@ private[spark] class MapOutputTrackerMaster(
     }
   }
 
+  def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    shuffleStatuses.get(shuffleId) match {
+      case Some(shuffleStatus) =>
+        shuffleStatus.withMapStatuses { statuses =>
+          MapOutputTracker.checkMapStatuses(statuses, shuffleId)
+          statuses.clone

Review comment:
       Yes, got your point. How about change this method to `getAllMapOutputStatusMetadata` to only return the metadada?
   
   
   

##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -171,6 +200,36 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     mapWorkerRpcEnv.shutdown()
   }
 
+  test("remote get all map output statuses with metadata") {

Review comment:
       The previous test has `masterTracker.unregisterMapOutput` and some test verification for that, thus want to avoid adding too much for that test. Also this test is specifically testing non-null metadata object, kind of following "separation of concerns" to make it as a separate test.

##########
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##########
@@ -213,6 +253,12 @@ private[spark] class HighlyCompressedMapStatus private (
       out.writeByte(kv._2)
     }
     out.writeLong(_mapTaskId)
+    if (_metadata.isEmpty) {

Review comment:
       Nice suggestion!

##########
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##########
@@ -139,11 +164,19 @@ private[spark] class CompressedMapStatus(
 
   override def mapId: Long = _mapTaskId
 
+  override def metadata: Option[Serializable] = _metadata
+
   override def writeExternal(out: ObjectOutput): Unit = Utils.tryOrIOException {
     loc.writeExternal(out)
     out.writeInt(compressedSizes.length)
     out.write(compressedSizes)
     out.writeLong(_mapTaskId)
+    if (_metadata.isEmpty) {

Review comment:
       nice suggestion!

##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -827,6 +848,13 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
     }
   }
 
+  override def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    val statuses = getStatuses(shuffleId, conf)

Review comment:
       yes, we only need to clear `mapStatuses` in `MapOutputTrackerWorker` , will add that

##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -63,16 +63,37 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     assert(tracker.containsShuffle(10))
     val size1000 = MapStatus.decompressSize(MapStatus.compressSize(1000L))
     val size10000 = MapStatus.decompressSize(MapStatus.compressSize(10000L))
-    tracker.registerMapOutput(10, 0, MapStatus(BlockManagerId("a", "hostA", 1000),
-        Array(1000L, 10000L), 5))
-    tracker.registerMapOutput(10, 1, MapStatus(BlockManagerId("b", "hostB", 1000),
-        Array(10000L, 1000L), 6))
+    val mapStatus1 = MapStatus(BlockManagerId("a", "hostA", 1000), Array(1000L, 10000L), 5)
+    val mapStatus2 = MapStatus(BlockManagerId("b", "hostB", 1000), Array(10000L, 1000L), 6)
+    tracker.registerMapOutput(10, 0, mapStatus1)
+    tracker.registerMapOutput(10, 1, mapStatus2)
     val statuses = tracker.getMapSizesByExecutorId(10, 0)
     assert(statuses.toSet ===
       Seq((BlockManagerId("a", "hostA", 1000),
         ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0))),
           (BlockManagerId("b", "hostB", 1000),
             ArrayBuffer((ShuffleBlockId(10, 6, 0), size10000, 1)))).toSet)
+    val allStatuses = tracker.getAllMapOutputStatuses(10)
+    assert(allStatuses.toSet === Set(mapStatus1, mapStatus2))

Review comment:
       good suggestion, will use Array to check the sequence as well

##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1011,4 +1039,15 @@ private[spark] object MapOutputTracker extends Logging {
 
     splitsByAddress.mapValues(_.toSeq).iterator
   }
+
+  def checkMapStatuses(statuses: Array[MapStatus], shuffleId: Int): Unit = {
+    assert (statuses != null)
+    for (status <- statuses) {
+      if (status == null) {

Review comment:
       yes, good suggestion!

##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -327,6 +388,7 @@ class MapOutputTrackerSuite extends SparkFunSuite {
                 (ShuffleBlockId(10, 6, 2), size1000, 1)))
         )
     )
+    assert(tracker.getAllMapOutputStatuses(10).toSet === Set(mapStatus1, mapStatus2))

Review comment:
       good catch!

##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -147,13 +171,18 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     intercept[FetchFailedException] { mapWorkerTracker.getMapSizesByExecutorId(10, 0) }
 
     val size1000 = MapStatus.decompressSize(MapStatus.compressSize(1000L))
-    masterTracker.registerMapOutput(10, 0, MapStatus(
-      BlockManagerId("a", "hostA", 1000), Array(1000L), 5))
+    val mapStatus = MapStatus(BlockManagerId("a", "hostA", 1000), Array(1000L), 5)
+    masterTracker.registerMapOutput(10, 0, mapStatus)
     mapWorkerTracker.updateEpoch(masterTracker.getEpoch)
     assert(mapWorkerTracker.getMapSizesByExecutorId(10, 0).toSeq ===
       Seq((BlockManagerId("a", "hostA", 1000),
         ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0)))))
-    assert(0 == masterTracker.getNumCachedSerializedBroadcast)
+    val allMapOutputStatuses = mapWorkerTracker.getAllMapOutputStatuses(10)
+    assert(allMapOutputStatuses.length === 1)
+    assert(allMapOutputStatuses(0).location === mapStatus.location)

Review comment:
       In responding of one of previous comments, I am suggesting returning only metadata instead of the whole map statue object. Will revisit here after that discussion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-791877721


   Sorry for the many separate single comments: first my plan was to peek into a bit.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711283592


   **[Test build #129962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129962/testReport)** for PR 30004 at commit [`fe693eb`](https://github.com/apache/spark/commit/fe693eb2d7c22d2be1344214e3d3f073be643eb4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711311018


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34568/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-744743097


   @hiboyang for helping @mccheah who is busy with other projects I cloned his PR as https://github.com/apache/spark/pull/30763. 
   
   I will keep this up-to-date with master and react to review comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-792236879


   > I see, let's continue at the other PR and close this one.
   
   Cool, thanks! The other PR is https://github.com/apache/spark/pull/31763. Will close this one. Cc @viirya @Ngone51 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588826812



##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -827,6 +848,13 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
     }
   }
 
+  override def getAllMapOutputStatuses(shuffleId: Int): Array[MapStatus] = {
+    logDebug(s"Fetching all output statuses for shuffle $shuffleId")
+    val statuses = getStatuses(shuffleId, conf)

Review comment:
       Please clean up the `mapStatuses` in case of `MetadataFetchFailedException`!
   
   Reasoning:
   The `getStatuses` method before this PR was only used in `getMapSizesByExecutorId ` where the `MetadataFetchFailedException` (the case when missing output location was detected) handled by cleaning of the `mapStatuses` cache as it is probably outdated.
   
   
   I am sure that cleaning would not be missed if this cleaning would be done at the throwing of that exception.
   Could you please check whether it can be moved there?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588833195



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -147,13 +171,18 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     intercept[FetchFailedException] { mapWorkerTracker.getMapSizesByExecutorId(10, 0) }
 
     val size1000 = MapStatus.decompressSize(MapStatus.compressSize(1000L))
-    masterTracker.registerMapOutput(10, 0, MapStatus(
-      BlockManagerId("a", "hostA", 1000), Array(1000L), 5))
+    val mapStatus = MapStatus(BlockManagerId("a", "hostA", 1000), Array(1000L), 5)
+    masterTracker.registerMapOutput(10, 0, mapStatus)
     mapWorkerTracker.updateEpoch(masterTracker.getEpoch)
     assert(mapWorkerTracker.getMapSizesByExecutorId(10, 0).toSeq ===
       Seq((BlockManagerId("a", "hostA", 1000),
         ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0)))))
-    assert(0 == masterTracker.getNumCachedSerializedBroadcast)

Review comment:
       Please put the assert back:
   ```
       assert(0 == masterTracker.getNumCachedSerializedBroadcast)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on a change in pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on a change in pull request #30004:
URL: https://github.com/apache/spark/pull/30004#discussion_r588832639



##########
File path: core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
##########
@@ -63,16 +63,37 @@ class MapOutputTrackerSuite extends SparkFunSuite {
     assert(tracker.containsShuffle(10))
     val size1000 = MapStatus.decompressSize(MapStatus.compressSize(1000L))
     val size10000 = MapStatus.decompressSize(MapStatus.compressSize(10000L))
-    tracker.registerMapOutput(10, 0, MapStatus(BlockManagerId("a", "hostA", 1000),
-        Array(1000L, 10000L), 5))
-    tracker.registerMapOutput(10, 1, MapStatus(BlockManagerId("b", "hostB", 1000),
-        Array(10000L, 1000L), 6))
+    val mapStatus1 = MapStatus(BlockManagerId("a", "hostA", 1000), Array(1000L, 10000L), 5)
+    val mapStatus2 = MapStatus(BlockManagerId("b", "hostB", 1000), Array(10000L, 1000L), 6)
+    tracker.registerMapOutput(10, 0, mapStatus1)
+    tracker.registerMapOutput(10, 1, mapStatus2)
     val statuses = tracker.getMapSizesByExecutorId(10, 0)
     assert(statuses.toSet ===
       Seq((BlockManagerId("a", "hostA", 1000),
         ArrayBuffer((ShuffleBlockId(10, 5, 0), size1000, 0))),
           (BlockManagerId("b", "hostB", 1000),
             ArrayBuffer((ShuffleBlockId(10, 6, 0), size10000, 1)))).toSet)
+    val allStatuses = tracker.getAllMapOutputStatuses(10)
+    assert(allStatuses.toSet === Set(mapStatus1, mapStatus2))
+    assert(0 == tracker.getNumCachedSerializedBroadcast)
+    tracker.stop()
+    rpcEnv.shutdown()
+  }
+
+  test("master register shuffle with map status metadata") {
+    val rpcEnv = createRpcEnv("test")
+    val tracker = newTrackerMaster()
+    tracker.trackerEndpoint = rpcEnv.setupEndpoint(MapOutputTracker.ENDPOINT_NAME,
+      new MapOutputTrackerMasterEndpoint(rpcEnv, tracker, conf))
+    tracker.registerShuffle(10, 2)
+    val mapStatus1 = MapStatus(BlockManagerId("a", "hostA", 1000),
+      Array(1000L, 10000L), 5, Some("metadata1"))
+    val mapStatus2 = MapStatus(BlockManagerId("b", "hostB", 1000),
+      Array(10000L, 1000L), 6, Some(1001))
+    tracker.registerMapOutput(10, 0, mapStatus1)
+    tracker.registerMapOutput(10, 1, mapStatus2)
+    val allStatuses = tracker.getAllMapOutputStatuses(10)
+    assert(allStatuses.toSet === Set(mapStatus1, mapStatus2))

Review comment:
       ditto




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] attilapiros commented on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
attilapiros commented on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-782915655


   @hiboyang  The problem is not getting enough reviews and without it I can not push this forward.
   I assume one reason behind might be its size: it is quite large (and of course by time it is keep diverging from base). 
   
   But I do not want to block you further. 
   Thanks for your patience! So please go ahead with this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30004:
URL: https://github.com/apache/spark/pull/30004#issuecomment-711283592


   **[Test build #129962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129962/testReport)** for PR 30004 at commit [`fe693eb`](https://github.com/apache/spark/commit/fe693eb2d7c22d2be1344214e3d3f073be643eb4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hiboyang closed pull request #30004: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

Posted by GitBox <gi...@apache.org>.
hiboyang closed pull request #30004:
URL: https://github.com/apache/spark/pull/30004


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org