You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/05 18:39:37 UTC

[GitHub] [spark] holdenk opened a new pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

holdenk opened a new pull request #31493:
URL: https://github.com/apache/spark/pull/31493


   ### What changes were proposed in this pull request?
   
   Allow users to configure a maximum amount of shuffle blocks to be stored and reject remote shuffle blocks when this threshold is exceeded.
   
   ### Why are the changes needed?
   
   In disk constrained environments with large amount of shuffle data, migrations may result in excessive disk pressure on the nodes. On Kube nodes this can result in cascading failures when combined with `emptyDir`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, new configuration parameter.
   
   ### How was this patch tested?
   
   New unit tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-776129436


   K8s failure is unrelated.
   If there are no more comments I'll merge this soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572165992



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")

Review comment:
       I agree, I would think this should be close to spark.storage.decommission.shuffleBlocks.maxThreads




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775374134


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39625/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774336759


   **[Test build #134946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134946/testReport)** for PR 31493 at commit [`233ec5a`](https://github.com/apache/spark/commit/233ec5a9b76d9471463f595c2aa9064ff76cb325).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775497614


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39629/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775470550


   **[Test build #135046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135046/testReport)** for PR 31493 at commit [`69b2540`](https://github.com/apache/spark/commit/69b25406b9d5ee1f0862f06614f579a4ed9a1e03).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775514042


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39629/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572242639



##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1932,22 +1932,29 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  test("test migration of shuffle blocks during decommissioning") {
-    val shuffleManager1 = makeSortShuffleManager()
+  private def testShuffleBlockDecommissioning(maxShuffle: Option[Int], migrate: Boolean) = {
+    maxShuffle.foreach{ b =>
+      conf.set(STORAGE_REMOTE_SHUFFLE_MAX_DISK.key, s"${b}b")

Review comment:
       Sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774336759


   **[Test build #134946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134946/testReport)** for PR 31493 at commit [`233ec5a`](https://github.com/apache/spark/commit/233ec5a9b76d9471463f595c2aa9064ff76cb325).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774371375


   **[Test build #134946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134946/testReport)** for PR 31493 at commit [`233ec5a`](https://github.com/apache/spark/commit/233ec5a9b76d9471463f595c2aa9064ff76cb325).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774296346


   **[Test build #134942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134942/testReport)** for PR 31493 at commit [`54e0bce`](https://github.com/apache/spark/commit/54e0bce5e798a9d98c779f7dc775d37c1ab2a425).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572162136



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =

Review comment:
       this needs to be added to documentation configuration.md




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775355318


   **[Test build #135042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135042/testReport)** for PR 31493 at commit [`77f970d`](https://github.com/apache/spark/commit/77f970d72d09335d0d92a9dadf24e03693a60b58).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774367635


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39529/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775555757


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135046/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774293508


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134937/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571436070



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
+        "shuffle blocks. Rejecting remote shuffle blocks means that an exec will not receive any " +

Review comment:
       "an exec" -> "an executor"?

##########
File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##########
@@ -72,6 +73,15 @@ private[spark] class IndexShuffleBlockResolver(
     }
   }
 
+  private def getShuffleBytesStored(): Long = {
+    val shuffleFiles: Seq[File] = getStoredShuffles().map {
+      si => getDataFile(si.shuffleId, si.mapId)
+    }
+    shuffleFiles.foldLeft[Long](0: Long) { (acc: Long, f: File) =>
+      acc + f.length()
+    }

Review comment:
       ```suggestion
       shuffleFiles.map(_.length()).sum
   ```

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
+        "shuffle blocks. Rejecting remote shuffle blocks means that an exec will not receive any " +
+        " shuffle migrations, and if there are no execs avaialble for migration then " +

Review comment:
       "no execs" -> "no executors"?

##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1932,22 +1932,29 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  test("test migration of shuffle blocks during decommissioning") {
-    val shuffleManager1 = makeSortShuffleManager()
+  private def testShuffleBlockDecommissioning(maxShuffle: Option[Int], migrate: Boolean) = {
+    maxShuffle.foreach{ b =>
+      conf.set(STORAGE_REMOTE_SHUFFLE_MAX_DISK.key, s"${b}b")

Review comment:
       Probably rename `maxShuffle` to `maxShuffleDiskSize` and rename `b` to `diskSize`

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
+        "shuffle blocks. Rejecting remote shuffle blocks means that an exec will not receive any " +
+        " shuffle migrations, and if there are no execs avaialble for migration then " +
+        s"decommissioning will block unless ${STORAGE_DECOMMISSION_FALLBACK_STORAGE_PATH.key} " +

Review comment:
       Will block? I think decommission will stop soon rather than block in that case.

##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1961,20 +1968,37 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
 
       decomManager.refreshOffloadingShuffleBlocks()
 
-      eventually(timeout(1.second), interval(10.milliseconds)) {
-        assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+      if (migrate) {
+        eventually(timeout(1.second), interval(10.milliseconds)) {
+          assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+        }
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleData).toPath())
+          === shuffleDataBlockContent)
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleIndex).toPath())
+          === shuffleIndexBlockContent)
+      } else {
+        Thread.sleep(1000)
+        assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm1.blockManagerId)
       }
-      assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleData).toPath())
-        === shuffleDataBlockContent)
-      assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleIndex).toPath())
-        === shuffleIndexBlockContent)
     } finally {
       mapOutputTracker.unregisterShuffle(0)
       // Avoid thread leak
       decomManager.stopOffloadingShuffleBlocks()
     }
   }
 
+  test("test migration of shuffle blocks during decommissioning - no limit") {
+    testShuffleBlockDecommissioning(None, true)
+  }
+
+  test("test migration of shuffle blocks during decommissioning - larger limit") {
+    testShuffleBlockDecommissioning(Some(10000), true)
+  }
+
+  test("test migration of shuffle blocks during decommissioning - small limit") {

Review comment:
       attach the JIRA ID?

##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1932,22 +1932,29 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  test("test migration of shuffle blocks during decommissioning") {
-    val shuffleManager1 = makeSortShuffleManager()
+  private def testShuffleBlockDecommissioning(maxShuffle: Option[Int], migrate: Boolean) = {

Review comment:
       Maybe rename "migrate" to "willReject"?
   
   All cases actually will try to migrate the blocks first. The difference is whether it'll be rejected or not.

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
+        "shuffle blocks. Rejecting remote shuffle blocks means that an exec will not receive any " +
+        " shuffle migrations, and if there are no execs avaialble for migration then " +

Review comment:
       redundant whitespace at the prefix

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")

Review comment:
       "*. maxDisk" -> "*.maxDiskSize"?

##########
File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##########
@@ -173,6 +183,13 @@ private[spark] class IndexShuffleBlockResolver(
    */
   override def putShuffleBlockAsStream(blockId: BlockId, serializerManager: SerializerManager):
       StreamCallbackWithID = {
+    // Throw an exception if we have exceeded maximum shuffle files stored
+    remoteShuffleMaxDisk.foreach { maxBytes =>
+      val bytesUsed = getShuffleBytesStored()
+      if (maxBytes < bytesUsed) {
+        throw new SparkException(s"Not storing remote shuffles $bytesUsed exceeds $maxBytes")

Review comment:
       Give a hint of the conf for users to increase the maxBytes?

##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1961,20 +1968,37 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
 
       decomManager.refreshOffloadingShuffleBlocks()
 
-      eventually(timeout(1.second), interval(10.milliseconds)) {
-        assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+      if (migrate) {
+        eventually(timeout(1.second), interval(10.milliseconds)) {
+          assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+        }
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleData).toPath())
+          === shuffleDataBlockContent)
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleIndex).toPath())
+          === shuffleIndexBlockContent)
+      } else {
+        Thread.sleep(1000)

Review comment:
       I think this can not ensure that the block fails due to the disk size limitation. Block migration may doesn't finish within 1 second.

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
+        "shuffle blocks. Rejecting remote shuffle blocks means that an exec will not receive any " +
+        " shuffle migrations, and if there are no execs avaialble for migration then " +

Review comment:
       typo: "avaialble" -> "available"

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")

Review comment:
       I think we should put the conf under "decommission" namespace as it only used for decommission.

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
+        "shuffle blocks. Rejecting remote shuffle blocks means that an exec will not receive any " +
+        " shuffle migrations, and if there are no execs avaialble for migration then " +
+        s"decommissioning will block unless ${STORAGE_DECOMMISSION_FALLBACK_STORAGE_PATH.key} " +

Review comment:
       I actually don't understand why the sentence "if there are no execs available ...." appended here. I think it's not necessarily needed here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774275909


   > In general, this mitigation cannot avoid the disk full situation fundamentally. Can we choose the migration target in a better way instead of this rejecting because this is due to the blind migration target selection?
   
   Yeah so this is a two part JIRA. The first part allows users to configure a maximum amount of shuffle blocks to store before rejecting remote shuffle blocks, and then longer term https://issues.apache.org/jira/browse/SPARK-34364 proposes to allow users to configure a maximum disk pressure threshold. The primary reason I split this into two is that on K8s, measuring disk pressure is pretty complicated and I haven't been able to think of a good solution.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774336114


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39525/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572245307



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =

Review comment:
       So the theory behind that is that these are mostly like provider level configuration, e.g. we wouldn't expect an end user to set this but someone like Cloudera or Amazon would configure these for their deployments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774336114


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39525/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775537415


   **[Test build #135046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135046/testReport)** for PR 31493 at commit [`69b2540`](https://github.com/apache/spark/commit/69b25406b9d5ee1f0862f06614f579a4ed9a1e03).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571183956



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -420,6 +420,14 @@ package object config {
       .intConf
       .createWithDefault(1)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +

Review comment:
       Could you add some description about what happens when we reject the remote shuffle blocks?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775156080


   The idea LGTM. I may not have time to look into the details though.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775528864


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39629/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571235009



##########
File path: core/src/main/scala/org/apache/spark/shuffle/MigratableResolver.scala
##########
@@ -45,4 +46,5 @@ trait MigratableResolver {
    * Get the blocks for migration for a particular shuffle and map.
    */
   def getMigrationBlocks(shuffleBlockInfo: ShuffleBlockInfo): List[(BlockId, ManagedBuffer)]
+

Review comment:
       sure.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775457133


   **[Test build #135042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135042/testReport)** for PR 31493 at commit [`77f970d`](https://github.com/apache/spark/commit/77f970d72d09335d0d92a9dadf24e03693a60b58).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774352078


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39529/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774293065


   **[Test build #134937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134937/testReport)** for PR 31493 at commit [`49f5faa`](https://github.com/apache/spark/commit/49f5faa05ee3a1e6d02130bfacc6092d68dc83a1).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775468806


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135042/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-776140962


   Actually, I think I'll just merge as is. If we decide to change the documentation we can do that in a follow up.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774379654


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134946/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775470550


   **[Test build #135046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135046/testReport)** for PR 31493 at commit [`69b2540`](https://github.com/apache/spark/commit/69b25406b9d5ee1f0862f06614f579a4ed9a1e03).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775380295


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39625/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774338270


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134942/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572241152



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +
+        "shuffle blocks. Rejecting remote shuffle blocks means that an exec will not receive any " +

Review comment:
       Sure :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774338270


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134942/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775468806


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135042/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571239783



##########
File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##########
@@ -173,6 +183,13 @@ private[spark] class IndexShuffleBlockResolver(
    */
   override def putShuffleBlockAsStream(blockId: BlockId, serializerManager: SerializerManager):
       StreamCallbackWithID = {
+    // Throw an exception if we have exceeded maximum shuffle files stored
+    remoteShuffleMaxDisk.foreach { maxBytes =>
+      val bytesUsed = getShuffleBytesStored()

Review comment:
       That's a good question. So looking at benchmarks folks have done for file size takes a few microseconds per file (and speeds up on subsequent runs due to cache). It doesn't need to actually read the file to determine the size (if it did that would be much slower).

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -420,6 +420,14 @@ package object config {
       .intConf
       .createWithDefault(1)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")
+      .doc("Maximum disk space to use to store shuffle blocks before rejecting remote " +

Review comment:
       Sure, good idea :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774337794


   **[Test build #134942 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134942/testReport)** for PR 31493 at commit [`54e0bce`](https://github.com/apache/spark/commit/54e0bce5e798a9d98c779f7dc775d37c1ab2a425).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774367635


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39529/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774318369


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39525/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572241508



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =
+    ConfigBuilder("spark.storage.remote.shuffle.maxDisk")

Review comment:
       yeah I can rename it :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774606427


   cc @tgravescs and @mridulm too FYI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774296346


   **[Test build #134942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134942/testReport)** for PR 31493 at commit [`54e0bce`](https://github.com/apache/spark/commit/54e0bce5e798a9d98c779f7dc775d37c1ab2a425).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774288817


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39520/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571186381



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -684,6 +684,7 @@ private[spark] class BlockManager(
     if (blockId.isShuffle) {
       logDebug(s"Putting shuffle block ${blockId}")
       try {
+

Review comment:
       Shall we avoid touching irrelevant file?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571185893



##########
File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##########
@@ -173,6 +183,13 @@ private[spark] class IndexShuffleBlockResolver(
    */
   override def putShuffleBlockAsStream(blockId: BlockId, serializerManager: SerializerManager):
       StreamCallbackWithID = {
+    // Throw an exception if we have exceeded maximum shuffle files stored
+    remoteShuffleMaxDisk.foreach { maxBytes =>
+      val bytesUsed = getShuffleBytesStored()

Review comment:
       This looks like a performance degradation a little due to the additional disk IO. Do we need this at every `putShuffleBlockAsStream`? Do you know how much does this affected, @holdenk ?
   
   cc @dbtsai 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774306987


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39525/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572164052



##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -488,6 +488,17 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val STORAGE_REMOTE_SHUFFLE_MAX_DISK =

Review comment:
       actually I don't see any of these in docs?  spark.storage.decommission.shuffleBlocks.enabled. sorry if I already asked this and we meant to , but these are not internal configs




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774316948


   I think the Kafka failures are unrelated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775555757


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135046/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-776142652


   Merged to the current dev branch


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774379654


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134946/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775376682


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39625/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774238272


   **[Test build #134937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134937/testReport)** for PR 31493 at commit [`49f5faa`](https://github.com/apache/spark/commit/49f5faa05ee3a1e6d02130bfacc6092d68dc83a1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571186953



##########
File path: core/src/main/scala/org/apache/spark/shuffle/MigratableResolver.scala
##########
@@ -45,4 +46,5 @@ trait MigratableResolver {
    * Get the blocks for migration for a particular shuffle and map.
    */
   def getMigrationBlocks(shuffleBlockInfo: ShuffleBlockInfo): List[(BlockId, ManagedBuffer)]
+

Review comment:
       Shall we remove this addition?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572244065



##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1932,22 +1932,29 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  test("test migration of shuffle blocks during decommissioning") {
-    val shuffleManager1 = makeSortShuffleManager()
+  private def testShuffleBlockDecommissioning(maxShuffle: Option[Int], migrate: Boolean) = {

Review comment:
       Sure

##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1961,20 +1968,37 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
 
       decomManager.refreshOffloadingShuffleBlocks()
 
-      eventually(timeout(1.second), interval(10.milliseconds)) {
-        assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+      if (migrate) {
+        eventually(timeout(1.second), interval(10.milliseconds)) {
+          assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+        }
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleData).toPath())
+          === shuffleDataBlockContent)
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleIndex).toPath())
+          === shuffleIndexBlockContent)
+      } else {
+        Thread.sleep(1000)
+        assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm1.blockManagerId)
       }
-      assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleData).toPath())
-        === shuffleDataBlockContent)
-      assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleIndex).toPath())
-        === shuffleIndexBlockContent)
     } finally {
       mapOutputTracker.unregisterShuffle(0)
       // Avoid thread leak
       decomManager.stopOffloadingShuffleBlocks()
     }
   }
 
+  test("test migration of shuffle blocks during decommissioning - no limit") {
+    testShuffleBlockDecommissioning(None, true)
+  }
+
+  test("test migration of shuffle blocks during decommissioning - larger limit") {
+    testShuffleBlockDecommissioning(Some(10000), true)
+  }
+
+  test("test migration of shuffle blocks during decommissioning - small limit") {

Review comment:
       yeah good idea :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774293508


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134937/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774238272


   **[Test build #134937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134937/testReport)** for PR 31493 at commit [`49f5faa`](https://github.com/apache/spark/commit/49f5faa05ee3a1e6d02130bfacc6092d68dc83a1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774359284


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39529/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572243271



##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##########
@@ -1961,20 +1968,37 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
 
       decomManager.refreshOffloadingShuffleBlocks()
 
-      eventually(timeout(1.second), interval(10.milliseconds)) {
-        assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+      if (migrate) {
+        eventually(timeout(1.second), interval(10.milliseconds)) {
+          assert(mapOutputTracker.shuffleStatuses(0).mapStatuses(0).location === bm2.blockManagerId)
+        }
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleData).toPath())
+          === shuffleDataBlockContent)
+        assert(Files.readAllBytes(bm2.diskBlockManager.getFile(shuffleIndex).toPath())
+          === shuffleIndexBlockContent)
+      } else {
+        Thread.sleep(1000)

Review comment:
       Yeah it does, in this situation we assert above in the normal case that the blocks migrate within one second.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774288817


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39520/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774258007


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39520/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774215330


   cc @dongjoon-hyun 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775528864


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39629/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r571236839



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -684,6 +684,7 @@ private[spark] class BlockManager(
     if (blockId.isShuffle) {
       logDebug(s"Putting shuffle block ${blockId}")
       try {
+

Review comment:
       Sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775355318


   **[Test build #135042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135042/testReport)** for PR 31493 at commit [`77f970d`](https://github.com/apache/spark/commit/77f970d72d09335d0d92a9dadf24e03693a60b58).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #31493:
URL: https://github.com/apache/spark/pull/31493#discussion_r572242244



##########
File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##########
@@ -173,6 +183,13 @@ private[spark] class IndexShuffleBlockResolver(
    */
   override def putShuffleBlockAsStream(blockId: BlockId, serializerManager: SerializerManager):
       StreamCallbackWithID = {
+    // Throw an exception if we have exceeded maximum shuffle files stored
+    remoteShuffleMaxDisk.foreach { maxBytes =>
+      val bytesUsed = getShuffleBytesStored()
+      if (maxBytes < bytesUsed) {
+        throw new SparkException(s"Not storing remote shuffles $bytesUsed exceeds $maxBytes")

Review comment:
       So we wouldn't want them to actually increase maxBytes, this is really an internal error that the user won't see because we'll just eat the error and not migrate to this exec.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-774266900


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39520/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #31493:
URL: https://github.com/apache/spark/pull/31493


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31493: [SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31493:
URL: https://github.com/apache/spark/pull/31493#issuecomment-775380295


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39625/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org