You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/15 04:09:37 UTC

[GitHub] [spark] holdenk opened a new pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

holdenk opened a new pull request #30046:
URL: https://github.com/apache/spark/pull/30046


   ### What changes were proposed in this pull request?
   
   If a block is removed between discovery to transfer fo the block, we short circuit that block and remove it from the list to transfer and increment the transferred blocks. This is complicated since both RPC errors and local read errors may be reported with the same exception class.
   
   ### Why are the changes needed?
   
   Slow shuffle refreshes could waste time when decommissioning has already finished. Decommissioning might avoid transferring some some blocks to an otherwise live host which is marked as "full" if a deleted block fails to transfer to that host.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit and integration tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505743850



##########
File path: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
##########
@@ -54,9 +54,84 @@ private[spark] trait DecommissionSuite { k8sSuite: KubernetesSuite =>
       executorPatience = None,
       decommissioningTest = true)
   }
+
+  test("Test basic decommissioning with shuffle cleanup", k8sTestTag) {
+    sparkAppConf
+      .set(config.DECOMMISSION_ENABLED.key, "true")
+      .set("spark.kubernetes.pyspark.pythonVersion", "3")

Review comment:
       Sounds good :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709542981


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34457/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708921257


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34404/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709556973


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505837943



##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionUnitSuite.scala
##########
@@ -161,7 +167,80 @@ class BlockManagerDecommissionUnitSuite extends SparkFunSuite with Matchers {
       .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
 
     // Verify the decom manager handles this correctly
-    validateDecommissionTimestamps(sparkConf, bm, migratableShuffleBlockResolver)
+    validateDecommissionTimestamps(sparkConf, bm)
+  }
+
+  test("block decom manager does not re-add removed shuffle files") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    registerShuffleBlocks(migratableShuffleBlockResolver, Set())
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    bmDecomManager.migratingShuffles += ShuffleBlockInfo(10, 10)
+
+    validateDecommissionTimestampsOnManager(bmDecomManager)
+  }
+
+  test("block decom manager handles IO failures") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    registerShuffleBlocks(migratableShuffleBlockResolver, Set((1, 1L, 1)))
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+
+    val blockTransferService = mock(classOf[BlockTransferService])
+    // Simulate an ambiguous IO error (e.g. block could be gone, connection failed, etc.)
+    when(blockTransferService.uploadBlockSync(
+      mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.isNull())).thenThrow(
+      new java.io.IOException("boop")
+    )
+
+    when(bm.blockTransferService).thenReturn(blockTransferService)
+
+    // Verify the decom manager handles this correctly
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    validateDecommissionTimestampsOnManager(bmDecomManager, fail = false)
+  }
+
+  test("block decom manager short circuits removed blocks") {

Review comment:
       This fails, too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505165348



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
##########
@@ -82,23 +83,35 @@ private[storage] class BlockManagerDecommissioner(
               Thread.sleep(SLEEP_TIME_SECS * 1000L)
             case Some((shuffleBlockInfo, retryCount)) =>
               if (retryCount < maxReplicationFailuresForDecommission) {
-                logInfo(s"Trying to migrate shuffle ${shuffleBlockInfo} to ${peer}")
-                val blocks =
-                  bm.migratableResolver.getMigrationBlocks(shuffleBlockInfo)
+                logDebug(s"Trying to migrate shuffle ${shuffleBlockInfo} to ${peer}")
+                val blocks = bm.migratableResolver.getMigrationBlocks(shuffleBlockInfo)
                 logDebug(s"Got migration sub-blocks ${blocks}")
-                blocks.foreach { case (blockId, buffer) =>
-                  logDebug(s"Migrating sub-block ${blockId}")
-                  bm.blockTransferService.uploadBlockSync(
-                    peer.host,
-                    peer.port,
-                    peer.executorId,
-                    blockId,
-                    buffer,
-                    StorageLevel.DISK_ONLY,
-                    null)// class tag, we don't need for shuffle
-                  logDebug(s"Migrated sub block ${blockId}")
+
+                // Migrate the components of the blocks.
+                try {
+                  blocks.foreach { case (blockId, buffer) =>
+                    logDebug(s"Migrating sub-block ${blockId}")
+                    bm.blockTransferService.uploadBlockSync(
+                      peer.host,
+                      peer.port,
+                      peer.executorId,
+                      blockId,
+                      buffer,
+                      StorageLevel.DISK_ONLY,
+                      null)// class tag, we don't need for shuffle
+                    logDebug(s"Migrated sub block ${blockId}")
+                  }
+                  logDebug(s"Migrated ${shuffleBlockInfo} to ${peer}")
+                } catch {
+                  case e: IOException =>
+                    // If a block got deleted before netty opened the file handle, then trying to

Review comment:
       Could you describe when this happens logically?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709716871






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709716862


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34476/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709679429


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708894232






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505166230



##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionUnitSuite.scala
##########
@@ -161,7 +167,79 @@ class BlockManagerDecommissionUnitSuite extends SparkFunSuite with Matchers {
       .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
 
     // Verify the decom manager handles this correctly
-    validateDecommissionTimestamps(sparkConf, bm, migratableShuffleBlockResolver)
+    validateDecommissionTimestamps(sparkConf, bm)
+  }
+
+  test("block decom manager does not re-add removed shuffle files") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    registerShuffleBlocks(migratableShuffleBlockResolver, Set())
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    bmDecomManager.migratingShuffles += ShuffleBlockInfo(10, 10)
+
+    validateDecommissionTimestampsOnManager(bmDecomManager)
+  }
+
+  test("block decom manager handles IO failures") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    registerShuffleBlocks(migratableShuffleBlockResolver, Set((1, 1L, 1)))
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+
+    val blockTransferService = mock(classOf[BlockTransferService])
+    // Simulate an ambiguous IO error (e.g. block could be gone, connection failed, etc.)
+    when(blockTransferService.uploadBlockSync(
+      mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.isNull())).thenThrow(
+      new java.io.IOException("boop")
+    )
+
+    when(bm.blockTransferService).thenReturn(blockTransferService)
+
+    // Verify the decom manager handles this correctly
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    validateDecommissionTimestampsOnManager(bmDecomManager, fail = false)
+  }
+
+  test("block decom manager short circuits removed blocks") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    // First call get blocks, then empty list simulating a delete.
+    when(migratableShuffleBlockResolver.getMigrationBlocks(mc.any()))
+      .thenReturn(List(
+        (ShuffleIndexBlockId(1, 1, 1), mock(classOf[ManagedBuffer])),
+        (ShuffleDataBlockId(1, 1, 1), mock(classOf[ManagedBuffer]))))
+      .thenReturn(List())
+
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+
+    val blockTransferService = mock(classOf[BlockTransferService])
+    // Simulate an ambiguous IO error (e.g. block could be gone, connection failed, etc.)
+    when(blockTransferService.uploadBlockSync(
+      mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.isNull())).thenThrow(
+      new java.io.IOException("boop")
+    )
+
+    when(bm.blockTransferService).thenReturn(blockTransferService)
+
+    // Verify the decom manager handles this correctly
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    validateDecommissionTimestampsOnManager(bmDecomManager, fail = false, numShuffles=Some(1))

Review comment:
       Sure 👍




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505165642



##########
File path: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
##########
@@ -54,9 +54,84 @@ private[spark] trait DecommissionSuite { k8sSuite: KubernetesSuite =>
       executorPatience = None,
       decommissioningTest = true)
   }
+
+  test("Test basic decommissioning with shuffle cleanup", k8sTestTag) {

Review comment:
       Thank you for adding this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709709636






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708894232


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709582881






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709679416


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34469/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708894225


   **[Test build #129796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129796/testReport)** for PR 30046 at commit [`fa19ca4`](https://github.com/apache/spark/commit/fa19ca42beac28cfdf5def84a0eb60c584274c74).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505165884



##########
File path: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
##########
@@ -54,9 +54,84 @@ private[spark] trait DecommissionSuite { k8sSuite: KubernetesSuite =>
       executorPatience = None,
       decommissioningTest = true)
   }
+
+  test("Test basic decommissioning with shuffle cleanup", k8sTestTag) {
+    sparkAppConf
+      .set(config.DECOMMISSION_ENABLED.key, "true")
+      .set("spark.kubernetes.pyspark.pythonVersion", "3")

Review comment:
       We can remove this because the default is `3` now and there is no `2` in Apache Spark 3.1.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708893106


   **[Test build #129796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129796/testReport)** for PR 30046 at commit [`fa19ca4`](https://github.com/apache/spark/commit/fa19ca42beac28cfdf5def84a0eb60c584274c74).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709700011


   **[Test build #129871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129871/testReport)** for PR 30046 at commit [`b50eea8`](https://github.com/apache/spark/commit/b50eea895a084c04784399faaf74f2b822405e84).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709659489


   **[Test build #129863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129863/testReport)** for PR 30046 at commit [`32c4580`](https://github.com/apache/spark/commit/32c4580c1ffc967f0b360471490d456889fdc01f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709700011


   **[Test build #129871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129871/testReport)** for PR 30046 at commit [`b50eea8`](https://github.com/apache/spark/commit/b50eea895a084c04784399faaf74f2b822405e84).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709556976


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34457/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on a change in pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505742266



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
##########
@@ -82,23 +83,35 @@ private[storage] class BlockManagerDecommissioner(
               Thread.sleep(SLEEP_TIME_SECS * 1000L)
             case Some((shuffleBlockInfo, retryCount)) =>
               if (retryCount < maxReplicationFailuresForDecommission) {
-                logInfo(s"Trying to migrate shuffle ${shuffleBlockInfo} to ${peer}")
-                val blocks =
-                  bm.migratableResolver.getMigrationBlocks(shuffleBlockInfo)
+                logDebug(s"Trying to migrate shuffle ${shuffleBlockInfo} to ${peer}")
+                val blocks = bm.migratableResolver.getMigrationBlocks(shuffleBlockInfo)
                 logDebug(s"Got migration sub-blocks ${blocks}")
-                blocks.foreach { case (blockId, buffer) =>
-                  logDebug(s"Migrating sub-block ${blockId}")
-                  bm.blockTransferService.uploadBlockSync(
-                    peer.host,
-                    peer.port,
-                    peer.executorId,
-                    blockId,
-                    buffer,
-                    StorageLevel.DISK_ONLY,
-                    null)// class tag, we don't need for shuffle
-                  logDebug(s"Migrated sub block ${blockId}")
+
+                // Migrate the components of the blocks.
+                try {
+                  blocks.foreach { case (blockId, buffer) =>
+                    logDebug(s"Migrating sub-block ${blockId}")
+                    bm.blockTransferService.uploadBlockSync(
+                      peer.host,
+                      peer.port,
+                      peer.executorId,
+                      blockId,
+                      buffer,
+                      StorageLevel.DISK_ONLY,
+                      null)// class tag, we don't need for shuffle
+                    logDebug(s"Migrated sub block ${blockId}")
+                  }
+                  logDebug(s"Migrated ${shuffleBlockInfo} to ${peer}")
+                } catch {
+                  case e: IOException =>
+                    // If a block got deleted before netty opened the file handle, then trying to

Review comment:
       sure :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709708800


   **[Test build #129863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129863/testReport)** for PR 30046 at commit [`32c4580`](https://github.com/apache/spark/commit/32c4580c1ffc967f0b360471490d456889fdc01f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708893106


   **[Test build #129796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129796/testReport)** for PR 30046 at commit [`fa19ca4`](https://github.com/apache/spark/commit/fa19ca42beac28cfdf5def84a0eb60c584274c74).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709582881


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709516257


   Gentle ping, @holdenk .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708933113


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708933113






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709847313






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708933118


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34404/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709582191


   **[Test build #129851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129851/testReport)** for PR 30046 at commit [`cc926b3`](https://github.com/apache/spark/commit/cc926b3f477a90fbb75e3987a8f5ed351055581f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709556960


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34457/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709659489


   **[Test build #129863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129863/testReport)** for PR 30046 at commit [`32c4580`](https://github.com/apache/spark/commit/32c4580c1ffc967f0b360471490d456889fdc01f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709709636






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #30046:
URL: https://github.com/apache/spark/pull/30046


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709847313






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709521428


   **[Test build #129851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129851/testReport)** for PR 30046 at commit [`cc926b3`](https://github.com/apache/spark/commit/cc926b3f477a90fbb75e3987a8f5ed351055581f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709556973






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709679429






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709709894


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34476/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709716871






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709845677


   **[Test build #129871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129871/testReport)** for PR 30046 at commit [`b50eea8`](https://github.com/apache/spark/commit/b50eea895a084c04784399faaf74f2b822405e84).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30046:
URL: https://github.com/apache/spark/pull/30046#discussion_r505163764



##########
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionUnitSuite.scala
##########
@@ -161,7 +167,79 @@ class BlockManagerDecommissionUnitSuite extends SparkFunSuite with Matchers {
       .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
 
     // Verify the decom manager handles this correctly
-    validateDecommissionTimestamps(sparkConf, bm, migratableShuffleBlockResolver)
+    validateDecommissionTimestamps(sparkConf, bm)
+  }
+
+  test("block decom manager does not re-add removed shuffle files") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    registerShuffleBlocks(migratableShuffleBlockResolver, Set())
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    bmDecomManager.migratingShuffles += ShuffleBlockInfo(10, 10)
+
+    validateDecommissionTimestampsOnManager(bmDecomManager)
+  }
+
+  test("block decom manager handles IO failures") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    registerShuffleBlocks(migratableShuffleBlockResolver, Set((1, 1L, 1)))
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+
+    val blockTransferService = mock(classOf[BlockTransferService])
+    // Simulate an ambiguous IO error (e.g. block could be gone, connection failed, etc.)
+    when(blockTransferService.uploadBlockSync(
+      mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.isNull())).thenThrow(
+      new java.io.IOException("boop")
+    )
+
+    when(bm.blockTransferService).thenReturn(blockTransferService)
+
+    // Verify the decom manager handles this correctly
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    validateDecommissionTimestampsOnManager(bmDecomManager, fail = false)
+  }
+
+  test("block decom manager short circuits removed blocks") {
+    // Set up the mocks so we return one shuffle block
+    val bm = mock(classOf[BlockManager])
+    val migratableShuffleBlockResolver = mock(classOf[MigratableResolver])
+    // First call get blocks, then empty list simulating a delete.
+    when(migratableShuffleBlockResolver.getMigrationBlocks(mc.any()))
+      .thenReturn(List(
+        (ShuffleIndexBlockId(1, 1, 1), mock(classOf[ManagedBuffer])),
+        (ShuffleDataBlockId(1, 1, 1), mock(classOf[ManagedBuffer]))))
+      .thenReturn(List())
+
+    when(bm.migratableResolver).thenReturn(migratableShuffleBlockResolver)
+    when(bm.getMigratableRDDBlocks())
+      .thenReturn(Seq())
+    when(bm.getPeers(mc.any()))
+      .thenReturn(Seq(BlockManagerId("exec2", "host2", 12345)))
+
+    val blockTransferService = mock(classOf[BlockTransferService])
+    // Simulate an ambiguous IO error (e.g. block could be gone, connection failed, etc.)
+    when(blockTransferService.uploadBlockSync(
+      mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.any(), mc.isNull())).thenThrow(
+      new java.io.IOException("boop")
+    )
+
+    when(bm.blockTransferService).thenReturn(blockTransferService)
+
+    // Verify the decom manager handles this correctly
+    val bmDecomManager = new BlockManagerDecommissioner(sparkConf, bm)
+    validateDecommissionTimestampsOnManager(bmDecomManager, fail = false, numShuffles=Some(1))

Review comment:
       Could you fix the scalastyle to run CI?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708933094


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34404/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709521428


   **[Test build #129851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129851/testReport)** for PR 30046 at commit [`cc926b3`](https://github.com/apache/spark/commit/cc926b3f477a90fbb75e3987a8f5ed351055581f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
holdenk commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709727489


   R failure is unrelated. cc @dongjoon-hyun 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709679432


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34469/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709582892


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129851/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30046: [SPARK-33154] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-708894237


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129796/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30046: [SPARK-33154][CORE][K8S] Handle cleaned shuffles during migration

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30046:
URL: https://github.com/apache/spark/pull/30046#issuecomment-709671293


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34469/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org