You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/25 14:07:34 UTC

[GitHub] [spark] yikf opened a new pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

yikf opened a new pull request #31648:
URL: https://github.com/apache/spark/pull/31648


   ### What changes were proposed in this pull request?
   Fixed an issue where data could not be cleaned up when unregisterShuffle.
   
   
   ### Why are the changes needed?
   While we use the old shuffle fetch protocol, we use partitionId as mapId in the ShuffleBlockId construction,but we use `context.taskAttemptId()` as mapId that it is cached in `taskIdMapsForShuffle` when we `getWriter[K, V]`.
   
   where data could not be cleaned up when unregisterShuffle ,because we remove a shuffle's metadata from the `taskIdMapsForShuffle`'s mapIds, the mapId is `context.taskAttemptId()` instead of partitionId.
   
   
   ### Does this PR introduce _any_ user-facing change?
   yes
   
   
   ### How was this patch tested?
   exist test.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-785934560


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf commented on pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786568485


   open a PR [https://github.com/apache/spark/pull/31664](url) against master branch


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf commented on pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786334725


   > The existing tests are not catching this issue - can you add something to ensure we test for this problem ?
   
   i will add the UT for this problem.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #31648:
URL: https://github.com/apache/spark/pull/31648#discussion_r582890410



##########
File path: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
##########
@@ -154,7 +154,7 @@ private[spark] class SortShuffleManager(conf: SparkConf) extends ShuffleManager
       metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] = {
     val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
       handle.shuffleId, _ => new OpenHashSet[Long](16))
-    mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }

Review comment:
       @xuanyuanking do you have an opinion here? I think you wrote this piece of the code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786014481






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] otterc commented on a change in pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
otterc commented on a change in pull request #31648:
URL: https://github.com/apache/spark/pull/31648#discussion_r583290760



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
##########
@@ -28,7 +28,7 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("concat") {
     def testConcat(inputs: String*): Unit = {
       val expected = if (inputs.contains(null)) null else inputs.mkString
-      checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow)
+      checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected)

Review comment:
       Is this test change related to the code change? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf removed a comment on pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf removed a comment on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786337598


   > @yikf, can you open a PR against master branch?
   
   ok, should i close the currently PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf commented on a change in pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf commented on a change in pull request #31648:
URL: https://github.com/apache/spark/pull/31648#discussion_r582866333



##########
File path: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
##########
@@ -154,7 +154,7 @@ private[spark] class SortShuffleManager(conf: SparkConf) extends ShuffleManager
       metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] = {
     val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
       handle.shuffleId, _ => new OpenHashSet[Long](16))
-    mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }

Review comment:
       For more convenient review:
   
   ShuffleMapTask#runTask
   `val mapId = if (SparkEnv.get.conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)) {
         partitionId
       } else context.taskAttemptId()`
   
   SortShuffleManager#getWriter
   `val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
         handle.shuffleId, _ => new OpenHashSet[Long](16))
       mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }`
   
   SortShuffleManager#unregisterShuffle
   `Option(taskIdMapsForShuffle.remove(shuffleId)).foreach { mapTaskIds =>
         mapTaskIds.iterator.foreach { mapTaskId =>
           shuffleBlockResolver.removeDataByMap(shuffleId, mapTaskId)
         }
       }`
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf commented on a change in pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf commented on a change in pull request #31648:
URL: https://github.com/apache/spark/pull/31648#discussion_r582860612



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
##########
@@ -28,7 +28,7 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("concat") {
     def testConcat(inputs: String*): Unit = {
       val expected = if (inputs.contains(null)) null else inputs.mkString
-      checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow)
+      checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected)

Review comment:
       For reviewer : clean up unused code for code simplifications.
   Method signature for checkEvaluation as follow:
   checkEvaluation(expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786336605


   @yikf, can you open a PR against master branch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf edited a comment on pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf edited a comment on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786568485


   open a PR https://github.com/apache/spark/pull/31664 against master branch


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf edited a comment on pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf edited a comment on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786568485


   open a PR https://github.com/apache/spark/pull/31664 against master branch, close this PR for branch 3.0


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf commented on pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786334052


   > The change in `SortShuffleManager` looks good to me. However, we should add a UT to cover this. Also, I don't quite get how the change in `StringExpressionsSuite.scala` is related to this bug fix.
   
   The change in  `StringExpressionsSuite.scala` is unrelated to this bug fix, for code simplifications only.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xuanyuanking commented on a change in pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #31648:
URL: https://github.com/apache/spark/pull/31648#discussion_r585646633



##########
File path: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala
##########
@@ -154,7 +154,7 @@ private[spark] class SortShuffleManager(conf: SparkConf) extends ShuffleManager
       metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] = {
     val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
       handle.shuffleId, _ => new OpenHashSet[Long](16))
-    mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }

Review comment:
       Thanks for pinging me, I'm reviewing #31664




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf commented on pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786337598


   > @yikf, can you open a PR against master branch?
   
   ok, should i close the currently PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf closed pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf closed pull request #31648:
URL: https://github.com/apache/spark/pull/31648


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yikf commented on pull request #31648: [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle.

Posted by GitBox <gi...@apache.org>.
yikf commented on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-785929222


   gentle ping @srowen , thanks for taking a look.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org