Posted to reviews@spark.apache.org by "ukby1234 (via GitHub)" <gi...@apache.org> on 2023/10/20 04:30:16 UTC

Re: [PR] [SPARK-44635][CORE] Handle shuffle fetch failures in decommissions [spark]

ukby1234 commented on code in PR #42296:
URL: https://github.com/apache/spark/pull/42296#discussion_r1366447643


##########
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala:
##########
@@ -355,8 +367,28 @@ final class ShuffleBlockFetcherIterator(
                 updateMergedReqsDuration(wasReqForMergedChunks = true)
                 results.put(FallbackOnPushMergedFailureResult(
                   block, address, infoMap(blockId)._1, remainingBlocks.isEmpty))
-              } else {
+              } else if (!shouldPerformShuffleLocationRefresh) {
                 results.put(FailureFetchResult(block, infoMap(blockId)._2, address, e))
+              } else {
+                val (shuffleId, mapId) = BlockId.getShuffleIdAndMapId(block)
+                val mapOutputTrackerWorker = mapOutputTracker.asInstanceOf[MapOutputTrackerWorker]
+                Try(mapOutputTrackerWorker
+                  .getMapOutputLocationWithRefresh(shuffleId, mapId, address)) match {

Review Comment:
   Refreshing map output locations from a Netty callback thread can deadlock. Here is the reasoning:
   1. Some map output locations are stored in broadcast variables.
   2. This code runs inside a synchronized block.
   3. The Netty response that fetches the broadcast variable can be queued behind other handlers, such as the shuffle success handler.
   4. The shuffle success handler also needs the lock from step 2, so neither side can make progress: a deadlock.
   
   I hit exactly this situation while testing this patch.
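   The hazard described above can be sketched outside of Spark. This is a minimal, self-contained illustration (all names are hypothetical, not Spark APIs): a single-threaded executor stands in for the Netty event-loop thread, which takes a lock and then blocks waiting on a response that can only be processed by that same thread, so the wait always times out.

   ```scala
   import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}
   import java.util.concurrent.locks.ReentrantLock

   object DeadlockSketch {
     // Returns true iff the simulated "broadcast fetch" completed in time.
     def demo(): Boolean = {
       // Single-threaded pool standing in for a Netty event-loop thread.
       val eventLoop = Executors.newSingleThreadExecutor()
       val lock = new ReentrantLock()

       // "Fetch-failure handler": runs on the event loop, holds the lock,
       // then blocks waiting for work queued on that same event loop.
       val outer = eventLoop.submit(new Callable[Boolean] {
         def call(): Boolean = {
           lock.lock()
           try {
             // Simulated broadcast-fetch response handler. It is queued on
             // the same single thread, which is busy right here, so it can
             // never start. (In Spark it would also contend on this lock.)
             val broadcastResponse = eventLoop.submit(new Runnable {
               def run(): Unit = ()
             })
             // Bounded wait so the sketch terminates instead of hanging;
             // the real code would block forever.
             try {
               broadcastResponse.get(500, TimeUnit.MILLISECONDS)
               true
             } catch {
               case _: TimeoutException =>
                 broadcastResponse.cancel(true)
                 false
             }
           } finally lock.unlock()
         }
       })

       val completed = outer.get()
       eventLoop.shutdownNow()
       completed
     }

     def main(args: Array[String]): Unit =
       // prints: broadcast fetch completed: false
       println(s"broadcast fetch completed: ${demo()}")
   }
   ```

   The usual fix for this shape of bug is to never do blocking RPC or lock acquisition on the event-loop thread itself, e.g. by handing the refresh off to a separate worker pool.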



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

