Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/21 17:29:49 UTC

[GitHub] [spark] sarutak commented on a change in pull request #25487: [SPARK-28769][CORE] Improve warning message of BarrierExecutionMode when required slots > maximum slots

sarutak commented on a change in pull request #25487: [SPARK-28769][CORE] Improve warning message of BarrierExecutionMode when required slots > maximum slots
URL: https://github.com/apache/spark/pull/25487#discussion_r316308139
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
 ##########
 @@ -981,11 +983,20 @@ private[spark] class DAGScheduler(
       finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
     } catch {
       case e: BarrierJobSlotsNumberCheckFailed =>
-        logWarning(s"The job $jobId requires to run a barrier stage that requires more slots " +
-          "than the total number of slots in the cluster currently.")
         // If jobId doesn't exist in the map, Scala coverts its value null to 0: Int automatically.
         val numCheckFailures = barrierJobIdToNumTasksCheckFailures.compute(jobId,
           (_: Int, value: Int) => value + 1)
+        val retryCount = numCheckFailures - 1
+        val retryMessage = if (retryCount == 0) {
+          ""
+        } else {
+          s" (Retry ${retryCount}/$maxFailureNumTasksCheck failed)"
+        }
+
+        logWarning(s"The job $jobId requires to run a barrier stage " +
 
 Review comment:
   My first idea was like what you suggest, but if we do so, we get the following messages.
   ```
   19/08/22 01:48:34 WARN DAGScheduler: Barrier stage in job 0  requires 3 slots, but only 2 are available. Failure 1 / 3
   19/08/22 01:48:49 WARN DAGScheduler: Barrier stage in job 0  requires 3 slots, but only 2 are available. Failure 2 / 3
   19/08/22 01:49:04 WARN DAGScheduler: Barrier stage in job 0  requires 3 slots, but only 2 are available. Failure 3 / 3
   19/08/22 01:49:19 WARN DAGScheduler: Barrier stage in job 0  requires 3 slots, but only 2 are available. Failure 4 / 3
   ```
   `Failure 4 / 3` looks weird, because the failure count exceeds the stated maximum. Another solution would be to use `maxFailureNumTasksCheck + 1` as the maximum number of attempts.
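
   To make the counting concrete, here is a minimal, self-contained sketch of the three message styles under discussion. It is not the actual `DAGScheduler` code: the object name, the helper names, and the hard-coded limit of 3 are assumptions made for illustration, and only the retry-suffix logic mirrors the diff above.

   ```scala
   // Illustrative sketch only; not the actual DAGScheduler code.
   object BarrierWarningSketch {
     // Hypothetical stand-in for the configured limit used in the diff.
     val maxFailureNumTasksCheck = 3

     // Style A: number every failure against the retry limit.
     def failureStyle(numCheckFailures: Int): String =
       s"Failure $numCheckFailures / $maxFailureNumTasksCheck"

     // Style B: number every failure against limit + 1 total attempts.
     def attemptStyle(numCheckFailures: Int): String =
       s"Failure $numCheckFailures / ${maxFailureNumTasksCheck + 1}"

     // Style C: the retry suffix from the diff; empty on the first failure.
     def retryStyle(numCheckFailures: Int): String = {
       val retryCount = numCheckFailures - 1
       if (retryCount == 0) "" else s" (Retry $retryCount/$maxFailureNumTasksCheck failed)"
     }

     def main(args: Array[String]): Unit = {
       // Assumes the check can fail limit + 1 times, as in the log lines above.
       (1 to maxFailureNumTasksCheck + 1).foreach { n =>
         println(s"${failureStyle(n)} | ${attemptStyle(n)} |${retryStyle(n)}")
       }
     }
   }
   ```

   On the last iteration, style A prints `Failure 4 / 3`, while style B prints `Failure 4 / 4` and style C prints `(Retry 3/3 failed)`, so both alternatives keep the counter within its stated maximum.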
