Posted to reviews@spark.apache.org by "Stove-hust (via GitHub)" <gi...@apache.org> on 2023/03/14 04:30:34 UTC

[GitHub] [spark] Stove-hust commented on pull request #40393: [SPARK-40082]

Stove-hust commented on PR #40393:
URL: https://github.com/apache/spark/pull/40393#issuecomment-1467339346

   > @Stove-hust Thank you for reporting and the patch. Would you be able to share driver logs?
   
   sure.
   `# stage 10 failed
   22/10/15 10:55:58 WARN task-result-getter-1 TaskSetManager: Lost task 435.1 in stage 10.0 (TID 6822, zw02-data-hdp-dn21102.mt, executor 101): FetchFailed(null, shuffleId=3, mapIndex=-1, mapId=-1, reduceId=435, message=
   org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 3 partition 435
   22/10/15 10:55:58 INFO dag-scheduler-event-loop DAGScheduler: ShuffleMapStage 10 (processCmd at CliDriver.java:386) failed in 601.792 s due to org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 3 partition 435
   
   # resubmit stage 10 && parentStage 9
   22/10/15 10:55:58 INFO dag-scheduler-event-loop DAGScheduler: Resubmitting ShuffleMapStage 9 (processCmd at CliDriver.java:386) and ShuffleMapStage 10 (processCmd at CliDriver.java:386) due to fetch failure
   22/10/15 10:55:58 INFO dag-scheduler-event-loop DAGScheduler: Resubmitting failed stages
   22/10/15 10:55:58 INFO dag-scheduler-event-loop DAGScheduler: Submitting ShuffleMapStage 9 (MapPartitionsRDD[22] at processCmd at CliDriver.java:386), which has no missing parents
   22/10/15 10:55:58 INFO dag-scheduler-event-loop DAGScheduler: Push-based shuffle disabled for ShuffleMapStage 9 (processCmd at CliDriver.java:386) since it is already shuffle merge finalized
   22/10/15 10:55:58 INFO dag-scheduler-event-loop DAGScheduler: Submitting 3 missing tasks from ShuffleMapStage 9 (MapPartitionsRDD[22] at processCmd at CliDriver.java:386) (first 15 tasks are for partitions Vector(98, 372, 690))
   22/10/15 10:55:58 INFO dag-scheduler-event-loop YarnClusterScheduler: Adding task set 9.1 with 3 tasks
   
   # The tasks of the first attempt of stage 10 complete one after another; notifyDriverAboutPushCompletion then tries to end stage 10 and marks the finalizeTask, but because the stage is not in runningStages, it cannot be marked shuffleMergeFinalized.
   22/10/15 10:55:58 INFO task-result-getter-0 TaskSetManager: Finished task 325.0 in stage 10.0 (TID 6166) in 154455 ms on zw02-data-hdp-dn25537.mt (executor 117) (494/500)
   22/10/15 10:55:59 WARN task-result-getter-1 TaskSetManager: Lost task 325.1 in stage 10.0 (TID 6671, zw02-data-hdp-dn23160.mt, executor 47): TaskKilled (another attempt succeeded)
   22/10/15 10:56:20 WARN task-result-getter-1 TaskSetManager: Lost task 358.1 in stage 10.0 (TID 6731, zw02-data-hdp-dn25537.mt, executor 95): TaskKilled (another attempt succeeded)
   22/10/15 10:56:20 INFO task-result-getter-1 TaskSetManager: Task 358.1 in stage 10.0 (TID 6731) failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded).
   
   # Removed TaskSet 10.0, whose tasks have all completed
   22/10/15 10:56:22 INFO task-result-getter-1 TaskSetManager: Ignoring task-finished event for 435.0 in stage 10.0 because task 435 has already completed successfully
   22/10/15 10:56:22 INFO task-result-getter-1 YarnClusterScheduler: Removed TaskSet 10.0, whose tasks have all completed, from pool 
   
   # notifyDriverAboutPushCompletion stage 10
   22/10/15 10:56:23 INFO dag-scheduler-event-loop DAGScheduler: ShuffleMapStage 10 (processCmd at CliDriver.java:386) scheduled for finalizing shuffle merge in 0 s
   22/10/15 10:56:23 INFO shuffle-merge-finalizer-2 DAGScheduler: ShuffleMapStage 10 (processCmd at CliDriver.java:386) finalizing the shuffle merge with registering merge results set to true
   
   # stage 9 finished 
   22/10/15 10:57:51 INFO task-result-getter-1 TaskSetManager: Finished task 2.0 in stage 9.1 (TID 6825) in 112825 ms on zw02-data-hdp-dn25559.mt (executor 74) (3/3)
   22/10/15 10:57:51 INFO task-result-getter-1 YarnClusterScheduler: Removed TaskSet 9.1, whose tasks have all completed, from pool 
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: ShuffleMapStage 9 (processCmd at CliDriver.java:386) finished in 112.832 s
   
   # resubmit stage 10
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: looking for newly runnable stages
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: running: Set(ShuffleMapStage 11, ShuffleMapStage 8)
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: waiting: Set(ShuffleMapStage 12, ShuffleMapStage 10)
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: failed: Set()
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[36] at processCmd at CliDriver.java:386), which has no missing parents
   22/10/15 10:57:51 INFO dag-scheduler-event-loop OutputCommitCoordinator: Reusing state from previous attempt of stage 10.
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: Shuffle merge enabled before starting the stage for ShuffleMapStage 10 with shuffle 7 and shuffle merge 0 with 108 merger locations
   22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: Submitting 4 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[36] at processCmd at CliDriver.java:386) (first 15 tasks are for partitions Vector(105, 288, 447, 481))
   22/10/15 10:57:51 INFO dag-scheduler-event-loop YarnClusterScheduler: Adding task set 10.1 with 4 tasks
   
   # stage 10 cannot finish
   22/10/15 10:58:18 INFO task-result-getter-1 TaskSetManager: Finished task 2.0 in stage 10.1 (TID 6857) in 26644 ms on zw02-data-hdp-dn23767.mt (executor 139) (1/4)
   22/10/15 10:58:24 INFO task-result-getter-1 TaskSetManager: Finished task 3.0 in stage 10.1 (TID 6860) in 32551 ms on zw02-data-hdp-dn23729.mt (executor 42) (2/4)
   22/10/15 10:58:47 INFO task-result-getter-1 TaskSetManager: Finished task 0.0 in stage 10.1 (TID 6858) in 55524 ms on zw02-data-hdp-dn20640.mt (executor 134) (3/4)
   22/10/15 10:58:58 INFO task-result-getter-0 TaskSetManager: Finished task 1.0 in stage 10.1 (TID 6859) in 66911 ms on zw02-data-hdp-dn25862.mt (executor 57) (4/4)
   `
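
For anyone reading a driver log like the one above, the following is a minimal Python sketch (not part of the patch) that pulls out the DAGScheduler events for one stage and counts `Finished task` lines per stage attempt. The helper names `parse`, `finished_per_attempt`, and `scheduler_events` are made up for illustration, and the regular expressions assume the exact line layout shown in this log (timestamp, level, thread, component, message).

```python
import re

# Assumed layout, taken from the log above:
# 22/10/15 10:55:58 INFO dag-scheduler-event-loop DAGScheduler: message
LOG_LINE = re.compile(
    r"^\s*(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<thread>\S+) (?P<component>\w+): (?P<msg>.*)$"
)
# "Finished task 2.0 in stage 10.1 ..." yields the stage attempt "10.1"
FINISHED = re.compile(r"Finished task \d+\.\d+ in stage (?P<attempt>\d+\.\d+)")


def parse(lines):
    """Yield one dict per well-formed log line; annotations and blanks are skipped."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            yield match.groupdict()


def finished_per_attempt(lines):
    """Count 'Finished task' lines per stage attempt, e.g. '10.1'."""
    counts = {}
    for record in parse(lines):
        match = FINISHED.search(record["msg"])
        if match:
            attempt = match.group("attempt")
            counts[attempt] = counts.get(attempt, 0) + 1
    return counts


def scheduler_events(lines, stage):
    """Return (timestamp, message) pairs of DAGScheduler lines naming the stage."""
    needle = re.compile(rf"ShuffleMapStage {stage}\b")
    return [
        (record["ts"], record["msg"])
        for record in parse(lines)
        if record["component"] == "DAGScheduler" and needle.search(record["msg"])
    ]


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "# stage 9 finished",
    "22/10/15 10:57:51 INFO task-result-getter-1 TaskSetManager: Finished task 2.0 in stage 9.1 (TID 6825) in 112825 ms on zw02-data-hdp-dn25559.mt (executor 74) (3/3)",
    "22/10/15 10:57:51 INFO dag-scheduler-event-loop DAGScheduler: ShuffleMapStage 9 (processCmd at CliDriver.java:386) finished in 112.832 s",
    "22/10/15 10:58:18 INFO task-result-getter-1 TaskSetManager: Finished task 2.0 in stage 10.1 (TID 6857) in 26644 ms on zw02-data-hdp-dn23767.mt (executor 139) (1/4)",
    "22/10/15 10:58:24 INFO task-result-getter-1 TaskSetManager: Finished task 3.0 in stage 10.1 (TID 6860) in 32551 ms on zw02-data-hdp-dn23729.mt (executor 42) (2/4)",
]

if __name__ == "__main__":
    print(finished_per_attempt(SAMPLE))
    for ts, msg in scheduler_events(SAMPLE, 9):
        print(ts, msg)
```

Run against the full log, `scheduler_events(lines, 10)` lists the DAGScheduler lines that name ShuffleMapStage 10, which makes it easy to check whether a "finished" line follows the tasks of attempt 10.1.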


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

