Posted to user@spark.apache.org by Vipul Pandey <vi...@gmail.com> on 2014/02/26 21:36:55 UTC

failed task running in a loop

Spark version: 0.9
Cluster size: 20 nodes
Executor memory: 100G


My job failed due to some protobuf issues. In the Spark shell, I then mistakenly launched the next action, "saveAsTextFile", and this is what I see. There is one task, which should fail, but instead of failing for good it keeps getting relaunched as a replacement task. My shell is showing a continuous stream of:

14/02/26 12:34:57 INFO scheduler.DAGScheduler: Marking Stage 3 (saveAsTextFile at <console>:53) for resubmision due to a fetch failure
14/02/26 12:34:57 INFO scheduler.DAGScheduler: The failed fetch was from Stage 4 (reduceByKey at <console>:50); marking it for resubmission
14/02/26 12:34:57 INFO scheduler.DAGScheduler: Resubmitting failed stages
14/02/26 12:34:57 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[18] at saveAsTextFile at <console>:53), which has no missing parents
14/02/26 12:34:57 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 3 (MappedRDD[18] at saveAsTextFile at <console>:53)
14/02/26 12:34:57 INFO scheduler.TaskSchedulerImpl: Adding task set 3.1286 with 1 tasks
14/02/26 12:34:57 INFO scheduler.TaskSetManager: Starting task 3.1286:0 as TID 21759 on executor 2: rd17d01ls-geo0287.rd.geo.apple.com (PROCESS_LOCAL)
14/02/26 12:34:57 INFO scheduler.TaskSetManager: Serialized task 3.1286:0 as 13124 bytes in 0 ms
14/02/26 12:34:57 WARN scheduler.TaskSetManager: Lost TID 21759 (task 3.1286:0)
14/02/26 12:34:57 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null
14/02/26 12:34:57 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 3.1286 from pool 
14/02/26 12:34:57 INFO scheduler.DAGScheduler: Marking Stage 3 (saveAsTextFile at <console>:53) for resubmision due to a fetch failure
14/02/26 12:34:57 INFO scheduler.DAGScheduler: The failed fetch was from Stage 4 (reduceByKey at <console>:50); marking it for resubmission
14/02/26 12:34:57 INFO scheduler.DAGScheduler: Resubmitting failed stages
14/02/26 12:34:57 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[18] at saveAsTextFile at <console>:53), which has no missing parents


and so on.
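
To gauge how long this has been looping, the attempt number can be read from the task set ID in the log: "3.1286" appears to be stage 3, attempt 1286. Below is a minimal Python sketch for pulling that number out, assuming the shell output has been captured as plain text. The function name latest_attempts and the regular expression are my own and not part of Spark.

```python
import re

# Matches lines such as:
#   "... INFO scheduler.TaskSchedulerImpl: Adding task set 3.1286 with 1 tasks"
# Assumption: the ID is "<stage>.<attempt>", as the log above suggests.
TASK_SET_RE = re.compile(r"Adding task set (\d+)\.(\d+) with (\d+) tasks")


def latest_attempts(log_text):
    """Return {stage_id: highest attempt number seen} for a log excerpt."""
    attempts = {}
    for line in log_text.splitlines():
        match = TASK_SET_RE.search(line)
        if match:
            stage, attempt = int(match.group(1)), int(match.group(2))
            attempts[stage] = max(attempt, attempts.get(stage, 0))
    return attempts


sample = (
    "14/02/26 12:34:57 INFO scheduler.TaskSchedulerImpl: "
    "Adding task set 3.1286 with 1 tasks"
)
print(latest_attempts(sample))  # {3: 1286}
```

Running it on the full captured output would show whether the attempt count is still climbing.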


Any clue what's happening?

Thanks
Vipul