Posted to user@spark.apache.org by Ritesh Kumar Singh <ri...@gmail.com> on 2014/11/10 14:21:02 UTC

Executor Lost Failure

Hi,

I am trying to submit my application with spark-submit, using the following
spark-defaults.conf params:

spark.master                     spark://<master-ip>:7077
spark.eventLog.enabled           true
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value
-Dnumbers="one two three"

===============================================================
But every time I get this error:

14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local:
remote Akka client disassociated
14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
aa.local): ExecutorLostFailure (executor lost)
14/11/10 18:39:17 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
aa.local): ExecutorLostFailure (executor lost)
14/11/10 18:39:20 ERROR TaskSchedulerImpl: Lost executor 2 on aa.local:
remote Akka client disassociated
14/11/10 18:39:20 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2,
aa.local): ExecutorLostFailure (executor lost)
14/11/10 18:39:20 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3,
aa.local): ExecutorLostFailure (executor lost)
14/11/10 18:39:26 ERROR TaskSchedulerImpl: Lost executor 4 on aa.local:
remote Akka client disassociated
14/11/10 18:39:26 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5,
aa.local): ExecutorLostFailure (executor lost)
14/11/10 18:39:26 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4,
aa.local): ExecutorLostFailure (executor lost)
14/11/10 18:39:29 ERROR TaskSchedulerImpl: Lost executor 5 on aa.local:
remote Akka client disassociated
14/11/10 18:39:29 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7,
aa.local): ExecutorLostFailure (executor lost)
14/11/10 18:39:29 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times;
aborting job
14/11/10 18:39:29 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6,
aa.local): ExecutorLostFailure (executor lost)
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 0.0 (TID 7, gonephishing.local): ExecutorLostFailure
(executor lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

=================================================================
Any fixes?

Re: Fwd: Executor Lost Failure

Posted by Ritesh Kumar Singh <ri...@gmail.com>.
Yes... found the output in the web UI of the slave.

Thanks :)

On Tue, Nov 11, 2014 at 2:48 AM, Ankur Dave <an...@gmail.com> wrote:

> At 2014-11-10 22:53:49 +0530, Ritesh Kumar Singh <
> riteshoneinamillion@gmail.com> wrote:
> > Tasks are now getting submitted, but many of them don't seem to do anything.
> > For example, after opening the spark-shell, I load a text file from disk and try
> > printing its contents as:
> >
> >>sc.textFile("/path/to/file").foreach(println)
> >
> > It does not give me any output.
>
> That's because foreach launches tasks on the slaves. When each task tries
> to print its lines, they go to the stdout file on the slave rather than to
> your console at the driver. You should see the file's contents in each of
> the slaves' stdout files in the web UI.
>
> This only happens when running on a cluster. In local mode, all the tasks
> are running locally and can output to the driver, so foreach(println) is
> more useful.
>
> Ankur
>

Re: Fwd: Executor Lost Failure

Posted by Ankur Dave <an...@gmail.com>.
At 2014-11-10 22:53:49 +0530, Ritesh Kumar Singh <ri...@gmail.com> wrote:
> Tasks are now getting submitted, but many of them don't seem to do anything.
> For example, after opening the spark-shell, I load a text file from disk and try
> printing its contents as:
>
>>sc.textFile("/path/to/file").foreach(println)
>
> It does not give me any output.

That's because foreach launches tasks on the slaves. When each task tries to print its lines, they go to the stdout file on the slave rather than to your console at the driver. You should see the file's contents in each of the slaves' stdout files in the web UI.

This only happens when running on a cluster. In local mode, all the tasks are running locally and can output to the driver, so foreach(println) is more useful.
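For reference, a common pattern for inspecting file contents from the driver is to bring the data back to the driver before printing (a sketch for the spark-shell; "/path/to/file" is the same placeholder path used above):

```scala
// Bring a bounded sample back to the driver, then print locally.
sc.textFile("/path/to/file").take(10).foreach(println)

// If the file is known to be small enough to fit in driver memory,
// collect() materializes the whole RDD on the driver first.
sc.textFile("/path/to/file").collect().foreach(println)
```

Since collect() pulls every partition into driver memory, prefer take(n) or toLocalIterator for large files.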

Ankur

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Fwd: Executor Lost Failure

Posted by Ritesh Kumar Singh <ri...@gmail.com>.
---------- Forwarded message ----------
From: Ritesh Kumar Singh <ri...@gmail.com>
Date: Mon, Nov 10, 2014 at 10:52 PM
Subject: Re: Executor Lost Failure
To: Akhil Das <ak...@sigmoidanalytics.com>


Tasks are now getting submitted, but many of them don't seem to do anything.
For example, after opening the spark-shell, I load a text file from disk and try
printing its contents as:

>sc.textFile("/path/to/file").foreach(println)

It does not give me any output. While running this:

>sc.textFile("/path/to/file").count

gives me the right number of lines in the text file.
Not sure what the error is, but here is the console output for the print
case:

14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(215230) called with
curMem=709528, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6 stored as values in
memory (estimated size 210.2 KB, free 441.5 MB)
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(17239) called with
curMem=924758, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6_piece0 stored as
bytes in memory (estimated size 16.8 KB, free 441.5 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on gonephishing.local:42648 (size: 16.8 KB, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
broadcast_6_piece0
14/11/10 22:48:02 INFO FileInputFormat: Total input paths to process : 1
14/11/10 22:48:02 INFO SparkContext: Starting job: foreach at <console>:13
14/11/10 22:48:02 INFO DAGScheduler: Got job 3 (foreach at <console>:13)
with 2 output partitions (allowLocal=false)
14/11/10 22:48:02 INFO DAGScheduler: Final stage: Stage 3(foreach at
<console>:13)
14/11/10 22:48:02 INFO DAGScheduler: Parents of final stage: List()
14/11/10 22:48:02 INFO DAGScheduler: Missing parents: List()
14/11/10 22:48:02 INFO DAGScheduler: Submitting Stage 3 (Desktop/mnd.txt
MappedRDD[7] at textFile at <console>:13), which has no missing parents
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(2504) called with
curMem=941997, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 2.4 KB, free 441.4 MB)
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(1602) called with
curMem=944501, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7_piece0 stored as
bytes in memory (estimated size 1602.0 B, free 441.4 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on gonephishing.local:42648 (size: 1602.0 B, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
broadcast_7_piece0
14/11/10 22:48:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage
3 (Desktop/mnd.txt MappedRDD[7] at textFile at <console>:13)
14/11/10 22:48:02 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
14/11/10 22:48:02 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
6, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
14/11/10 22:48:02 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID
7, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on gonephishing.local:48857 (size: 1602.0 B, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on gonephishing.local:48857 (size: 16.8 KB, free: 442.3 MB)
14/11/10 22:48:02 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
6) in 308 ms on gonephishing.local (1/2)
14/11/10 22:48:02 INFO DAGScheduler: Stage 3 (foreach at <console>:13)
finished in 0.321 s
14/11/10 22:48:02 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID
7) in 315 ms on gonephishing.local (2/2)
14/11/10 22:48:02 INFO SparkContext: Job finished: foreach at <console>:13,
took 0.376602079 s
14/11/10 22:48:02 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool

=======================================================================



On Mon, Nov 10, 2014 at 8:01 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Try adding the following configurations as well; it might work.
>
>  spark.rdd.compress true
>
>       spark.storage.memoryFraction 1
>       spark.core.connection.ack.wait.timeout 600
>       spark.akka.frameSize 50
>
> Thanks
> Best Regards
>
> On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh <
> riteshoneinamillion@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to submit my application with spark-submit, using the following
>> spark-defaults.conf params:
>>
>> spark.master                     spark://<master-ip>:7077
>> spark.eventLog.enabled           true
>> spark.serializer
>> org.apache.spark.serializer.KryoSerializer
>> spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value
>> -Dnumbers="one two three"
>>
>> ===============================================================
>> But every time I get this error:
>>
>> 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local:
>> remote Akka client disassociated
>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
>> aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
>> aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:20 ERROR TaskSchedulerImpl: Lost executor 2 on aa.local:
>> remote Akka client disassociated
>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2,
>> aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3,
>> aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:26 ERROR TaskSchedulerImpl: Lost executor 4 on aa.local:
>> remote Akka client disassociated
>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5,
>> aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4,
>> aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:29 ERROR TaskSchedulerImpl: Lost executor 5 on aa.local:
>> remote Akka client disassociated
>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7,
>> aa.local): ExecutorLostFailure (executor lost)
>> 14/11/10 18:39:29 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4
>> times; aborting job
>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6,
>> aa.local): ExecutorLostFailure (executor lost)
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
>> failure: Lost task 0.3 in stage 0.0 (TID 7, gonephishing.local):
>> ExecutorLostFailure (executor lost)
>> Driver stacktrace:
>> at org.apache.spark.scheduler.DAGScheduler.org
>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>> at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>> at scala.Option.foreach(Option.scala:236)
>> at
>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>> at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> at
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> =================================================================
>> Any fixes?
>>
>
>

Re: Executor Lost Failure

Posted by Ritesh Kumar Singh <ri...@gmail.com>.
On Mon, Nov 10, 2014 at 10:52 PM, Ritesh Kumar Singh <
riteshoneinamillion@gmail.com> wrote:

> Tasks are now getting submitted, but many of them don't seem to do anything.
> For example, after opening the spark-shell, I load a text file from disk and try
> printing its contents as:
>
> >sc.textFile("/path/to/file").foreach(println)
>
> It does not give me any output. While running this:
>
> >sc.textFile("/path/to/file").count
>
> gives me the right number of lines in the text file.
> Not sure what the error is, but here is the console output for the
> print case:
>
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(215230) called with
> curMem=709528, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6 stored as values in
> memory (estimated size 210.2 KB, free 441.5 MB)
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(17239) called with
> curMem=924758, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6_piece0 stored as
> bytes in memory (estimated size 16.8 KB, free 441.5 MB)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in
> memory on gonephishing.local:42648 (size: 16.8 KB, free: 442.3 MB)
> 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
> broadcast_6_piece0
> 14/11/10 22:48:02 INFO FileInputFormat: Total input paths to process : 1
> 14/11/10 22:48:02 INFO SparkContext: Starting job: foreach at <console>:13
> 14/11/10 22:48:02 INFO DAGScheduler: Got job 3 (foreach at <console>:13)
> with 2 output partitions (allowLocal=false)
> 14/11/10 22:48:02 INFO DAGScheduler: Final stage: Stage 3(foreach at
> <console>:13)
> 14/11/10 22:48:02 INFO DAGScheduler: Parents of final stage: List()
> 14/11/10 22:48:02 INFO DAGScheduler: Missing parents: List()
> 14/11/10 22:48:02 INFO DAGScheduler: Submitting Stage 3 (Desktop/mnd.txt
> MappedRDD[7] at textFile at <console>:13), which has no missing parents
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(2504) called with
> curMem=941997, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7 stored as values in
> memory (estimated size 2.4 KB, free 441.4 MB)
> 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(1602) called with
> curMem=944501, maxMem=463837593
> 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7_piece0 stored as
> bytes in memory (estimated size 1602.0 B, free 441.4 MB)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in
> memory on gonephishing.local:42648 (size: 1602.0 B, free: 442.3 MB)
> 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
> broadcast_7_piece0
> 14/11/10 22:48:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage
> 3 (Desktop/mnd.txt MappedRDD[7] at textFile at <console>:13)
> 14/11/10 22:48:02 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
> 14/11/10 22:48:02 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
> 6, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
> 14/11/10 22:48:02 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID
> 7, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in
> memory on gonephishing.local:48857 (size: 1602.0 B, free: 442.3 MB)
> 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in
> memory on gonephishing.local:48857 (size: 16.8 KB, free: 442.3 MB)
> 14/11/10 22:48:02 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
> 6) in 308 ms on gonephishing.local (1/2)
> 14/11/10 22:48:02 INFO DAGScheduler: Stage 3 (foreach at <console>:13)
> finished in 0.321 s
> 14/11/10 22:48:02 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID
> 7) in 315 ms on gonephishing.local (2/2)
> 14/11/10 22:48:02 INFO SparkContext: Job finished: foreach at
> <console>:13, took 0.376602079 s
> 14/11/10 22:48:02 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
> have all completed, from pool
>
> =======================================================================
>
>
>
> On Mon, Nov 10, 2014 at 8:01 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Try adding the following configurations as well; it might work.
>>
>>  spark.rdd.compress true
>>
>>       spark.storage.memoryFraction 1
>>       spark.core.connection.ack.wait.timeout 600
>>       spark.akka.frameSize 50
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh <
>> riteshoneinamillion@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to submit my application with spark-submit, using the following
>>> spark-defaults.conf params:
>>>
>>> spark.master                     spark://<master-ip>:7077
>>> spark.eventLog.enabled           true
>>> spark.serializer
>>> org.apache.spark.serializer.KryoSerializer
>>> spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value
>>> -Dnumbers="one two three"
>>>
>>> ===============================================================
>>> But every time I get this error:
>>>
>>> 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local:
>>> remote Akka client disassociated
>>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID
>>> 1, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID
>>> 0, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:20 ERROR TaskSchedulerImpl: Lost executor 2 on aa.local:
>>> remote Akka client disassociated
>>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID
>>> 2, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID
>>> 3, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:26 ERROR TaskSchedulerImpl: Lost executor 4 on aa.local:
>>> remote Akka client disassociated
>>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID
>>> 5, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID
>>> 4, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:29 ERROR TaskSchedulerImpl: Lost executor 5 on aa.local:
>>> remote Akka client disassociated
>>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID
>>> 7, aa.local): ExecutorLostFailure (executor lost)
>>> 14/11/10 18:39:29 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4
>>> times; aborting job
>>> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID
>>> 6, aa.local): ExecutorLostFailure (executor lost)
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
>>> failure: Lost task 0.3 in stage 0.0 (TID 7, gonephishing.local):
>>> ExecutorLostFailure (executor lost)
>>> Driver stacktrace:
>>> at org.apache.spark.scheduler.DAGScheduler.org
>>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>> at
>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>> at scala.Option.foreach(Option.scala:236)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>> at
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> at
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> =================================================================
>>> Any fixes?
>>>
>>
>>
>

Re: Executor Lost Failure

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Try adding the following configurations as well; it might work.

 spark.rdd.compress true

      spark.storage.memoryFraction 1
      spark.core.connection.ack.wait.timeout 600
      spark.akka.frameSize 50
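
Merged with the settings from the original message, the resulting spark-defaults.conf would look roughly like this (a sketch assembled from the values in this thread, not a tested recommendation; note that spark.storage.memoryFraction 1 gives the entire heap to the storage cache, so treat it as a debugging measure rather than a production setting):

```
spark.master                            spark://<master-ip>:7077
spark.eventLog.enabled                  true
spark.serializer                        org.apache.spark.serializer.KryoSerializer
spark.executor.extraJavaOptions         -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

# Settings suggested above
spark.rdd.compress                      true
spark.storage.memoryFraction            1
spark.core.connection.ack.wait.timeout  600
spark.akka.frameSize                    50
```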

Thanks
Best Regards

On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh <
riteshoneinamillion@gmail.com> wrote:

> Hi,
>
> I am trying to submit my application with spark-submit, using the following
> spark-defaults.conf params:
>
> spark.master                     spark://<master-ip>:7077
> spark.eventLog.enabled           true
> spark.serializer                 org.apache.spark.serializer.KryoSerializer
> spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value
> -Dnumbers="one two three"
>
> ===============================================================
> But every time I get this error:
>
> 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local:
> remote Akka client disassociated
> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
> aa.local): ExecutorLostFailure (executor lost)
> 14/11/10 18:39:17 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> aa.local): ExecutorLostFailure (executor lost)
> 14/11/10 18:39:20 ERROR TaskSchedulerImpl: Lost executor 2 on aa.local:
> remote Akka client disassociated
> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2,
> aa.local): ExecutorLostFailure (executor lost)
> 14/11/10 18:39:20 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3,
> aa.local): ExecutorLostFailure (executor lost)
> 14/11/10 18:39:26 ERROR TaskSchedulerImpl: Lost executor 4 on aa.local:
> remote Akka client disassociated
> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5,
> aa.local): ExecutorLostFailure (executor lost)
> 14/11/10 18:39:26 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4,
> aa.local): ExecutorLostFailure (executor lost)
> 14/11/10 18:39:29 ERROR TaskSchedulerImpl: Lost executor 5 on aa.local:
> remote Akka client disassociated
> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7,
> aa.local): ExecutorLostFailure (executor lost)
> 14/11/10 18:39:29 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4
> times; aborting job
> 14/11/10 18:39:29 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6,
> aa.local): ExecutorLostFailure (executor lost)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
> failure: Lost task 0.3 in stage 0.0 (TID 7, gonephishing.local):
> ExecutorLostFailure (executor lost)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> at scala.Option.foreach(Option.scala:236)
> at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> =================================================================
> Any fixes?
>