Posted to user@spark.apache.org by VIJAYAKUMAR JAWAHARLAL <sp...@data2o.io> on 2015/08/19 00:26:19 UTC

What is the reason for ExecutorLostFailure?

Hi All

Why am I getting ExecutorLostFailure, with executors lost completely for the rest of the processing? Eventually this causes the job to fail. One thing I know for sure is that a lot of shuffling happens across executors in my program.

Is there a way to understand and debug ExecutorLostFailure? Any pointers regarding “ExecutorLostFailure” would help me a lot.

Thanks
Vijay

Re: What is the reason for ExecutorLostFailure?

Posted by VIJAYAKUMAR JAWAHARLAL <sp...@data2o.io>.
The hints are good. Thanks, Corey. I will try to find out more from the logs.

> On Aug 18, 2015, at 7:23 PM, Corey Nolet <cj...@gmail.com> wrote:
> 
> Usually, more information about the cause can be found in your logs. I generally see this happen when an executor has run out of memory for one reason or another. Your per-executor memory settings may be too small, or the number of tasks running concurrently may be too large for some of the executors. Other times, RDD functions like groupBy() that collect an unbounded number of items into memory could be the cause.
> 
> Either way, the executor logs should give you some insight. Have you looked at those yet?


Re: What is the reason for ExecutorLostFailure?

Posted by Corey Nolet <cj...@gmail.com>.
Usually, more information about the cause can be found in your logs. I
generally see this happen when an executor has run out of memory for one
reason or another. Your per-executor memory settings may be too small, or
the number of tasks running concurrently may be too large for some of the
executors. Other times, RDD functions like groupBy() that collect an
unbounded number of items into memory could be the cause.

Either way, the executor logs should give you some insight. Have you
looked at those yet?
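
To make the groupBy() point concrete, here is a minimal plain-Python
sketch (no Spark involved) of why grouping can exhaust memory where an
incremental reduce does not. The function names and the skewed sample data
are made up for illustration. In Spark the corresponding RDD calls would be
groupByKey() and reduceByKey(), and whether this applies to your job is an
assumption until the logs confirm it.

```python
from collections import defaultdict


def sum_by_grouping(pairs):
    """Model of group-then-aggregate: buffer every value per key, then sum."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # buffer grows with the number of records
    buffered = sum(len(values) for values in groups.values())
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, buffered


def sum_by_reducing(pairs):
    """Model of incremental reduce: keep one running total per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value  # state grows with the number of keys only
    return dict(totals), len(totals)


# Hypothetical skewed data: one hot key carries almost every record.
pairs = [("hot", 1)] * 100_000 + [("cold", 1)] * 10

grouped_totals, grouped_held = sum_by_grouping(pairs)
reduced_totals, reduced_held = sum_by_reducing(pairs)

# Same totals either way; the difference is how many items were held at once.
print(grouped_held, reduced_held)
```

Both functions return the same totals, but the grouping version holds
every record for the hot key at once, while the reducing version holds one
number per key. If the aggregation can be expressed as a reduce, that is
usually the safer choice.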
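
As a starting point for looking through the logs, the following is a rough
Python sketch that scans executor log lines for a few common memory-related
signatures. The pattern list is an assumption and not exhaustive, the
function name is hypothetical, and the sample lines are invented for
illustration rather than taken from a real job.

```python
import re

# Assumed signatures of memory trouble; extend for your environment.
OOM_PATTERNS = [
    re.compile(r"java\.lang\.OutOfMemoryError"),
    re.compile(r"GC overhead limit exceeded"),
    re.compile(r"Container killed by YARN for exceeding memory limits"),
]


def find_memory_errors(log_lines):
    """Return (line_number, text) for every line matching a known pattern."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(pattern.search(line) for pattern in OOM_PATTERNS):
            hits.append((number, line.rstrip()))
    return hits


# Invented sample lines, standing in for the contents of an executor log.
sample_log = [
    "INFO task started\n",
    "java.lang.OutOfMemoryError: Java heap space\n",
    "INFO task finished\n",
]

for number, text in find_memory_errors(sample_log):
    print(f"line {number}: {text}")
```

To use it on a real file, pass an open file handle instead of the sample
list, for example `find_memory_errors(open(path))` with `path` pointing at
an executor's stderr log.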
