Posted to user@spark.apache.org by John Salvatier <js...@gmail.com> on 2014/04/04 19:40:31 UTC

How are exceptions in map functions handled in Spark?

I'm trying to get a clear idea of how exceptions are handled in Spark.
Is there somewhere I can read about this? I'm on Spark 0.7.

For some reason I was under the impression that such exceptions are
swallowed, the offending value is dropped, and the exception is logged.
Right now, however, we're seeing the task retried over and over in an
infinite loop because one value always throws an exception.

John

Re: How are exceptions in map functions handled in Spark?

Posted by Andrew Or <an...@databricks.com>.
Logging inside a map function shouldn't "freeze things." The messages
should show up in the worker logs, since the code runs on the
executors. If the function throws an exception, however, it'll be propagated to
the driver (as a SparkException) after the task has failed 4 or more times (by default).
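For example, a quick sketch of what that looks like (the input path and the
parseRecord function here are made up, just for illustration):

    val parsed = sc.textFile("hdfs:///path/to/input").map { line =>
      try {
        Some(parseRecord(line))      // hypothetical per-record parsing
      } catch {
        case e: Exception =>
          // println output ends up in the executor's stdout/stderr, not on the driver
          println("Failed to parse " + line + ": " + e)
          None
      }
    }.flatMap(_.toList)              // keep only the records that parsed

The printed messages then show up in the stdout file under the worker's work
directory (or via the web UI), rather than in your driver console.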

On Fri, Apr 4, 2014 at 11:57 AM, John Salvatier <js...@gmail.com> wrote:

> Btw, thank you for your help.
>
>
> On Fri, Apr 4, 2014 at 11:49 AM, John Salvatier <js...@gmail.com> wrote:
>
>> Is there a way to log exceptions inside a mapping function? logError and
>> logInfo seem to freeze things.
>>
>>
>> On Fri, Apr 4, 2014 at 11:02 AM, Matei Zaharia <ma...@gmail.com> wrote:
>>
>>> Exceptions should be sent back to the driver program and logged there
>>> (with a SparkException thrown if a task fails more than 4 times), but there
>>> were some bugs before where this did not happen for non-Serializable
>>> exceptions. We changed it to pass back the stack traces only (as text),
>>> which should always work. I'd recommend trying a newer Spark version, 0.8
>>> should be easy to upgrade to from 0.7.
>>>
>>> Matei
>>>
>>> On Apr 4, 2014, at 10:40 AM, John Salvatier <js...@gmail.com>
>>> wrote:
>>>
>>> > I'm trying to get a clear idea about how exceptions are handled in
>>> Spark? Is there somewhere where I can read about this? I'm on spark .7
>>> >
>>> > For some reason I was under the impression that such exceptions are
>>> swallowed and the value that produced them ignored but the exception is
>>> logged. However, right now we're seeing the task just re-tried over and
>>> over again in an infinite loop because there's a value that always
>>> generates an exception.
>>> >
>>> > John
>>>
>>>
>>
>

Re: How are exceptions in map functions handled in Spark?

Posted by John Salvatier <js...@gmail.com>.
Btw, thank you for your help.


On Fri, Apr 4, 2014 at 11:49 AM, John Salvatier <js...@gmail.com> wrote:

> Is there a way to log exceptions inside a mapping function? logError and
> logInfo seem to freeze things.
>
>
> On Fri, Apr 4, 2014 at 11:02 AM, Matei Zaharia <ma...@gmail.com> wrote:
>
>> Exceptions should be sent back to the driver program and logged there
>> (with a SparkException thrown if a task fails more than 4 times), but there
>> were some bugs before where this did not happen for non-Serializable
>> exceptions. We changed it to pass back the stack traces only (as text),
>> which should always work. I'd recommend trying a newer Spark version, 0.8
>> should be easy to upgrade to from 0.7.
>>
>> Matei
>>
>> On Apr 4, 2014, at 10:40 AM, John Salvatier <js...@gmail.com> wrote:
>>
>> > I'm trying to get a clear idea about how exceptions are handled in
>> Spark? Is there somewhere where I can read about this? I'm on spark .7
>> >
>> > For some reason I was under the impression that such exceptions are
>> swallowed and the value that produced them ignored but the exception is
>> logged. However, right now we're seeing the task just re-tried over and
>> over again in an infinite loop because there's a value that always
>> generates an exception.
>> >
>> > John
>>
>>
>

Re: How are exceptions in map functions handled in Spark?

Posted by Matei Zaharia <ma...@gmail.com>.
Make sure you initialize a log4j Log object on the workers and not on the driver program. If you’re somehow referencing a logInfo method on the driver program, the Log object might not get sent across the network correctly (though you’d usually get some other error there, like NotSerializableException). Maybe try using println() at first.

As Andrew said, the log output will go into the stdout and stderr files in the work directory on your worker. You can also access those from the Spark cluster’s web UI (click on a worker there, then click on stdout / stderr).
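Just to sketch what that could look like (the object, logger name, rdd, and
transform function below are all made up, not anything from Spark itself):

    import org.apache.log4j.Logger

    // A singleton object is instantiated separately in each executor JVM,
    // so the Logger is created there rather than serialized from the driver.
    object MapSideLog {
      lazy val log = Logger.getLogger("my.map.functions")
    }

    val result = rdd.map { x =>
      try {
        transform(x)                  // hypothetical per-element work
      } catch {
        case e: Exception =>
          MapSideLog.log.error("Failed on value " + x, e)
          throw e                     // or return a default instead of rethrowing
      }
    }

The error messages then land in the executor logs as described above; they
won't show up on the driver.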

Matei


On Apr 4, 2014, at 11:49 AM, John Salvatier <js...@gmail.com> wrote:

> Is there a way to log exceptions inside a mapping function? logError and logInfo seem to freeze things. 
> 
> 
> On Fri, Apr 4, 2014 at 11:02 AM, Matei Zaharia <ma...@gmail.com> wrote:
> Exceptions should be sent back to the driver program and logged there (with a SparkException thrown if a task fails more than 4 times), but there were some bugs before where this did not happen for non-Serializable exceptions. We changed it to pass back the stack traces only (as text), which should always work. I’d recommend trying a newer Spark version, 0.8 should be easy to upgrade to from 0.7.
> 
> Matei
> 
> On Apr 4, 2014, at 10:40 AM, John Salvatier <js...@gmail.com> wrote:
> 
> > I'm trying to get a clear idea about how exceptions are handled in Spark? Is there somewhere where I can read about this? I'm on spark .7
> >
> > For some reason I was under the impression that such exceptions are swallowed and the value that produced them ignored but the exception is logged. However, right now we're seeing the task just re-tried over and over again in an infinite loop because there's a value that always generates an exception.
> >
> > John
> 
> 


Re: How are exceptions in map functions handled in Spark?

Posted by John Salvatier <js...@gmail.com>.
Is there a way to log exceptions inside a mapping function? logError and
logInfo seem to freeze things.


On Fri, Apr 4, 2014 at 11:02 AM, Matei Zaharia <ma...@gmail.com> wrote:

> Exceptions should be sent back to the driver program and logged there
> (with a SparkException thrown if a task fails more than 4 times), but there
> were some bugs before where this did not happen for non-Serializable
> exceptions. We changed it to pass back the stack traces only (as text),
> which should always work. I'd recommend trying a newer Spark version, 0.8
> should be easy to upgrade to from 0.7.
>
> Matei
>
> On Apr 4, 2014, at 10:40 AM, John Salvatier <js...@gmail.com> wrote:
>
> > I'm trying to get a clear idea about how exceptions are handled in
> Spark? Is there somewhere where I can read about this? I'm on spark .7
> >
> > For some reason I was under the impression that such exceptions are
> swallowed and the value that produced them ignored but the exception is
> logged. However, right now we're seeing the task just re-tried over and
> over again in an infinite loop because there's a value that always
> generates an exception.
> >
> > John
>
>

Re: How are exceptions in map functions handled in Spark?

Posted by Matei Zaharia <ma...@gmail.com>.
Exceptions should be sent back to the driver program and logged there (with a SparkException thrown if a task fails more than 4 times), but there were some bugs before where this did not happen for non-Serializable exceptions. We changed it to pass back only the stack traces (as text), which should always work. I’d recommend trying a newer Spark version; upgrading from 0.7 to 0.8 should be easy.
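
If you can't upgrade right away, one rough workaround (just a sketch; rdd and
recordParser are made-up names) is to catch the exception per record and carry
the failure along as data, so a single bad value can't keep the task failing:

    val results = rdd.map { v =>
      try {
        Right(recordParser(v))        // hypothetical parsing / transformation
      } catch {
        case e: Exception => Left("bad value " + v + ": " + e.getMessage)
      }
    }
    val good   = results.filter(_.isRight).map(_.right.get)
    val errors = results.filter(_.isLeft).map(_.left.get)

You can then inspect a sample of the failures on the driver (e.g. errors.take(20))
without the stage going into retries.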

Matei

On Apr 4, 2014, at 10:40 AM, John Salvatier <js...@gmail.com> wrote:

> I'm trying to get a clear idea about how exceptions are handled in Spark? Is there somewhere where I can read about this? I'm on spark .7
> 
> For some reason I was under the impression that such exceptions are swallowed and the value that produced them ignored but the exception is logged. However, right now we're seeing the task just re-tried over and over again in an infinite loop because there's a value that always generates an exception.
> 
> John