Posted to user@spark.apache.org by Roberto Coluccio <ro...@gmail.com> on 2015/08/24 18:09:11 UTC

Unable to catch SparkContext methods exceptions

Hello folks,

I'm experiencing an unexpected behaviour that suggests I'm missing something
about how Spark works. Let's say I have a Spark driver that invokes a function
like:

----- in myDriver -----

import org.apache.spark.SparkContext

val sparkContext = new SparkContext(mySparkConf)
// note the triple slash: "file://home/..." would treat "home" as a host name
val inputPath = "file:///home/myUser/project/resources/date=*/*"

val myResult = new MyResultFunction()(sparkContext, inputPath)

----- in MyResultFunctionOverRDD ------

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class MyResultFunction extends Function2[SparkContext, String, RDD[String]] with Serializable {

  override def apply(sparkContext: SparkContext, inputPath: String): RDD[String] = {
    try {
      sparkContext.textFile(inputPath, 1)
    } catch {
      case t: Throwable =>
        myLogger.error(s"error: ${t.getStackTraceString}\n")
        // fall back to an empty RDD; "sc" is not in scope here, so use the sparkContext argument
        sparkContext.makeRDD(Seq.empty[String])
    }
  }
}

What happens is that I'm *unable to catch exceptions* thrown by the
"textFile" method within the try..catch clause in MyResultFunction. In fact,
in a unit test where I call that function with an invalid "inputPath", I don't
get an empty RDD as a result; instead, the unit test exits (and fails) because
of an unhandled exception.

What am I missing here?

Thank you.

Best regards,
Roberto

Re: Unable to catch SparkContext methods exceptions

Posted by Burak Yavuz <br...@gmail.com>.
The laziness is hard to deal with in these situations. I would suggest
handling expected cases such as "FileNotFound" by other means before even
starting a Spark job. If you really want to try..catch a specific portion of a
Spark job, one way is to follow it with an action. You can even call persist()
before the action, so that you can re-use the RDD.
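
Roughly, a sketch of both ideas might look like this (just a sketch:
sparkContext, inputPath and myLogger come from your original snippet, and the
globStatus pre-check is only one possible way to validate a glob path up
front):

import org.apache.hadoop.fs.{FileSystem, Path}

// 1) Pre-flight check: make sure the (possibly glob) input path matches
//    something before kicking off any Spark work.
val inputGlob = new Path(inputPath)
val fs = inputGlob.getFileSystem(sparkContext.hadoopConfiguration)
val pathMatches = Option(fs.globStatus(inputGlob)).exists(_.nonEmpty)
if (!pathMatches) {
  myLogger.error(s"input path matches nothing: $inputPath")
}

// 2) try..catch around an action: persist() first so the data read here is
//    not recomputed when the RDD is re-used later.
val rdd =
  try {
    val loaded = sparkContext.textFile(inputPath, 1).persist()
    loaded.count() // action: forces evaluation, so a bad path fails right here
    loaded
  } catch {
    case e: Exception =>
      myLogger.error(s"error: ${e.getStackTraceString}\n")
      sparkContext.makeRDD(Seq.empty[String])
  }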

Best,
Burak

Re: Unable to catch SparkContext methods exceptions

Posted by Roberto Coluccio <ro...@gmail.com>.
Hi Burak, thanks for your answer.

I have a "new MyResultFunction()(sparkContext, inputPath).collect" in the
unit test (so to evaluate the actual result), and there I can observe and
catch the exception. Even considering Spark's laziness, shouldn't I catch
the exception while occurring in the try..catch statement that encloses the
textFile invocation?

Best,
Roberto


Re: Unable to catch SparkContext methods exceptions

Posted by Burak Yavuz <br...@gmail.com>.
textFile is a lazy operation. It doesn't evaluate anything until you call an
action on the resulting RDD, such as .count(). Therefore, you won't catch the
exception there.
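
A tiny illustration of what that means (the path below is made up):

// textFile only records the path; nothing is read or validated yet, so no
// exception can be thrown at this point.
val rdd = sparkContext.textFile("file:///no/such/path/*", 1)

// The input path is only resolved when an action runs, so a missing path
// typically surfaces here (e.g. as an InvalidInputException).
rdd.count()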

Best,
Burak
