Posted to user@spark.apache.org by Roberto Coluccio <ro...@gmail.com> on 2015/08/24 18:09:11 UTC
Unable to catch SparkContext methods exceptions
Hello folks,
I'm experiencing an unexpected behaviour that suggests I'm missing something
about how Spark works. Let's say I have a Spark driver that invokes a
function like:
----- in myDriver -----

val sparkContext = new SparkContext(mySparkConf)
val inputPath = "file:///home/myUser/project/resources/date=*/*"

val myResult = new MyResultFunction()(sparkContext, inputPath)

----- in MyResultFunctionOverRDD ------

class MyResultFunction extends Function2[SparkContext, String, RDD[String]]
    with Serializable {
  override def apply(sparkContext: SparkContext, inputPath: String): RDD[String] = {
    try {
      sparkContext.textFile(inputPath, 1)
    } catch {
      case t: Throwable =>
        myLogger.error(s"error: ${t.getStackTraceString}\n")
        sparkContext.makeRDD(Seq.empty[String])
    }
  }
}
What happens is that I'm *unable to catch exceptions* thrown by the
"textFile" method within the try..catch clause in MyResultFunction. In
fact, in a unit test where I call the function with an invalid
"inputPath", I don't get an empty RDD as a result; instead, the unit test
exits (and fails) due to an unhandled exception.
What am I missing here?
Thank you.
Best regards,
Roberto
Re: Unable to catch SparkContext methods exceptions
Posted by Burak Yavuz <br...@gmail.com>.
The laziness is hard to deal with in these situations. I would suggest
handling expected cases ("FileNotFound", etc.) by other means before even
starting a Spark job. If you really want to try..catch a specific portion
of a Spark job, one way is to follow it with an action. You can even call
persist() before the action, so that you can re-use the RDD.
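A rough sketch combining both ideas (eagerly validating the path, then
forcing an action inside the try..catch), assuming the sparkContext and
inputPath names from the thread and using Hadoop's FileSystem.globStatus
for the up-front check:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def safeTextFile(sc: SparkContext, inputPath: String): RDD[String] = {
  // Eager check: globStatus runs immediately on the driver, so a missing
  // or non-matching path is detected before any Spark job starts.
  val fs = FileSystem.get(sc.hadoopConfiguration)
  val matches = fs.globStatus(new Path(inputPath))
  if (matches == null || matches.isEmpty) {
    sc.makeRDD(Seq.empty[String])          // nothing matched: empty RDD
  } else {
    val rdd = sc.textFile(inputPath).persist()  // persist so the action's work is reused
    try {
      rdd.count()   // action: forces evaluation, so failures surface here
      rdd
    } catch {
      case t: Throwable =>
        rdd.unpersist()
        sc.makeRDD(Seq.empty[String])
    }
  }
}
```

The count() is only there to materialize the RDD where the try..catch can
see the failure; persist() keeps that work from being recomputed when the
caller uses the RDD afterwards.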
Best,
Burak
On Mon, Aug 24, 2015 at 10:52 AM, Roberto Coluccio <
roberto.coluccio@gmail.com> wrote:
> Hi Burak, thanks for your answer.
>
> I have a "new MyResultFunction()(sparkContext, inputPath).collect" in the
> unit test (to evaluate the actual result), and there I can observe and
> catch the exception. Even considering Spark's laziness, shouldn't I catch
> the exception when it occurs, inside the try..catch statement that
> encloses the textFile invocation?
>
> Best,
> Roberto
>
Re: Unable to catch SparkContext methods exceptions
Posted by Roberto Coluccio <ro...@gmail.com>.
Hi Burak, thanks for your answer.
I have a "new MyResultFunction()(sparkContext, inputPath).collect" in the
unit test (to evaluate the actual result), and there I can observe and
catch the exception. Even considering Spark's laziness, shouldn't I catch
the exception when it occurs, inside the try..catch statement that encloses
the textFile invocation?
Best,
Roberto
On Mon, Aug 24, 2015 at 7:38 PM, Burak Yavuz <br...@gmail.com> wrote:
> textFile is a lazy operation. It doesn't evaluate until you call an action
> on it, such as .count(). Therefore, you won't catch the exception there.
>
> Best,
> Burak
>
Re: Unable to catch SparkContext methods exceptions
Posted by Burak Yavuz <br...@gmail.com>.
textFile is a lazy operation. It doesn't evaluate until you call an action
on it, such as .count(). Therefore, you won't catch the exception there.
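For example (a minimal sketch, assuming sc is an active SparkContext):

```scala
// textFile on a bogus path returns immediately: no exception is thrown,
// because no file is actually read until an action runs.
val rdd = sc.textFile("/no/such/path")

try {
  rdd.count()   // the action triggers evaluation; the failure surfaces here
} catch {
  case t: Throwable =>
    println(s"caught at the action: ${t.getMessage}")
}
```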
Best,
Burak