Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2013/11/29 21:09:55 UTC

0.9-SNAPSHOT StageInfo

in 0.9-SNAPSHOT StageInfo has been changed to make the stage itself no
longer accessible.

however the stage contains the rdd, which is necessary to tie this
StageInfo to an RDD. now all we have is the rddName. is the rddName
guaranteed to be unique, and can it be relied upon to identify RDDs?

Re: 0.9-SNAPSHOT StageInfo

Posted by Koert Kuipers <ko...@tresata.com>.
i use a SparkListener to collect info about failures in tasks related to my
RDD.

to do so, for every stage submitted i check whether the stage is for an RDD
that is a dependency of my target RDD (including the target RDD itself).
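
a rough sketch of that dependency check, for illustration only. Node is a hypothetical stand-in with just an id and parent links, not Spark's RDD class; with real RDDs the same walk would presumably go over rdd.dependencies.map(_.rdd) and compare rdd.id. the names Node, Lineage and idsOfInterest are made up for this sketch.

```scala
// hypothetical stand-in for an RDD: only an id and links to its parents
final case class Node(id: Int, parents: Seq[Node] = Nil)

object Lineage {
  // ids of the target and of everything it transitively depends on
  def idsOfInterest(target: Node): Set[Int] = {
    @annotation.tailrec
    def loop(todo: List[Node], seen: Set[Int]): Set[Int] = todo match {
      case Nil                     => seen
      case n :: rest if seen(n.id) => loop(rest, seen) // already visited
      case n :: rest               => loop(n.parents.toList ::: rest, seen + n.id)
    }
    loop(List(target), Set.empty)
  }
}
```

a stage would then count as "mine" when the id of its RDD is in that set, which is why an id (rather than a name) is the natural thing to match on.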

then for every task ending i check if the task is for a stage i care about,
after which i collect any errors for the task (for which i already have to
break the spark API, since i currently cannot pattern match on
taskEnd.reason due to the private nature of ExceptionFailure and friends).
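
a minimal sketch of that bookkeeping, using made-up stand-in types (EndReason, Failed, TaskEnded, FailureCollector) instead of Spark's listener events, so it does not reflect the actual SparkListener signatures. it also assumes the id of the stage's RDD is available when the stage is submitted, which is exactly the piece that is no longer reachable through StageInfo in 0.9-SNAPSHOT.

```scala
import scala.collection.mutable

// hypothetical stand-ins; Spark's own events and end reasons carry more fields
sealed trait EndReason
case object Succeeded extends EndReason
final case class Failed(description: String) extends EndReason
final case class TaskEnded(stageId: Int, reason: EndReason)

final class FailureCollector(isInteresting: Int => Boolean) {
  private val watchedStages = mutable.Set.empty[Int]
  private val errors = mutable.ArrayBuffer.empty[String]

  // remember the stage only if its RDD is one we care about
  def stageSubmitted(stageId: Int, rddId: Int): Unit =
    if (isInteresting(rddId)) watchedStages += stageId

  // keep the failure description only for tasks of watched stages
  def taskEnded(e: TaskEnded): Unit = e.reason match {
    case Failed(d) if watchedStages(e.stageId) => errors += d
    case _                                     => ()
  }

  def collected: List[String] = errors.toList
}
```

the pattern match on the end reason is the step that needs the failure classes to be publicly visible.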

all of this simply to be able to provide the user with a useful error
message as to why the calculation failed (as opposed to: fetch failed more
than 4 times).
