Posted to user@crunch.apache.org by Nithin Asokan <an...@gmail.com> on 2015/09/28 23:46:33 UTC

Question about Spark Job/Stage names

I'm fairly new to Spark, and would like to understand stage/job names when
using Crunch on Spark. When I submit my Spark application, I see a set of
stage names like "mapToPair at PGroupedTableImpl.java:108". Is it possible
for user code to update these stage names dynamically? Perhaps, is it
possible to have DoFn names as stage names?

I did a little bit of digging, and the closest thing I can find to modify
the stage name is

sparkContext.setCallSite(String)

However, this updates all stage and job names to the same text. I tried
looking at MRPipeline's implementation to understand how job names are
built, and I believe that for SparkPipeline, Crunch does not create a DAG
and we don't create a job name.
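
For reference, here is a minimal, self-contained sketch of how setCallSite
behaves in plain Spark (the class and names here are made up for
illustration):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CallSiteDemo {
      public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setAppName("callsite-demo").setMaster("local[2]"));
        JavaRDD<String> words = jsc.parallelize(Arrays.asList("a", "b", "c"));

        // setCallSite overrides the "xxx at File.java:NN" label that Spark
        // would otherwise derive from the call site...
        jsc.setCallSite("my-stage-name");

        // ...but it applies to every job/stage submitted afterwards, so
        // both of these actions show the same name in the Spark UI.
        words.count();
        words.distinct().count();

        jsc.stop();
      }
    }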

But does anyone with Spark expertise know if it's possible in Crunch to
create job/stage names based on DoFn names?

Thank you!
Nithin

Re: Question about Spark Job/Stage names

Posted by Nithin Asokan <an...@gmail.com>.
Yeah, I'm starting to think it's not possible to have dynamic stage names
at this time. But thanks for taking a look at this, Josh.
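
For anyone who finds this thread later: the closest approximation I could
get in plain Spark (outside of Crunch, reusing jsc and words from the
sketch above; the names are made up) is to bracket each action with
setCallSite/clearCallSite:

    // Per-action names in plain Spark. Crunch triggers its Spark jobs
    // internally, so there is no hook to do this per DoFn.
    jsc.setCallSite("tokenize");
    long total = words.count();               // this job shows as "tokenize"
    jsc.clearCallSite();                      // back to default call-site names

    jsc.setCallSite("dedupe");
    long unique = words.distinct().count();   // this job shows as "dedupe"
    jsc.clearCallSite();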

On Tue, Sep 29, 2015 at 9:12 AM Josh Wills <jw...@cloudera.com> wrote:

> Hey Nithin,
>
> I checked around about this-- apparently the stage name is hard-coded to
> be the call-site of the code block that triggered the stage:
>
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Stage.scala
>
> Right now, we pass the DoFn names to the RDDs we create via RDD.setName,
> but obviously that doesn't feed into the stage name.
>
> J
>
> On Mon, Sep 28, 2015 at 5:46 PM, Nithin Asokan <an...@gmail.com>
> wrote:
>
>> I'm fairly new to Spark, and would like to understand stage/job names
>> when using Crunch on Spark. When I submit my Spark application, I see a
>> set of stage names like "mapToPair at PGroupedTableImpl.java:108". Is it
>> possible for user code to update these stage names dynamically? Perhaps,
>> is it possible to have DoFn names as stage names?
>>
>> I did a little bit of digging, and the closest thing I can find to
>> modify the stage name is
>>
>> sparkContext.setCallSite(String)
>>
>> However, this updates all stage and job names to the same text. I tried
>> looking at MRPipeline's implementation to understand how job names are
>> built, and I believe that for SparkPipeline, Crunch does not create a
>> DAG and we don't create a job name.
>>
>> But does anyone with Spark expertise know if it's possible in Crunch to
>> create job/stage names based on DoFn names?
>>
>> Thank you!
>> Nithin
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>

Re: Question about Spark Job/Stage names

Posted by Josh Wills <jw...@cloudera.com>.
Hey Nithin,

I checked around about this-- apparently the stage name is hard-coded to be
the call-site of the code block that triggered the stage:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Stage.scala

Right now, we pass the DoFn names to the RDDs we create via RDD.setName,
but obviously that doesn't feed into the stage name.
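
To make the distinction concrete, a small sketch in plain Spark (assuming a
JavaSparkContext jsc and a JavaRDD<String> words already exist; the names
are illustrative):

    // setName labels the RDD itself; the label shows up in toDebugString
    // and (for cached RDDs) in the Storage tab of the Spark UI.
    JavaRDD<String> upper = words.map(String::toUpperCase).setName("MyDoFn");
    System.out.println(upper.toDebugString());

    // The stage name is still derived from the call site, e.g.
    // "count at Driver.java:NN" -- setName doesn't change it.
    upper.count();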

J

On Mon, Sep 28, 2015 at 5:46 PM, Nithin Asokan <an...@gmail.com> wrote:

> I'm fairly new to Spark, and would like to understand stage/job names
> when using Crunch on Spark. When I submit my Spark application, I see a
> set of stage names like "mapToPair at PGroupedTableImpl.java:108". Is it
> possible for user code to update these stage names dynamically? Perhaps,
> is it possible to have DoFn names as stage names?
>
> I did a little bit of digging, and the closest thing I can find to
> modify the stage name is
>
> sparkContext.setCallSite(String)
>
> However, this updates all stage and job names to the same text. I tried
> looking at MRPipeline's implementation to understand how job names are
> built, and I believe that for SparkPipeline, Crunch does not create a DAG
> and we don't create a job name.
>
> But does anyone with Spark expertise know if it's possible in Crunch to
> create job/stage names based on DoFn names?
>
> Thank you!
> Nithin
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>