Posted to user@crunch.apache.org by Nithin Asokan <an...@gmail.com> on 2015/09/03 14:25:25 UTC

Spark accumulators

We are currently testing a few capabilities using Spark, and one thing we
noticed is that Spark does not list any user-defined accumulators in the
web UI.

On MapReduce, I would expect counters to be displayed on the job page;
however, on a SparkPipeline I was only able to pull counter information
from PipelineResult#getStageResults().

I think these accumulators are not visible in the web UI because Crunch
does not name them. Spark only displays an accumulator in the UI if it has
a name.

https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L125-L126

https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala#L616-L624
(accumulator API with name)
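A minimal, self-contained Java sketch of the behavior described above, under
stated assumptions. It models Spark and Crunch rather than calling them. It
assumes that Crunch keeps its counters in one accumulator holding a nested
group -> counter -> count map that is merged by summing. It also models
Spark's rule that only a named accumulator appears in the web UI. The class
name NamedCounterAccumulator, the counters(...) helper, and the name
"crunch-counters" are all hypothetical. In real code, the naming would
presumably go through the named accumulator(...) overload on
JavaSparkContext linked above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained model; it does NOT use any Spark or Crunch classes.
// Assumptions: counters are stored as group -> counter -> count, and (mirroring
// Spark's behavior) only an accumulator with a name is listed in the web UI.
class NamedCounterAccumulator {
    private final String name; // null models an unnamed accumulator
    private final Map<String, Map<String, Long>> totals = new HashMap<>();

    NamedCounterAccumulator(String name) {
        this.name = name;
    }

    // Merge one task's partial counters into the running totals, summing per counter.
    void addInPlace(Map<String, Map<String, Long>> partial) {
        for (Map.Entry<String, Map<String, Long>> group : partial.entrySet()) {
            Map<String, Long> target =
                totals.computeIfAbsent(group.getKey(), k -> new HashMap<>());
            for (Map.Entry<String, Long> counter : group.getValue().entrySet()) {
                target.merge(counter.getKey(), counter.getValue(), Long::sum);
            }
        }
    }

    // Current total for one counter; 0 if it was never incremented.
    long get(String group, String counter) {
        return totals.getOrDefault(group, Collections.<String, Long>emptyMap())
                     .getOrDefault(counter, 0L);
    }

    // Models the web UI rule: unnamed accumulators are not listed.
    boolean visibleInUi() {
        return name != null && !name.isEmpty();
    }

    // Hypothetical helper: build a single-counter partial update.
    static Map<String, Map<String, Long>> counters(String group, String counter, long n) {
        Map<String, Map<String, Long>> m = new HashMap<>();
        m.put(group, new HashMap<>(Collections.singletonMap(counter, n)));
        return m;
    }

    public static void main(String[] args) {
        NamedCounterAccumulator unnamed = new NamedCounterAccumulator(null);
        NamedCounterAccumulator named = new NamedCounterAccumulator("crunch-counters");
        for (NamedCounterAccumulator acc : new NamedCounterAccumulator[] {unnamed, named}) {
            acc.addInPlace(counters("MyGroup", "records", 3L)); // task 1
            acc.addInPlace(counters("MyGroup", "records", 4L)); // task 2
        }
        // Both hold the same totals; only the named one would show up in the UI.
        System.out.println(unnamed.get("MyGroup", "records") + " visible=" + unnamed.visibleInUi());
        System.out.println(named.get("MyGroup", "records") + " visible=" + named.visibleInUi());
    }
}
```

Running main prints "7 visible=false" and then "7 visible=true". The counter
values are identical either way; in this model the name only decides whether
the UI lists the accumulator.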

I would like to know whether it is possible in Crunch to name these
accumulators so they are available in the web UI. This would let users
monitor accumulators from the web UI to obtain key information about their
jobs.

Thanks,
Nithin

Re: Spark accumulators

Posted by Nithin Asokan <an...@gmail.com>.
Thank you, Micah. With the patch, I can see the accumulators listed for my
Spark application.

On Thu, Sep 3, 2015 at 7:49 AM Micah Whitacre <mk...@gmail.com> wrote:

> Nithin logged https://issues.apache.org/jira/browse/CRUNCH-558 to track
> this work. Also added a patch for the change you proposed; want to try it
> out and see if that works for you?

Re: Spark accumulators

Posted by Micah Whitacre <mk...@gmail.com>.
Nithin logged https://issues.apache.org/jira/browse/CRUNCH-558 to track
this work. Also added a patch for the change you proposed; want to try it
out and see if that works for you?