Posted to issues@spark.apache.org by "Neil Ferguson (JIRA)" <ji...@apache.org> on 2014/08/15 01:19:18 UTC

[jira] [Created] (SPARK-3051) Support looking-up named accumulators in a registry

Neil Ferguson created SPARK-3051:
------------------------------------

             Summary: Support looking-up named accumulators in a registry
                 Key: SPARK-3051
                 URL: https://issues.apache.org/jira/browse/SPARK-3051
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Neil Ferguson


This is a proposed enhancement to Spark based on the following mailing list discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/quot-Dynamic-variables-quot-in-Spark-td7450.html.

This proposal builds on SPARK-2380 (Support displaying accumulator values in the web UI) to allow named accumulables to be looked up in a "registry", rather than having to be passed to every method that needs to access them.

The use case was described well by [~shivaram], as follows:

Let's say you have two functions you use in a map call and want to measure how much time each of them takes. For example, suppose you have a code block like the one below and you want to measure how much time f1 takes as a fraction of the task.

{noformat}
a.map { l => 
   val f = f1(l) 
   ... some work here ... 
} 
{noformat}

It would be really cool if we could do something like 

{noformat}
a.map { l => 
   val start = System.nanoTime 
   val f = f1(l) 
   TaskMetrics.get("f1-time").add(System.nanoTime - start) 
} 
{noformat}

SPARK-2380 provides a partial solution to this problem -- however, the accumulables would still need to be passed to every function that needs them, which I think would be cumbersome in any application of reasonable complexity.

The proposal, as suggested by [~pwendell], is to have a "registry" of accumulables that can be looked up by name.
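As a rough illustration of the shape such a registry might take, here is a minimal, self-contained sketch. AccumulableRegistry, register, and get are hypothetical names, not actual Spark API, and a LongAdder stands in for a real accumulable:

```scala
import java.util.concurrent.atomic.LongAdder
import scala.collection.concurrent.TrieMap

// Hypothetical sketch only: a name-keyed registry of accumulables.
// None of these names exist in Spark; a LongAdder stands in for an
// accumulable's per-task state.
object AccumulableRegistry {
  private val registry = TrieMap.empty[String, LongAdder]

  // Register an accumulable under a name; idempotent if already present.
  def register(name: String): Unit =
    registry.putIfAbsent(name, new LongAdder)

  // Look up a previously registered accumulable by name.
  def get(name: String): Option[LongAdder] =
    registry.get(name)
}
```

With something like this, the timing example above could call AccumulableRegistry.get("f1-time").foreach(_.add(elapsed)) from inside the closure without the accumulable being threaded through every function signature.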

Regarding the implementation details, I'd propose that we broadcast a serialized version of all named accumulables in the DAGScheduler (similar to what SPARK-2521 does for Tasks). These can then be deserialized in the Executor. 
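The driver-to-executor round trip described above can be sketched with plain Java serialization; AccumulableSerDe is an illustrative name, and a Map of Longs stands in for the real accumulable state:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Sketch only: the driver serializes the named accumulables for broadcast,
// and each Executor deserializes them before running tasks. Longs stand in
// for accumulable state; the object name is hypothetical.
object AccumulableSerDe {
  def serialize(accums: Map[String, Long]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(accums)
    oos.close()
    bos.toByteArray
  }

  def deserialize(bytes: Array[Byte]): Map[String, Long] = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try ois.readObject().asInstanceOf[Map[String, Long]]
    finally ois.close()
  }
}
```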

Accumulables are already stored in thread-local variables in the Accumulators object, so exposing these in the registry should simply be a matter of wrapping this object and keying the accumulables by name (they are currently keyed by ID).
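The thread-local, name-keyed wrapping could look roughly like the following. This is illustrative only: NamedAccumulables is a hypothetical name, and plain Longs stand in for the accumulable values that Spark's Accumulators object actually tracks per thread by ID:

```scala
// Sketch only: per-thread accumulable state keyed by name instead of ID,
// mirroring how Spark's Accumulators object keeps thread-local state.
// All names here are hypothetical; Longs stand in for accumulable values.
object NamedAccumulables {
  // Each task runs in its own thread, so each thread gets its own map.
  private val perThread =
    new ThreadLocal[scala.collection.mutable.Map[String, Long]] {
      override def initialValue() =
        scala.collection.mutable.Map.empty[String, Long]
    }

  def add(name: String, delta: Long): Unit = {
    val m = perThread.get()
    m(name) = m.getOrElse(name, 0L) + delta
  }

  def value(name: String): Long =
    perThread.get().getOrElse(name, 0L)
}
```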



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org