Posted to issues@spark.apache.org by "Mayur Bhosale (Jira)" <ji...@apache.org> on 2023/04/03 11:05:00 UTC

[jira] [Created] (SPARK-43012) Name based access of accumulators from tasks

Mayur Bhosale created SPARK-43012:
-------------------------------------

             Summary: Name based access of accumulators from tasks
                 Key: SPARK-43012
                 URL: https://issues.apache.org/jira/browse/SPARK-43012
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Mayur Bhosale


At present, an accumulator must be registered on the driver and subsequently accessed (i.e., added to or reset) by the code running on the executors through that same object. As a result, the accumulator object has to be passed around throughout the code, which adds verbosity, particularly in multi-module Spark applications where user code is spread across modules.

Instead, why not have name-based access to accumulators so that the objects don't need to be passed around explicitly? (Sort of like global, named accumulators.)

Currently,

{code:java}
import org.apache.spark.util.LongAccumulator

val accName = "custom_acc_experimental"
val la = new LongAccumulator()
// Register the accumulator with the SparkContext on the driver
sc.register(la, accName)

sc.parallelize(0 to 10).mapPartitions(partition => {
  // The accumulator object itself has to be captured by the task closure
  la.add(100L)
  partition
}).count{code}
Can be,

{code:java}
import org.apache.spark.util.{AccumulatorV2, LongAccumulator}

val accName = "custom_acc_experimental"
val la = new LongAccumulator()
sc.register(la, accName)

sc.parallelize(0 to 10).mapPartitions(partition => {
  // Assuming a new end-point that resolves the accumulator by its name
  AccumulatorV2.add(accName, 100L)
  partition
}).count{code}
I was able to do a [working POC of this|https://github.com/apache/spark/compare/master...mayurdb:spark:experiment] in which the user-defined accumulators are internally passed along to the task executable.
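
At a high level, the POC's idea can be pictured as a per-executor registry keyed by accumulator name, populated when a task materializes the accumulators that were shipped with it. The sketch below is only an illustration of that idea, not the POC code; the object NamedAccumulatorRegistry and its methods are hypothetical names.

{code:java}
import scala.collection.concurrent.TrieMap
import org.apache.spark.util.AccumulatorV2

// Illustration only -- NamedAccumulatorRegistry is a hypothetical helper, not part of
// Spark or of the POC as-is. The idea: accumulators that arrive with a task get indexed
// by name, and task code looks them up by that name instead of capturing the object.
object NamedAccumulatorRegistry {
  private val byName = TrieMap.empty[String, AccumulatorV2[Any, Any]]

  // Would be called internally when a task's accumulators are deserialized on the executor
  def register(acc: AccumulatorV2[_, _]): Unit =
    acc.name.foreach(n => byName.put(n, acc.asInstanceOf[AccumulatorV2[Any, Any]]))

  // Name-based access from task code, e.g. NamedAccumulatorRegistry.add(accName, 100L)
  def add(name: String, value: Any): Unit =
    byName.get(name).foreach(_.add(value))
}{code}
Per-task isolation, cleanup, and accumulators that share a name would of course need more care than this sketch shows.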

A new endpoint probably needs to be added for users to create such accumulators, as handling all existing accumulators in this manner would put pressure on the RPC in the task flow. I can write a small doc on the approach, but wanted to get an ack on whether this seems workable first.
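
For the creation side, the endpoint could be as small as a dedicated registration call on SparkContext. The method name registerNamed below is purely an assumption for illustration and does not exist in Spark; the point is that only accumulators registered through such an endpoint would be tracked by name and shipped to tasks, so the existing accumulator path and its RPC volume stay untouched.

{code:java}
import org.apache.spark.util.LongAccumulator

// Hypothetical driver-side endpoint -- sc.registerNamed does not exist today.
// Only accumulators registered this way would be resolvable by name in task code.
val la = new LongAccumulator()
sc.registerNamed(la, "custom_acc_experimental"){code}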


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org