Posted to issues@spark.apache.org by "Andrew Ash (JIRA)" <ji...@apache.org> on 2014/11/14 10:25:34 UTC

[jira] [Commented] (SPARK-664) Accumulator updates should get locally merged before sent to the driver

    [ https://issues.apache.org/jira/browse/SPARK-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212046#comment-14212046 ] 

Andrew Ash commented on SPARK-664:
----------------------------------

[~irashid] it sounds like your proposal is to batch accumulator updates across tasks on the executor before sending them back to the driver. Is that right?

I agree this would reduce network traffic, but the batching would come at the cost of higher latency between a task completing and its update landing in the accumulator on the driver.  With the completion of SPARK-2380 these accumulators are now shown in the UI, so the added latency would be visible to end users.
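
To make the trade-off concrete, here is a rough sketch (plain Python, not Spark code) that counts messages and worst-case delay for one executor. The function name simulate and the fixed flush interval are assumptions for illustration only; nothing in this issue specifies a batching policy.

```python
def simulate(task_finish_ms, flush_interval_ms=None):
    """Return (messages_sent, worst_delay_ms) for a single executor.

    flush_interval_ms=None models the behaviour described in the issue:
    every task's update is sent as soon as the task finishes.
    Otherwise updates are held until the next multiple of
    flush_interval_ms (a hypothetical batching policy).
    """
    if flush_interval_ms is None:
        return len(task_finish_ms), 0

    flush_ticks = set()
    worst_delay = 0
    for t in task_finish_ms:
        # First flush tick at or after the finish time (integer ceiling).
        tick = -(-t // flush_interval_ms) * flush_interval_ms
        flush_ticks.add(tick)
        worst_delay = max(worst_delay, tick - t)
    # One message per tick that had at least one pending update.
    return len(flush_ticks), worst_delay


finish_times = [500, 1000, 1500, 4200, 4800]
print(simulate(finish_times))        # unbatched
print(simulate(finish_times, 2000))  # batched every 2 seconds
```

With these made-up finish times, batching cuts five messages down to two, but an update can wait up to 1.8 seconds before the driver (and so the UI) sees it.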

If network bandwidth and UI update latency are fundamentally at odds, maybe this calls for a user option to optimize for either network traffic or UI freshness, something like {{spark.accumulators.mergeUpdatesOnExecutor}}, defaulting to false.
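
As a rough illustration of how such an option might gate the behaviour (plain Python, not Spark code): the option name is only the one suggested above and does not exist in Spark, and the dict-based conf, the Counter-valued accumulators and the function updates_to_send are all assumptions made for this sketch.

```python
from collections import Counter

# Option name as suggested in this comment; not an existing Spark setting.
MERGE_FLAG = "spark.accumulators.mergeUpdatesOnExecutor"


def updates_to_send(conf, task_updates):
    """Return the messages an executor would send to the driver.

    conf: plain dict standing in for the Spark configuration (assumption).
    task_updates: one Counter of accumulator deltas per finished task.
    """
    merge = conf.get(MERGE_FLAG, "false").lower() == "true"
    if not merge:
        # Default: one message per task, so the UI sees updates soonest.
        return list(task_updates)

    # Opt-in: add the per-task deltas together and send a single message.
    merged = Counter()
    for update in task_updates:
        merged.update(update)  # Counter.update adds counts key by key
    return [merged] if task_updates else []


updates = [Counter(a=1), Counter(a=2, b=1), Counter(b=4)]
print(len(updates_to_send({}, updates)))
print(updates_to_send({MERGE_FLAG: "true"}, updates))
```

Defaulting to false keeps today's per-task behaviour unless a user explicitly opts in to fewer, larger messages.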

cc [~pwendell] for thoughts

> Accumulator updates should get locally merged before sent to the driver
> -----------------------------------------------------------------------
>
>                 Key: SPARK-664
>                 URL: https://issues.apache.org/jira/browse/SPARK-664
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Imran Rashid
>            Priority: Minor
>
> Whenever a task finishes, the accumulator updates from that task are immediately sent back to the driver.  When the accumulator updates are big, this is inefficient because (a) a lot more data has to be sent to the driver and (b) the driver has to do all the work of merging the updates together.
> Probably doesn't matter for small accumulators / low number of tasks, but if both are big, this could be a big bottleneck.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
