Posted to issues@spark.apache.org by "Nathan Kronenfeld (JIRA)" <ji...@apache.org> on 2014/10/16 07:54:33 UTC

[jira] [Commented] (SPARK-3885) Provide mechanism to remove accumulators once they are no longer used

    [ https://issues.apache.org/jira/browse/SPARK-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173407#comment-14173407 ] 

Nathan Kronenfeld commented on SPARK-3885:
------------------------------------------

I tried reusing accumulators and clearing them, but I'm still running out of memory.

According to the profiler, a lot of memory is still being held by Accumulators$.localAccums.

If I'm reading things right, this is where the workers hold on to their local copies of each accumulator?

It looks like there is a clear() call, but as far as I can tell it only runs at the beginning of a task - which means these entries stick around after a job ends, until another task runs on that thread, and which also makes this a bit tough to profile.
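
For reference, here's roughly the shape of what I'm looking at - a simplified paraphrase of the Accumulators object from the 1.x source, not the exact code:

{code:scala}
import scala.collection.mutable

// Simplified paraphrase of org.apache.spark.Accumulators (1.x branch);
// the names match the source but the details are elided.
private object Accumulators {
  // Driver-side originals, keyed by accumulator id.
  val originals = mutable.Map[Long, Accumulable[_, _]]()

  // Worker-side deserialized copies, keyed by the task thread that created them.
  val localAccums = mutable.Map[Thread, mutable.Map[Long, Accumulable[_, _]]]()

  def register(a: Accumulable[_, _], original: Boolean): Unit = synchronized {
    if (original) {
      originals(a.id) = a
    } else {
      // Each task thread builds up its own map of local copies here...
      localAccums.getOrElseUpdate(Thread.currentThread, mutable.Map())(a.id) = a
    }
  }

  // ...which is only dropped here, and clear() runs at the *start* of the
  // next task on that thread, not at the end of the current one.
  def clear(): Unit = synchronized {
    localAccums.remove(Thread.currentThread)
  }
}
{code}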

Am I reading all this correctly? If so, I can see why it isn't cleared out at the end of a job - there's probably no way of doing that safely, since the workers don't know when the job is over.

Ideally, I'd love a call that would let me explicitly release an accumulator.  It seems to me that would require a parallel map in Accumulators$ tracking which threads hold a copy of each accumulator, so the release could clear all of them - roughly like the sketch below.

If I'm understanding all this correctly, I think I could put together the fix I describe pretty easily.
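
Concretely, something like this is what I have in mind - purely hypothetical, none of it is existing API; the remove(id) call and the reverse index are my invention:

{code:scala}
import scala.collection.mutable

// Hypothetical extension of the registry above; remove() and accumThreads
// are sketches of the proposed fix, not existing Spark API.
private object Accumulators {
  val originals = mutable.Map[Long, Accumulable[_, _]]()
  val localAccums = mutable.Map[Thread, mutable.Map[Long, Accumulable[_, _]]]()

  // New: reverse index recording which threads hold a local copy of each id.
  val accumThreads = mutable.Map[Long, mutable.Set[Thread]]()

  def register(a: Accumulable[_, _], original: Boolean): Unit = synchronized {
    if (original) {
      originals(a.id) = a
    } else {
      localAccums.getOrElseUpdate(Thread.currentThread, mutable.Map())(a.id) = a
      accumThreads.getOrElseUpdate(a.id, mutable.Set()) += Thread.currentThread
    }
  }

  // New: explicitly release every copy of one accumulator.
  def remove(id: Long): Unit = synchronized {
    originals.remove(id)
    for (thread <- accumThreads.remove(id).getOrElse(mutable.Set.empty[Thread]);
         accums <- localAccums.get(thread)) {
      accums.remove(id)
      if (accums.isEmpty) localAccums.remove(thread)
    }
  }
}
{code}

(The obvious wrinkle is that remove() would run on the driver while localAccums lives in each executor's JVM, so the worker-side cleanup would have to happen there rather than through a shared map - but the bookkeeping idea is the same.)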

> Provide mechanism to remove accumulators once they are no longer used
> ---------------------------------------------------------------------
>
>                 Key: SPARK-3885
>                 URL: https://issues.apache.org/jira/browse/SPARK-3885
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2, 1.1.0, 1.2.0
>            Reporter: Josh Rosen
>
> Spark does not currently provide any mechanism to delete accumulators after they are no longer used.  This can lead to OOMs for long-lived SparkContexts that create many large accumulators.
> Part of the problem is that accumulators are registered in a global {{Accumulators}} registry.  Maybe the fix would be as simple as using weak references in the Accumulators registry so that accumulators can be GC'd once they can no longer be used.
> In the meantime, here's a workaround that users can try:
> Accumulators have a public setValue() method that can be called (only by the driver) to change an accumulator’s value.  You might be able to use this to reset accumulators’ values to smaller objects (e.g. the “zero” object of whatever your accumulator type is, or ‘null’ if you’re sure that the accumulator will never be accessed again).
> This issue was originally reported by [~nkronenfeld] on the dev mailing list: http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Accumulator-question-td8709.html
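
On the weak-reference idea above, a minimal sketch of what that registry change might look like - again hypothetical (the object name is mine), not the actual fix:

{code:scala}
import java.lang.ref.WeakReference
import scala.collection.mutable

// Hypothetical: hold originals weakly so a driver-side accumulator the user
// has dropped can be garbage-collected along with its registry entry.
private object WeakAccumulators {
  val originals = mutable.Map[Long, WeakReference[Accumulable[_, _]]]()

  def register(a: Accumulable[_, _]): Unit = synchronized {
    originals(a.id) = new WeakReference(a)
  }

  def get(id: Long): Option[Accumulable[_, _]] = synchronized {
    originals.get(id).flatMap(ref => Option(ref.get)) // None once GC'd
  }
}
{code}

And to make the setValue() workaround concrete, a minimal driver-side example (the collection accumulator and its contents are just for illustration):

{code:scala}
import scala.collection.mutable.HashSet
import org.apache.spark.SparkContext

def runJob(sc: SparkContext): Unit = {
  // A potentially large accumulator: gathers a set of strings from every task.
  val seen = sc.accumulableCollection(HashSet[String]())
  sc.parallelize(1 to 1000000).foreach(i => seen += "item-" + i)
  println("saw " + seen.value.size + " items")

  // Done with it: reset to the zero value so the registry entry stays small.
  // Only safe on the driver, and only if the accumulator is never read again.
  seen.setValue(HashSet[String]())
}
{code}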


