Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/04/01 12:57:52 UTC

[jira] [Resolved] (SPARK-4927) Spark does not clean up properly during long jobs.

     [ https://issues.apache.org/jira/browse/SPARK-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-4927.
------------------------------
    Resolution: Cannot Reproduce

I've tried to reproduce this a few ways and wasn't able to; it may have been fixed in the meantime. This can be reopened if there is a reproduction against 1.3+.
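
For reference, the kind of workload that surfaces this is an iterative job that keeps shuffling under a single SparkContext. A minimal sketch of what a reproduction attempt might look like (the app name and loop bounds are illustrative, not from the original report):

    import org.apache.spark.{SparkConf, SparkContext}

    object LongRunningShuffleRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SPARK-4927-repro"))
        val data = sc.parallelize(1 to 1000000)
        var i = 0
        while (i < 100000) { // loop "forever" to mimic a long-running job
          // Each reduceByKey introduces a shuffle whose metadata the driver
          // tracks; without cleanup this accumulates across iterations.
          data.map(x => (x % 1000, x)).reduceByKey(_ + _).count()
          i += 1
        }
        sc.stop()
      }
    }

If driver memory grows without bound while running something like this against 1.3+, that would be grounds to reopen.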

> Spark does not clean up properly during long jobs. 
> ---------------------------------------------------
>
>                 Key: SPARK-4927
>                 URL: https://issues.apache.org/jira/browse/SPARK-4927
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Ilya Ganelin
>
> On a long-running Spark job, Spark will eventually run out of memory on the driver node due to metadata overhead from the shuffle operation. Spark continues to operate, but with drastically decreased performance, since swapping now occurs with every operation.
> The spark.cleaner.ttl parameter lets a user configure when cleanup happens, but the issue is that this cleanup isn't done safely: if it clears a cached RDD or active task in the middle of processing a stage, this ultimately causes a "key not found" error (NoSuchElementException) when the next stage attempts to reference the cleared RDD or task (a configuration sketch follows this quoted description).
> There should be a sustainable mechanism for cleaning up stale metadata that allows the program to continue running. 
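
For context, the TTL-based cleanup described above was configured through spark.cleaner.ttl, in seconds. A hedged sketch of such a configuration; the app name and TTL value are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object TtlCleanupExample {
      def main(args: Array[String]): Unit = {
        // Illustrative only: enable the time-based cleaner described above.
        // Metadata older than the TTL is dropped even if a cached RDD or an
        // in-flight stage still references it -- the unsafe behavior this
        // issue reports.
        val conf = new SparkConf()
          .setAppName("ttl-cleanup-example")
          .set("spark.cleaner.ttl", "3600") // TTL in seconds; illustrative value
        val sc = new SparkContext(conf)
        // ... long-running work ...
        sc.stop()
      }
    }

Later 1.x releases lean on the reference-tracking ContextCleaner instead, which removes shuffle, RDD, and broadcast state only after the corresponding driver-side object is garbage-collected (with rdd.unpersist() still available for explicit control); that likely explains why this no longer reproduces against 1.3+.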



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org