Posted to issues@spark.apache.org by "Mingyu Kim (JIRA)" <ji...@apache.org> on 2014/12/19 19:45:14 UTC

[jira] [Created] (SPARK-4906) Spark master OOMs with exception stack trace stored in JobProgressListener

Mingyu Kim created SPARK-4906:
---------------------------------

             Summary: Spark master OOMs with exception stack trace stored in JobProgressListener
                 Key: SPARK-4906
                 URL: https://issues.apache.org/jira/browse/SPARK-4906
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.1.1
            Reporter: Mingyu Kim


The Spark master was running out of memory (OOM) with a large number of stack traces retained in JobProgressListener. The reference chain is as follows.

JobProgressListener.stageIdToData => StageUIData.taskData => TaskUIData.errorMessage

Each error message is ~10 KB because it contains the entire stack trace. We run a lot of tasks, so when all of the tasks across multiple stages failed, these error messages accounted for 0.5 GB of heap at one point.
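
As a rough sanity check on those figures, the sketch below works out how many retained error messages it takes to reach that heap usage. The constant and function names are made up for illustration, and the sizes are the approximate ones quoted above (binary units assumed), not measurements.

```python
# Back-of-the-envelope estimate; all names here are hypothetical.
ERROR_MESSAGE_BYTES = 10 * 1024        # ~10 KB per retained stack trace (approximate)
OBSERVED_HEAP_BYTES = 512 * 1024 ** 2  # ~0.5 GB attributed to error messages


def messages_for_heap(heap_bytes, message_bytes=ERROR_MESSAGE_BYTES):
    """Number of whole error messages that fit in heap_bytes."""
    return heap_bytes // message_bytes


def heap_for_messages(failed_tasks, message_bytes=ERROR_MESSAGE_BYTES):
    """Bytes retained if every failed task keeps one full error message."""
    return failed_tasks * message_bytes


if __name__ == "__main__":
    print(messages_for_heap(OBSERVED_HEAP_BYTES))
    print(heap_for_messages(50_000))
```

Under these assumptions, roughly fifty thousand failed tasks are enough to retain about half a gigabyte of stack traces.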

Please correct me if I'm wrong, but it looks like all the task info for running applications is kept in memory, which means a long-running application is almost always bound to OOM eventually. Would it make sense to fix this, for example by spilling some UI state to disk?
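
To make the concern concrete, here is a minimal sketch of one way to bound this kind of state. It is not Spark's implementation, and it is not the disk-spilling idea suggested above; it shows a simpler alternative that caps the number of retained stages and truncates each error message. The class name, method names, and default limits are all hypothetical.

```python
from collections import OrderedDict


class BoundedStageStore:
    """Hypothetical in-memory store mirroring stageId -> taskId -> errorMessage."""

    def __init__(self, max_stages=1000, max_error_chars=512):
        self.max_stages = max_stages
        self.max_error_chars = max_error_chars
        self._stages = OrderedDict()  # stage_id -> {task_id: truncated error message}

    def record_task_failure(self, stage_id, task_id, error_message):
        tasks = self._stages.setdefault(stage_id, {})
        # Mark this stage as most recently updated.
        self._stages.move_to_end(stage_id)
        # Keep only the head of the stack trace instead of the full text.
        tasks[task_id] = error_message[: self.max_error_chars]
        # Evict the least recently updated stages once over the cap.
        while len(self._stages) > self.max_stages:
            self._stages.popitem(last=False)

    def stage_ids(self):
        return list(self._stages.keys())

    def retained_chars(self):
        return sum(len(msg) for tasks in self._stages.values() for msg in tasks.values())
```

With a cap on stages and on message length, retained memory is bounded by roughly max_stages x tasks per stage x max_error_chars, regardless of how long the application runs. The trade-off is that full stack traces and older stages are no longer available in the UI.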



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
