Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2015/07/15 04:55:04 UTC

[jira] [Resolved] (SPARK-5523) TaskMetrics and TaskInfo have innumerable copies of the hostname string

     [ https://issues.apache.org/jira/browse/SPARK-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-5523.
----------------------------------
       Resolution: Fixed
         Assignee: Saisai Shao
    Fix Version/s: 1.5.0

> TaskMetrics and TaskInfo have innumerable copies of the hostname string
> -----------------------------------------------------------------------
>
>                 Key: SPARK-5523
>                 URL: https://issues.apache.org/jira/browse/SPARK-5523
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Streaming
>            Reporter: Tathagata Das
>            Assignee: Saisai Shao
>             Fix For: 1.5.0
>
>
>  TaskMetrics and TaskInfo objects carry the hostname associated with the task. As these objects are created (directly or through deserialization of RPC messages), each of them gets its own String object for the hostname, even though most of them hold the same string data. This results in thousands of duplicate String objects, increasing the memory requirement of the driver.
> The hostname can easily be deduplicated when deserializing a TaskMetrics object or when creating a TaskInfo object.
> This affects streaming particularly badly because of the high rate of job/stage/task generation.
> For a solution, see how this deduplication is done for StorageLevel:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/StorageLevel.scala#L226 
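
The deduplication described above could be sketched roughly as follows. This is a minimal illustration, not the patch that resolved the issue: HostnameCache, TaskInfoSketch and create are hypothetical names, and the sketch only assumes the usual ConcurrentHashMap.putIfAbsent caching pattern, so that equal hostname strings end up sharing one String instance.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper (not part of Spark's API): a process-wide cache that
// maps each hostname to one canonical String instance.
object HostnameCache {
  private val cache = new ConcurrentHashMap[String, String]()

  /** Returns the canonical instance equal to `host`, registering it on first use. */
  def dedup(host: String): String = {
    // putIfAbsent returns the previously cached value, or null if `host`
    // was just inserted and is therefore the canonical instance itself.
    val previous = cache.putIfAbsent(host, host)
    if (previous == null) host else previous
  }
}

// Hypothetical stand-in for TaskInfo, reduced to the fields relevant here.
final case class TaskInfoSketch(taskId: Long, host: String)

object TaskInfoSketch {
  // Deduplicate the hostname at creation time (the same call could be made
  // when deserializing), so all tasks on one host share a single String.
  def create(taskId: Long, host: String): TaskInfoSketch =
    new TaskInfoSketch(taskId, HostnameCache.dedup(host))
}
```

The cache in this sketch is never evicted. That seems acceptable here only because the number of distinct hostnames is bounded by the size of the cluster, while the number of task objects is not.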


