Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2019/02/12 22:47:00 UTC

[jira] [Commented] (SPARK-26395) Spark Thrift server memory leak

    [ https://issues.apache.org/jira/browse/SPARK-26395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766547#comment-16766547 ] 

Marcelo Vanzin commented on SPARK-26395:
----------------------------------------

The code that cleans up stages does clean up the RDD graphs:

{noformat}
      if (!hasMoreAttempts) {
        kvstore.delete(classOf[RDDOperationGraphWrapper], s.info.stageId)
      }
{noformat}

Are you sure stages are being properly cleaned up in your case? SPARK-25837 can make stage cleanup very slow; that issue will be fixed in 2.3.3.

> Spark Thrift server memory leak
> -------------------------------
>
>                 Key: SPARK-26395
>                 URL: https://issues.apache.org/jira/browse/SPARK-26395
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Konstantinos Andrikopoulos
>            Priority: Major
>
> We are running Thrift Server in standalone mode and we have observed that the heap of the driver is constantly increasing. After analysing the heap dump, the issue seems to be that the ElementTrackingStore keeps growing because RDDOperationGraphWrapper objects are added and never cleaned up.
> The ElementTrackingStore defines the addTrigger method, where you can set thresholds in order to perform cleanup, but in practice it is used only for the ExecutorSummaryWrapper, JobDataWrapper and StageDataWrapper classes, through the following Spark properties:
>  * spark.ui.retainedDeadExecutors
>  * spark.ui.retainedJobs
>  * spark.ui.retainedStages
> So the RDDOperationGraphWrapper, which is added in the onJobStart method of the AppStatusListener class [kvstore.write(uigraph) #line 291],
> is not cleaned up, and it constantly increases, causing a memory leak.
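
For reference, the three retention properties named in the description can be set in spark-defaults.conf (or passed with --conf). The fragment below is only a sketch: the values are illustrative assumptions, not recommendations, and none of these properties bounds RDDOperationGraphWrapper entries directly. Per the comment above, those entries are deleted as part of stage cleanup, so spark.ui.retainedStages is the setting most likely to affect them.

{noformat}
# Illustrative values only (assumptions, not tuned recommendations)
spark.ui.retainedJobs            200
spark.ui.retainedStages          200
spark.ui.retainedDeadExecutors   20
{noformat}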



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
