Posted to issues@spark.apache.org by "Eyal Farago (JIRA)" <ji...@apache.org> on 2019/05/15 15:15:00 UTC

[jira] [Commented] (SPARK-24437) Memory leak in UnsafeHashedRelation

    [ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840498#comment-16840498 ] 

Eyal Farago commented on SPARK-24437:
-------------------------------------

[~mgaido], looking at this again, I suspect that in this case losing the broadcast does not necessarily mean losing the lineage: the broadcast was built by executing a plan, and that plan is still available even if the broadcast isn't.

I think the way broadcast variables are used inside Spark SQL differs from using them directly in the RDD API. In the RDD API the broadcast data originates at the driver and Spark has no way to reconstruct it. In the Spark SQL scenario, by contrast, the broadcast data is built by Spark itself, so the knowledge needed to reconstruct it is still available. In the rare event that both the broadcast and part of the cached partitions are lost, it should be possible to rebuild the broadcast value (by re-executing the plan that produced it in the first place) and then recompute the missing partitions. A rough sketch of the difference follows below.
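To make the distinction concrete, here is a minimal, illustrative sketch (not code from this ticket; the object name is made up): in the RDD API the broadcast payload only exists because the driver created it, while in Spark SQL the broadcast side of a join is the output of executing a plan and could, in principle, be re-derived from that plan.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Illustrative contrast of where the broadcast payload comes from.
object BroadcastOriginSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-origin")
      .getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext

    // RDD API: the payload originates at the driver; if the broadcast is lost,
    // there is no plan from which Spark could rebuild it.
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
    sc.parallelize(Seq(1, 2, 3))
      .map(k => (k, lookup.value.getOrElse(k, "?")))
      .collect()
      .foreach(println)

    // Spark SQL: the broadcast side is itself the result of executing a plan
    // (the one that builds `small`), so it could be re-derived from that plan.
    val small = Seq((1, "a"), (2, "b")).toDF("k", "v")
    val big = Seq(1, 2, 3).toDF("k")
    big.join(broadcast(small), "k").show()

    spark.stop()
  }
}
{code}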

I think it's at least theoretically possible to identify cached plans that are based on broadcast data and change the relevant operators to reference the broadcast variable via a soft/weak reference, roughly along the lines of the sketch below. Unfortunately I believe it would turn out to be quite difficult, especially in the face of non-deterministic operators.
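A minimal sketch of the weak-reference idea (purely hypothetical, not an actual Spark API; the class name and the rebuild callback are made up for illustration): the operator holds the built relation only through a weak reference and falls back to re-executing the producing plan when the GC has reclaimed it.

{code:scala}
import java.lang.ref.WeakReference

// Hypothetical holder: keeps the relation weakly referenced and rebuilds it
// on demand, e.g. by re-executing the plan that produced the broadcast value.
class WeaklyHeldRelation[T <: AnyRef](rebuild: () => T) {
  private var ref = new WeakReference[T](rebuild())

  def get(): T = synchronized {
    Option(ref.get()) match {
      case Some(relation) => relation
      case None =>
        val relation = rebuild()          // re-derive the lost broadcast value
        ref = new WeakReference(relation)
        relation
    }
  }
}
{code}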

> Memory leak in UnsafeHashedRelation
> -----------------------------------
>
>                 Key: SPARK-24437
>                 URL: https://issues.apache.org/jira/browse/SPARK-24437
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: gagan taneja
>            Priority: Major
>         Attachments: Screen Shot 2018-05-30 at 2.05.40 PM.png, Screen Shot 2018-05-30 at 2.07.22 PM.png, Screen Shot 2018-11-01 at 10.38.30 AM.png
>
>
> There seems to be a memory leak with org.apache.spark.sql.execution.joins.UnsafeHashedRelation.
> We have a long-running instance of STS (Spark Thrift Server).
> With each query execution that requires a broadcast join, an UnsafeHashedRelation is added for cleanup in the ContextCleaner. This UnsafeHashedRelation reference is also held by some other collection, so it never becomes eligible for GC, and because of this the ContextCleaner is not able to clean it.
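
A rough way to exercise this path (illustrative only, not a confirmed reproduction from the ticket; the object name and loop sizes are made up): keep one long-lived SparkSession, as the Thrift Server does, and repeatedly run queries whose small side is broadcast, so each execution builds a fresh UnsafeHashedRelation that the ContextCleaner is later asked to release. Driver heap growth can then be observed externally (e.g. with jmap).

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Illustrative driver mimicking a long-running Thrift Server workload:
// each iteration plans a broadcast join, building a new hashed relation
// that is handed to the ContextCleaner for eventual cleanup.
object BroadcastJoinChurn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("broadcast-join-churn")
      .getOrCreate()
    import spark.implicits._

    val big = spark.range(0, 1000000).toDF("k")
    for (i <- 1 to 1000) {
      val small = (0 until 100).map(j => (j.toLong, s"v$j-$i")).toDF("k", "v")
      big.join(broadcast(small), "k").count()
    }
    spark.stop()
  }
}
{code}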


