You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2015/10/29 12:47:27 UTC

[jira] [Updated] (SPARK-7970) Optimize code for SQL queries fired on Union of RDDs (closure cleaner)

     [ https://issues.apache.org/jira/browse/SPARK-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-7970:
-----------------------------
            Assignee: Nitin Goyal
    Target Version/s: 1.6.0

> Optimize code for SQL queries fired on Union of RDDs (closure cleaner)
> ----------------------------------------------------------------------
>
>                 Key: SPARK-7970
>                 URL: https://issues.apache.org/jira/browse/SPARK-7970
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Nitin Goyal
>            Assignee: Nitin Goyal
>         Attachments: Screen Shot 2015-05-27 at 11.01.03 pm.png, Screen Shot 2015-05-27 at 11.07.02 pm.png
>
>
> Closure cleaner slows down the execution of Spark SQL queries fired on union of RDDs. The time increases linearly at driver side with number of RDDs unioned. Refer following thread for more context :-
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tt12466.html
> As can be seen in attached screenshots of Jprofiler, lot of time is getting consumed in "getClassReader" method of ClosureCleaner and rest in "ensureSerializable" (atleast in my case)
> This can be fixed in two ways (as per my current understanding) :-
> 1. Fixed at Spark SQL level - As pointed out by yhuai, we can create MapPartitionsRDD idirectly nstead of doing rdd.mapPartitions which calls ClosureCleaner clean method (See PR - https://github.com/apache/spark/pull/6256).
> 2. Fix at Spark core level -
>   (i) Make "checkSerializable" property driven in SparkContext's clean method
>   (ii) Somehow cache classreader for last 'n' classes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org