Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/01/21 02:12:39 UTC

[jira] [Resolved] (SPARK-8968) dynamic partitioning in spark sql performance issue due to the high GC overhead

     [ https://issues.apache.org/jira/browse/SPARK-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-8968.
--------------------------------
       Resolution: Fixed
         Assignee: Fei Wang
    Fix Version/s: 2.0.0

> dynamic partitioning in spark sql performance issue due to the high GC overhead
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-8968
>                 URL: https://issues.apache.org/jira/browse/SPARK-8968
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Fei Wang
>            Assignee: Fei Wang
>             Fix For: 2.0.0
>
>
> Dynamic partitioning currently shows poor performance on large data sets due to GC/memory overhead. The cause is that each task opens a separate writer for every partition it encounters, which produces many small files and heavy GC pressure. If we instead shuffle the data by the partition columns, each partition ends up with only one output file, which also reduces the GC overhead.
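The effect described above can be sketched in a small, Spark-free simulation (hypothetical helper names; rows are (partition_key, value) pairs). The naive path mirrors one writer per partition per task; the shuffled path mirrors hash-partitioning rows by the partition key first, so each key is written by exactly one task:

```python
from collections import defaultdict

def write_naive(tasks):
    """Each task opens its own writer for every partition key it sees,
    yielding one small file per (task, partition) pair."""
    files = []
    for task_rows in tasks:
        writers = {}
        for key, row in task_rows:
            writers.setdefault(key, []).append(row)
        files.extend(writers.items())
    return files

def write_shuffled(tasks):
    """Shuffle rows by partition key first (the approach proposed here),
    so each key lands in one bucket and yields a single file."""
    buckets = defaultdict(list)
    for task_rows in tasks:
        for key, row in task_rows:
            buckets[key].append(row)  # stands in for a hash shuffle by key
    return list(buckets.items())

tasks = [
    [("a", 1), ("b", 2)],  # task 0 sees partitions a and b
    [("a", 3), ("b", 4)],  # task 1 also sees a and b
]
print(len(write_naive(tasks)))     # 4 files: one per (task, partition)
print(len(write_shuffled(tasks)))  # 2 files: one per partition
```

With T tasks each touching P partitions, the naive path produces up to T*P files and keeps T*P writers (and their buffers) alive, while the shuffled path keeps that at P.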



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org