Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/07/10 03:50:04 UTC

[jira] [Assigned] (SPARK-8968) shuffled by the partition columns when dynamic partitioning to optimize the memory overhead

     [ https://issues.apache.org/jira/browse/SPARK-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8968:
-----------------------------------

    Assignee: Apache Spark

> shuffled by the partition columns when dynamic partitioning to optimize the memory overhead
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-8968
>                 URL: https://issues.apache.org/jira/browse/SPARK-8968
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Fei Wang
>            Assignee: Apache Spark
>
> Currently, dynamic partitioning performs poorly on large data because of GC and memory overhead. Each task opens a separate writer for every partition it writes to, which produces many small files and heavy GC pressure. We can shuffle the data by the partition columns so that each partition is written as only one file, which also reduces the GC overhead.
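
To make the file-count argument concrete, here is a minimal Python sketch (not Spark code). It assumes each task is a list of (partition value, row) pairs and that a task opens one writer, and therefore writes one file, per distinct partition value it sees. The function names, the crc32-based hash partitioning and the sample dates are illustrative assumptions, not the implementation proposed in this issue.

```python
import zlib

# Hypothetical model: a "task" is a list of (partition_value, payload) rows, and a
# task opens one writer (one output file) per distinct partition value it contains.

def count_output_files(tasks):
    """Number of files written if every task opens one writer per partition value."""
    return sum(len({value for value, _ in task}) for task in tasks)

def shuffle_by_partition(tasks, num_reducers):
    """Hash-partition rows on the partition value so that all rows sharing a value
    end up in the same downstream task."""
    reducers = [[] for _ in range(num_reducers)]
    for task in tasks:
        for value, payload in task:
            # crc32 gives a hash that is stable across runs, unlike Python's str hash
            bucket = zlib.crc32(value.encode("utf-8")) % num_reducers
            reducers[bucket].append((value, payload))
    return reducers

# 4 input tasks, each holding rows for the same 3 dynamic partitions (dates).
dates = ["2015-07-01", "2015-07-02", "2015-07-03"]
tasks = [[(d, i) for d in dates] for i in range(4)]

print(count_output_files(tasks))     # 12: every task writes a file for every date
shuffled = shuffle_by_partition(tasks, num_reducers=2)
print(count_output_files(shuffled))  # 3: each date lands in exactly one task
```

The shuffle trades an extra data exchange for fewer writers per task and fewer small files; how much that helps in practice depends on the data size and the number of distinct partition values.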



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)