Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/05/10 22:19:00 UTC

[jira] [Resolved] (SPARK-6464) Add a new RDD transformation named processCoalesce, aimed specifically at small, cached RDDs

     [ https://issues.apache.org/jira/browse/SPARK-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-6464.
------------------------------
    Resolution: Won't Fix

> Add a new RDD transformation named processCoalesce, aimed specifically at small, cached RDDs
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6464
>                 URL: https://issues.apache.org/jira/browse/SPARK-6464
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: SaintBacchus
>         Attachments: screenshot-1.png
>
>
> The *coalesce* transformation is commonly used to increase or reduce the number of partitions in order to improve performance.
> However, *coalesce* cannot guarantee that a child partition will be computed on the same executor as its parent partitions, which can cause a large amount of network transfer.
> In some scenarios, such as the +small and cached rdd+ case mentioned in the title, we want to coalesce all of the partitions located on the same executor into a single partition and ensure that this child partition is computed on that executor. This avoids network transfer, reduces task scheduling overhead, and frees CPU cores for other work.
> In this scenario, our performance improved by about 20%.
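
For illustration, here is a minimal Scala sketch of the scenario described above. The coalesce() call is the existing Spark RDD API; processCoalesce is only the transformation proposed in this ticket and is not part of Spark (the issue was resolved as Won't Fix), so that call is shown purely as a hypothetical and left commented out.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object CoalesceSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("coalesce-sketch"))

        // A small RDD, cached in memory on its executors.
        val small = sc.parallelize(1 to 10000, numSlices = 64)
          .map(i => (i % 100, i))
          .persist(StorageLevel.MEMORY_ONLY)
        small.count() // materialize the cache

        // Existing API: shrink to fewer partitions without a shuffle.
        // Locality is only a scheduling preference here, so a child partition may
        // still run on a different executor and fetch cached blocks over the network.
        val coalesced = small.coalesce(8, shuffle = false)

        // Hypothetical API proposed in SPARK-6464: merge all partitions that live on
        // the same executor into one partition pinned to that executor.
        // val local = small.processCoalesce()

        println(coalesced.count())
        sc.stop()
      }
    }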



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org