Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2016/03/05 20:24:40 UTC

[jira] [Commented] (SPARK-13365) should coalesce do anything if coalescing to same number of partitions without shuffle

    [ https://issues.apache.org/jira/browse/SPARK-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181827#comment-15181827 ] 

Josh Rosen commented on SPARK-13365:
------------------------------------

If coalesce is called with {{shuffle == true}}, then we might actually want to run the coalesce, because the user's intent might be to produce more evenly balanced partitions. If {{shuffle == false}}, though, it seems fine to skip the coalesce, since it would be a no-op. I believe that Spark SQL performs a similar optimization.
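
A minimal sketch of that skip as a user-side wrapper (a hypothetical helper, not Spark's actual implementation; it assumes the Spark 1.6-era RDD API):

{code}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper illustrating the proposed optimization: skip the
// coalesce only when no shuffle is requested, because a shuffling
// coalesce(n) may still be useful for rebalancing skewed partitions.
def coalesceIfNeeded[T](rdd: RDD[T], numPartitions: Int,
                        shuffle: Boolean = false): RDD[T] = {
  if (!shuffle && rdd.partitions.length == numPartitions) {
    rdd  // same partition count, no shuffle requested: return the RDD as-is
  } else {
    rdd.coalesce(numPartitions, shuffle)
  }
}
{code}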

> should coalesce do anything if coalescing to same number of partitions without shuffle
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13365
>                 URL: https://issues.apache.org/jira/browse/SPARK-13365
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Thomas Graves
>
> Currently, if a user does a coalesce to the same number of partitions as already exist, it spends a bunch of time doing work when it seems like it shouldn't do anything.
> For instance, if I have an RDD with 100 partitions and I run coalesce(100), it seems like it should skip any computation since it already has 100 partitions. One case where I've actually seen this is when users do coalesce(1000) without the shuffle, which really turns into a coalesce(100).
> I'm presenting this as a question, as I'm not sure if there are use cases I haven't thought of where this would break.
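
To make the reported behavior concrete, a small spark-shell sketch (a hypothetical session; the partition counts follow from the fact that coalesce without shuffle can only decrease the number of partitions):

{code}
// Hypothetical spark-shell session (Spark 1.6-era RDD API).
val rdd = sc.parallelize(1 to 1000, 100)  // an RDD with 100 partitions

// Same partition count, no shuffle: the result is identical to the input,
// yet coalesce still goes through its partition-grouping machinery.
rdd.coalesce(100).partitions.length   // 100

// Without shuffle, coalesce cannot increase the partition count, so
// coalesce(1000) silently behaves like coalesce(100).
rdd.coalesce(1000).partitions.length  // 100
{code}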


