You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/06/30 02:21:05 UTC

[jira] [Assigned] (SPARK-8718) Improve EdgePartition2D for non perfect square number of partitions

     [ https://issues.apache.org/jira/browse/SPARK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8718:
-----------------------------------

    Assignee:     (was: Apache Spark)

> Improve EdgePartition2D for non perfect square number of partitions
> -------------------------------------------------------------------
>
>                 Key: SPARK-8718
>                 URL: https://issues.apache.org/jira/browse/SPARK-8718
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Andrew Ray
>            Priority: Minor
>
> The current implementation of EdgePartition2D has a major limitation:
> bq. One of the limitations of this approach is that the number of machines must either be a perfect square. We partially address this limitation by computing the machine assignment to the next largest perfect square and then mapping back down to the actual number of machines. Unfortunately, this can also lead to work imbalance and so it is suggested that a perfect square is used.
> To remove this limitation I'm proposing the following code change. It allows us to partition into any number of evenly sized bins while maintaining the property that any vertex will only need to be replicated at most 2 * sqrt(numParts) times. To maintain current behavior for perfect squares we use the old algorithm in that case, although this could be removed if we dont care about producing the exact same result.
> See this IPython notebook for a visualization of what is being proposed [https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb] and download it to interactively change the number of partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org