You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Ankur Dave (JIRA)" <ji...@apache.org> on 2015/07/14 22:18:04 UTC

[jira] [Resolved] (SPARK-8718) Improve EdgePartition2D for non perfect square number of partitions

     [ https://issues.apache.org/jira/browse/SPARK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ankur Dave resolved SPARK-8718.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7104
[https://github.com/apache/spark/pull/7104]

> Improve EdgePartition2D for non perfect square number of partitions
> -------------------------------------------------------------------
>
>                 Key: SPARK-8718
>                 URL: https://issues.apache.org/jira/browse/SPARK-8718
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Andrew Ray
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> The current implementation of EdgePartition2D has a major limitation:
> bq. One of the limitations of this approach is that the number of machines must either be a perfect square. We partially address this limitation by computing the machine assignment to the next largest perfect square and then mapping back down to the actual number of machines. Unfortunately, this can also lead to work imbalance and so it is suggested that a perfect square is used.
> To remove this limitation I'm proposing the following code change. It allows us to partition into any number of evenly sized bins while maintaining the property that any vertex will only need to be replicated at most 2 * sqrt(numParts) times. To maintain current behavior for perfect squares we use the old algorithm in that case, although this could be removed if we dont care about producing the exact same result.
> See this IPython notebook for a visualization of what is being proposed [https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb] and download it to interactively change the number of partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org