You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Andrew Ray (JIRA)" <ji...@apache.org> on 2015/06/30 02:17:04 UTC

[jira] [Created] (SPARK-8718) Improve EdgePartition2D for non perfect square number of partitions

Andrew Ray created SPARK-8718:
---------------------------------

             Summary: Improve EdgePartition2D for non perfect square number of partitions
                 Key: SPARK-8718
                 URL: https://issues.apache.org/jira/browse/SPARK-8718
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
            Reporter: Andrew Ray
            Priority: Minor


The current implementation of EdgePartition2D has a major limitation:

bq. One of the limitations of this approach is that the number of machines must either be a perfect square. We partially address this limitation by computing the machine assignment to the next largest perfect square and then mapping back down to the actual number of machines. Unfortunately, this can also lead to work imbalance and so it is suggested that a perfect square is used.

To remove this limitation I'm proposing the following code change. It allows us to partition into any number of evenly sized bins while maintaining the property that any vertex will only need to be replicated at most 2 * sqrt(numParts) times. To maintain current behavior for perfect squares we use the old algorithm in that case, although this could be removed if we dont care about producing the exact same result.

See this IPython notebook for a visualization of what is being proposed [https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb] and download it to interactively change the number of partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org