You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Larry Xiao (JIRA)" <ji...@apache.org> on 2014/09/15 04:12:33 UTC
[jira] [Updated] (SPARK-3523) GraphX graph partitioning strategy
[ https://issues.apache.org/jira/browse/SPARK-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Larry Xiao updated SPARK-3523:
------------------------------
Description:
We implemented some algorithms for partitioning on GraphX, and evaluated. And find the partitioning has space of improving. Seek opinion and advice.
h5. Motivation
* Graph in real world follow power law. Eg. On twitter 1% of the vertices are adjacent to nearly half of the edges.
* For high-degree vertex, one vertex concentrates vast resources. So the workload on few high-degree vertex should be decomposed by all machines
* For low-degree vertex, The computation on one vertex is quite small. Thus should exploit the locality of the computation on low-degree vertex.
h5. Algorithm Description
* HybridCut
* !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCut.png
* HybridCutPlus
* !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCutPlus.png
* Greedy BiCut
* a heuristic algorithm for bipartite
h5. Result
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/FactorBalance.png
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Bipartite.png
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Shuffle.png
h5. Code
* https://github.com/larryxiao/spark/blob/GraphX/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L173
* Because the implementation breaks the current separation using PartitionStrategy.scala, so need to think of a way to support access to graph.
h5. Reference
- Bipartite-oriented Distributed Graph Partitioning for Big Learning.
- PowerLyra : Differentiated Graph Computation and Partitioning on Skewed Graphs
was:
We implemented some algorithms for partitioning on GraphX, and evaluated. And find the partitioning has space of improving. Seek opinion and advice.
H5. Motivation
* Graph in real world follow power law. Eg. On twitter 1% of the vertices are adjacent to nearly half of the edges.
* For high-degree vertex, one vertex concentrates vast resources. So the workload on few high-degree vertex should be decomposed by all machines
* For low-degree vertex, The computation on one vertex is quite small. Thus should exploit the locality of the computation on low-degree vertex.
H5. Algorithm Description
* HybridCut
* !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCut.png
* HybridCutPlus
* !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCutPlus.png
* Greedy BiCut
* a heuristic algorithm for bipartite
H5. Result
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/FactorBalance.png
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Bipartite.png
!https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Shuffle.png
H5. Code
* https://github.com/larryxiao/spark/blob/GraphX/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L173
* Because the implementation breaks the current separation using PartitionStrategy.scala, so need to think of a way to support access to graph.
H5. Reference
- Bipartite-oriented Distributed Graph Partitioning for Big Learning.
- PowerLyra : Differentiated Graph Computation and Partitioning on Skewed Graphs
> GraphX graph partitioning strategy
> ----------------------------------
>
> Key: SPARK-3523
> URL: https://issues.apache.org/jira/browse/SPARK-3523
> Project: Spark
> Issue Type: Improvement
> Components: GraphX
> Affects Versions: 1.0.2
> Reporter: Larry Xiao
>
> We implemented some algorithms for partitioning on GraphX, and evaluated. And find the partitioning has space of improving. Seek opinion and advice.
> h5. Motivation
> * Graph in real world follow power law. Eg. On twitter 1% of the vertices are adjacent to nearly half of the edges.
> * For high-degree vertex, one vertex concentrates vast resources. So the workload on few high-degree vertex should be decomposed by all machines
> * For low-degree vertex, The computation on one vertex is quite small. Thus should exploit the locality of the computation on low-degree vertex.
> h5. Algorithm Description
> * HybridCut
> * !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCut.png
> * HybridCutPlus
> * !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/HybridCutPlus.png
> * Greedy BiCut
> * a heuristic algorithm for bipartite
> h5. Result
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/FactorBalance.png
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Bipartite.png
> !https://raw.githubusercontent.com/larryxiao/spark/GraphX/Arkansol.Analyse/Shuffle.png
> h5. Code
> * https://github.com/larryxiao/spark/blob/GraphX/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L173
> * Because the implementation breaks the current separation using PartitionStrategy.scala, so need to think of a way to support access to graph.
> h5. Reference
> - Bipartite-oriented Distributed Graph Partitioning for Big Learning.
> - PowerLyra : Differentiated Graph Computation and Partitioning on Skewed Graphs
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org