You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/03/10 11:52:40 UTC

[jira] [Resolved] (SPARK-13714) Another ConnectedComponents based on Max-Degree Propagation

     [ https://issues.apache.org/jira/browse/SPARK-13714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-13714.
-------------------------------
    Resolution: Won't Fix

I'm going to call this WontFix unless somebody advocates strongly for actually *replacing* the current implementation with this one. More realistic is to try to merge them. Adding a second one doesn't seem great.

> Another ConnectedComponents based on Max-Degree Propagation
> -----------------------------------------------------------
>
>                 Key: SPARK-13714
>                 URL: https://issues.apache.org/jira/browse/SPARK-13714
>             Project: Spark
>          Issue Type: New Feature
>          Components: GraphX
>            Reporter: zhengruifeng
>            Priority: Minor
>
> Current ConnectedComponents algorithm was based on Min-VertexId Propagation, which is sensitive to the place of Min-VertexId.
> While this implementation is based on Max-Degree Propagation.
> First, the degree graph is computed. And in the pregel progress, the vertex with the max degree in a CC is the start point of propagation.
> This new method has advantages over the old one:
> 1, The convergence is only determined by the structs of CC, and is robust to the place of vertex with Min-ID.
> 2, For spherical CCs in which there may be a concept like 'center', it can accelerate the convergence. For example, GraphGenerators.gridGraph(sc, 3, 3), the old CC need 4 supersteps, while the new one only need 2 supersteps.
> 3, If we limit the number of iteration, the new method tend to generate more acceptable results.
> 4, The output for each CC is the vertex with max degree in it, which may be more meaningful. And because the vertex-ID is nominal in most cases, the vertex with min-ID in a CC is somewhat meanless.
> But there are still two disadvantages:
> 1,The message body grows, from (VID) to (VID, Degree). that is (Long) -> (Long, Int)
> 2,For graph with simple CCs, it may be slower than old one. Because it need a extra degree computation.
> The api is the same like ConnectedComponents:
> val graph = ...
> val cc = graph.ConnectedComponentsWithDegree(100)
> or
> val cc = ConnectedComponentsWithDegree.run(graph, 100)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org