Posted to issues@spark.apache.org by "nguyen duc tuan (JIRA)" <ji...@apache.org> on 2017/08/23 05:33:00 UTC

[jira] [Created] (SPARK-21815) Nondeterministic group labeling within small connected component

nguyen duc tuan created SPARK-21815:
---------------------------------------

             Summary: Nondeterministic group labeling within small connected component
                 Key: SPARK-21815
                 URL: https://issues.apache.org/jira/browse/SPARK-21815
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
    Affects Versions: 2.2.0, 1.6.3
            Reporter: nguyen duc tuan
            Priority: Trivial


Looking at the code at https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala#L61, when the number of vertices in each community is small and the number of iterations is large enough, all candidate labels end up with the same score. The winner then depends on the order of the elements in the set, so vertices in the same connected component can be assigned different community IDs. Breaking ties by ordering on the vertex ID solves the problem.
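The tie-breaking idea can be sketched in plain Scala, without Spark. This is only an illustration under assumptions: the object TieBreak and the function pickLabel are hypothetical names, not part of GraphX, and the sketch assumes each vertex receives a map from candidate label to score. It is not the patch itself.

```scala
// Hypothetical helper, not part of GraphX: choose a label from a map of
// candidate label -> score in a way that does not depend on iteration order.
object TieBreak {
  type VertexId = Long

  // Keep the current label when no messages arrived. Otherwise take the
  // highest score and, among equal scores, the smallest label ID.
  def pickLabel(current: VertexId, scores: Map[VertexId, Long]): VertexId =
    if (scores.isEmpty) current
    else scores.minBy { case (label, score) => (-score, label) }._1
}
```

With this rule, the two equally scored candidates from a two-vertex component always resolve to the same label, whichever order the map happens to store them in.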

Sample code to reproduce this error:
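import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.LabelPropagation
import spark.implicits._ // for toDF; already in scope in spark-shell
val vertices = spark.sparkContext.parallelize(Seq((1L, 1), (2L, 1)))
val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, 1)))
val g = Graph(vertices, edges) // two vertices joined by a single edge
val c = LabelPropagation.run(g, 5)
c.vertices.map(x => (x._1, x._2)).toDF.show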


