Posted to issues@spark.apache.org by "nguyen duc tuan (JIRA)" <ji...@apache.org> on 2017/08/23 05:33:00 UTC
[jira] [Created] (SPARK-21815) Undeterministic group labeling within small connected component
nguyen duc tuan created SPARK-21815:
---------------------------------------
Summary: Undeterministic group labeling within small connected component
Key: SPARK-21815
URL: https://issues.apache.org/jira/browse/SPARK-21815
Project: Spark
Issue Type: Improvement
Components: GraphX
Affects Versions: 2.2.0, 1.6.3
Reporter: nguyen duc tuan
Priority: Trivial
Looking at the code at https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala#L61, when the number of vertices in each community is small and the number of iterations is large enough, all candidate labels end up with the same score. The winner is then decided by the order of elements in the set, so vertices in the same connected component can be assigned different community ids. Breaking ties by ordering on vertexId solves the problem.
Sample code to reproduce this error:
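import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.LabelPropagation
val vertices = spark.sparkContext.parallelize(Seq((1L, 1), (2L, 1)))
val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, 1)))
val g = Graph(vertices, edges)  // two vertices joined by a single edge
val c = LabelPropagation.run(g, 5)
c.vertices.map(x => (x._1, x._2)).toDF.show  // toDF relies on spark.implicits._, as in spark-shell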
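
The tie-breaking idea from the description (ordering by vertexId) could be sketched in plain Scala as follows. This is only an illustration under stated assumptions, not the actual Spark patch: the object name DeterministicLabelChoice and the function pickLabel are hypothetical, and the message map is assumed to hold label -> count pairs, as in the linked vertex program.

```scala
object DeterministicLabelChoice {
  // Alias used here only for readability; GraphX defines VertexId as Long.
  type VertexId = Long

  // Hypothetical helper: choose the label with the highest count.
  // When several labels share the highest count, pick the smallest label id,
  // so the result does not depend on the map's iteration order.
  def pickLabel(current: VertexId, message: Map[VertexId, Long]): VertexId =
    if (message.isEmpty) current
    else message.minBy { case (label, count) => (-count, label) }._1

  def main(args: Array[String]): Unit = {
    // Labels 7 and 3 are tied with count 2; the smaller id wins.
    val tied = Map(7L -> 2L, 3L -> 2L, 5L -> 1L)
    println(pickLabel(1L, tied)) // prints 3
  }
}
```

Sorting on the pair (-count, label) keeps the "highest count wins" rule and only adds a fixed order among equal counts, so two runs over the same messages give the same label.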