Posted to issues@spark.apache.org by "nguyen duc tuan (JIRA)" <ji...@apache.org> on 2017/08/23 06:53:00 UTC

[jira] [Updated] (SPARK-21815) Undeterministic group labeling within small connected component

     [ https://issues.apache.org/jira/browse/SPARK-21815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nguyen duc tuan updated SPARK-21815:
------------------------------------
    Description: 
Looking at the code at https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala#L61, when the number of vertices in each community is small and the number of iterations is large enough, all candidate labels end up with the same score. The winning label then depends on the iteration order of the set, so vertices in the same component can each be assigned a different community id. Ordering the candidates by vertexId solves the problem.

Sample code to reproduce this error:
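// Run in spark-shell, where `spark` and `spark.implicits._` (needed for toDF) are already available.
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.LabelPropagation
// Two vertices joined by a single edge, i.e. one small connected component.
val vertices = spark.sparkContext.parallelize(Seq((1L, 1), (2L, 1)))
val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, 1)))
val g = Graph(vertices, edges)
val c = LabelPropagation.run(g, 5)
c.vertices.map(x => (x._1, x._2)).toDF.show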

  was:
As I look in the code https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala#L61, when the number of vertices in each community is small and the number of iteration is large enough, all candidates will have same scores. Due to order in the set, each vertex will be assigned to  different community id. By ordering vertexId, the problem solved.

Sample code to reproduce this error:
val vertices = spark.sparkContext.parallelize(Seq((1l,1), (2l, 1)))
val edges = spark.sparkContext.parallelize(Seq(Edge(1l,2l, 1))
val c =LabelPropagation.run(g, 5)
c.vertices.map(x => (x._1, x._2)).toDF.show


> Undeterministic  group labeling within small connected component
> ----------------------------------------------------------------
>
>                 Key: SPARK-21815
>                 URL: https://issues.apache.org/jira/browse/SPARK-21815
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>    Affects Versions: 1.6.3, 2.2.0
>            Reporter: nguyen duc tuan
>            Priority: Trivial
>              Labels: easyfix
>
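> Looking at the code at https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala#L61, when the number of vertices in each community is small and the number of iterations is large enough, all candidate labels end up with the same score. The winning label then depends on the iteration order of the set, so vertices in the same component can each be assigned a different community id. Ordering the candidates by vertexId solves the problem.
> Sample code to reproduce this error:
> // Run in spark-shell, where `spark` and `spark.implicits._` (needed for toDF) are already available.
> import org.apache.spark.graphx.{Edge, Graph}
> import org.apache.spark.graphx.lib.LabelPropagation
> // Two vertices joined by a single edge, i.e. one small connected component.
> val vertices = spark.sparkContext.parallelize(Seq((1L, 1), (2L, 1)))
> val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, 1)))
> val g = Graph(vertices, edges)
> val c = LabelPropagation.run(g, 5)
> c.vertices.map(x => (x._1, x._2)).toDF.show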
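
The description proposes ordering candidates by vertexId so that equal scores no longer depend on set iteration order. A minimal sketch of that tie-breaking step in plain Scala, with no Spark dependency, might look like the following. The names DeterministicTieBreak and pickLabel are hypothetical and are not part of GraphX; the sketch assumes each vertex receives a map from candidate label to score, and it illustrates only the tie-break, not a patch to LabelPropagation itself.

```scala
object DeterministicTieBreak {
  // Same alias that GraphX uses for vertex ids.
  type VertexId = Long

  // Hypothetical helper: choose the label with the highest score.
  // When several labels share the highest score, the smallest label id wins,
  // so the result no longer depends on the map's iteration order.
  def pickLabel(current: VertexId, scores: Map[VertexId, Long]): VertexId =
    if (scores.isEmpty) current // no incoming messages: keep the current label
    else scores.toSeq.minBy { case (label, score) => (-score, label) }._1
}
```

For example, DeterministicTieBreak.pickLabel(7L, Map(3L -> 2L, 1L -> 2L, 5L -> 1L)) returns 1, because labels 3 and 1 tie on score and the smaller id is preferred.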