Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/06/25 04:45:00 UTC
[jira] [Assigned] (SPARK-24484) Power Iteration Clustering is
giving incorrect clustering results when there are multiple leading
eigenvalues.
[ https://issues.apache.org/jira/browse/SPARK-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-24484:
------------------------------------
Assignee: (was: Apache Spark)
> Power Iteration Clustering is giving incorrect clustering results when there are multiple leading eigenvalues.
> --------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-24484
> URL: https://issues.apache.org/jira/browse/SPARK-24484
> Project: Spark
> Issue Type: Bug
> Components: ML, MLlib
> Affects Versions: 2.4.0
> Reporter: shahid
> Priority: Major
>
> When the normalized affinity matrix has multiple leading eigenvalues, power iteration clustering (PIC) gives incorrect results.
> We should either raise an error or warn the user when PIC doesn't converge (i.e.,
> when |\lambda_1/\lambda_2| = 1).
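A worked instance of this condition, assuming the normalized affinity matrix is W = D^{-1}A (the row-normalized affinity matrix of the original PIC formulation; the report itself does not define W), for a graph made of two disconnected unit-weight edges 1-2 and 3-4:

```latex
% Assumption: W = D^{-1} A. Every vertex has degree 1, so D = I.
W = D^{-1}A = A =
\begin{pmatrix}
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix},
\qquad W^2 = I .

% Each 2x2 block has eigenvalue +1 (eigenvector (1, 1)) and -1 (eigenvector (1, -1)):
\operatorname{spec}(W) = \{1,\ 1,\ -1,\ -1\}
\;\Rightarrow\; |\lambda_1| = |\lambda_2| = |\lambda_3| = |\lambda_4| = 1,
\qquad \left|\lambda_1 / \lambda_2\right| = 1 .

% Power iteration v_{t+1} = W v_t / \|W v_t\|_1 only swaps entries within each pair
% (W is a permutation, so \|W v\|_1 = \|v\|_1); for \|v_0\|_1 = 1:
W v = (v_2,\ v_1,\ v_4,\ v_3)^{\top},
\qquad v_{t+2} = v_t .
```

With every |\lambda_i| = 1 there is no dominant direction to converge to: unless v_0 is already constant on each edge, the iterate keeps flipping, and the vector left after the last iteration reflects the random start rather than the graph's components.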
> {code:java}
> // Assumes a Spark ML test suite with `import testImplicits._` in scope
> // (needed for toDF and the 'id / 'cluster column symbols).
> test("Fail to converge: Multiple leading eigen values") {
>   import scala.collection.mutable
>   import org.apache.spark.sql.Row
>   /*
>    Graph:
>       2
>      /
>     /
>    1        3 - - 4
>
>    Adjacency matrix:
>        [(0, 1, 0, 0),
>         (1, 0, 0, 0),
>    A =   (0, 0, 0, 1),
>         (0, 0, 1, 0)]
>   */
>   val data = Seq[(Long, Long, Double)](
>     (1, 2, 1.0),
>     (3, 4, 1.0)
>   ).toDF("src", "dst", "weight")
>
>   val result = new PowerIterationClustering()
>     .setK(2)
>     .setMaxIter(20)
>     .setInitMode("random")
>     .setWeightCol("weight")
>     .assignClusters(data)
>     .select('id, 'cluster)
>
>   // Collect the vertex ids assigned to each of the k = 2 clusters.
>   val predictions = Array.fill(2)(mutable.Set.empty[Long])
>   result.collect().foreach {
>     case Row(id: Long, cluster: Integer) => predictions(cluster) += id
>   }
>   // The correct result is the two connected components {1, 2} and {3, 4}.
>   assert(predictions.toSet == Set(Array(1, 2).toSet, Array(3, 4).toSet))
> }
> {code}
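The failure mode can be reproduced without Spark. Below is a minimal pure-Scala sketch of PIC-style power iteration on the same four-node graph. It assumes the update v <- W v / ||W v||_1 with W = D^-1 A from the PIC formulation; it is not Spark's implementation. `PicOscillationDemo`, `step`, `iterate`, `l1` and `looksConverged` are hypothetical names, the fixed start vector stands in for `setInitMode("random")`, and `looksConverged` is just one possible way to detect the non-convergence the report asks to warn about, not a Spark API.

```scala
// Hypothetical, Spark-free sketch of PIC-style power iteration on the graph with
// edges 1-2 and 3-4 (unit weights). Not Spark's implementation.
object PicOscillationDemo {
  // Adjacency matrix A; vertices 1..4 are mapped to indices 0..3.
  val a: Array[Array[Double]] = Array(
    Array(0.0, 1.0, 0.0, 0.0),
    Array(1.0, 0.0, 0.0, 0.0),
    Array(0.0, 0.0, 0.0, 1.0),
    Array(0.0, 0.0, 1.0, 0.0)
  )

  // Normalized affinity W = D^-1 A: divide each row by its degree (row sum).
  val w: Array[Array[Double]] = a.map { row =>
    val degree = row.sum
    row.map(_ / degree)
  }

  // One power-iteration step: v <- W v / ||W v||_1.
  def step(v: Array[Double]): Array[Double] = {
    val wv = w.map(row => row.zip(v).map { case (x, y) => x * y }.sum)
    val norm = wv.map(x => math.abs(x)).sum
    wv.map(_ / norm)
  }

  // The start vector followed by n successive iterates (n + 1 vectors in total).
  def iterate(v0: Array[Double], n: Int): IndexedSeq[Array[Double]] =
    (0 until n).scanLeft(v0)((v, _) => step(v))

  // L1 distance between two vectors of equal length.
  def l1(x: Array[Double], y: Array[Double]): Double =
    x.zip(y).map { case (p, q) => math.abs(p - q) }.sum

  // Hypothetical convergence check: did the final step move v by less than tol?
  def looksConverged(iterates: IndexedSeq[Array[Double]], tol: Double): Boolean =
    l1(iterates(iterates.length - 1), iterates(iterates.length - 2)) < tol

  def main(args: Array[String]): Unit = {
    // Fixed, non-uniform start vector standing in for setInitMode("random").
    val v0 = Array(0.1, 0.2, 0.3, 0.4)
    val its = iterate(v0, 20)
    its.take(4).foreach(v => println(v.mkString("[", ", ", "]")))
    println(s"last step L1 change: ${l1(its(20), its(19))}")
    println(s"converged: ${looksConverged(its, 1e-6)}")
  }
}
```

Because W only swaps the two entries of each pair, the iterates alternate between the start vector and its pairwise swap (up to floating-point rounding). The final L1 change stays at about 0.4 however many steps run, so `looksConverged` returns false. A start vector that is constant on each pair, such as [0.25, 0.25, 0.25, 0.25], is a fixed point instead. In PIC the final vector is what gets clustered with k-means, so with a random start its four values need not group by connected component.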
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)