Posted to issues@spark.apache.org by "spark_user (JIRA)" <ji...@apache.org> on 2018/05/09 18:21:00 UTC

[jira] [Comment Edited] (SPARK-24217) Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.

    [ https://issues.apache.org/jira/browse/SPARK-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469243#comment-16469243 ] 

spark_user edited comment on SPARK-24217 at 5/9/18 6:20 PM:
------------------------------------------------------------

Thanks for the comment, Joseph K. Bradley.

The issue is not about the symmetry of the similarity matrix. The spark.mllib PIC assigns a cluster index to every vertex of the similarity graph, but the spark.ml PIC does not return cluster ids for vertices that are missing from the "id" column, i.e. vertices that appear only as neighbors.

The difference is visible in the test cases of both spark.ml and spark.mllib.
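
To make the gap concrete, here is a minimal sketch in plain Python (no Spark calls) that separates the vertices found in the id column from those that appear only as neighbors. The toy rows and the helper name vertex_coverage are hypothetical and exist only for illustration; they are not taken from the Spark test suites.

```python
# Hypothetical toy input in the row layout described for spark.ml PIC:
# (id, neighbors, similarities). The values are made up for illustration.
rows = [
    (0, [1, 2], [0.9, 0.8]),
    (1, [2], [0.7]),
    (3, [4], [0.6]),
]


def vertex_coverage(rows):
    """Return (vertices in the id column, vertices seen only as neighbors)."""
    ids = {vid for vid, _, _ in rows}
    neighbors = {n for _, ns, _ in rows for n in ns}
    return ids, neighbors - ids


ids, neighbor_only = vertex_coverage(rows)
print("in id column:", sorted(ids))
print("neighbor-only:", sorted(neighbor_only))
```

In this toy graph, vertices 2 and 4 never occur in the id column. As described in this report, they are the kind of vertex for which spark.ml would return no cluster id, while spark.mllib would assign one to all five vertices.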


was (Author: shahid):
Thanks for the comment Joseph K. Bradley.

Actually the issue is not about the symmetric similarity matrix.  Spark.mllib PIC assigns cluster indices corresponding to all the vertices of the similarity graph. But spark.ml doesn't return the cluster ids of the vertices which are not there in the ID column.

This can be clearly visible in the test cases of both spark.ml and spark.mllib

> Power Iteration Clustering is not displaying cluster indices corresponding to some vertices.
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24217
>                 URL: https://issues.apache.org/jira/browse/SPARK-24217
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: spark_user
>            Priority: Major
>             Fix For: 2.4.0
>
>
> We should display the prediction and id for all the nodes.
> As per the definition of PIC given in the code:
> PIC takes an affinity matrix between items (or vertices) as input. An affinity matrix
>  is a symmetric matrix whose entries are non-negative similarities between items.
>  PIC takes this matrix (or graph) as an adjacency matrix. Specifically, each input row includes:
>  * {{idCol}}: vertex ID
>  * {{neighborsCol}}: neighbors of vertex in {{idCol}}
>  * {{similaritiesCol}}: non-negative weights (similarities) of edges between the vertex
>  in {{idCol}} and each neighbor in {{neighborsCol}}
>  * *"PIC returns a cluster assignment for each input vertex."* It appends a new column {{predictionCol}}
>  containing the cluster assignment in {{[0,k)}} for each row (vertex).
>  Currently, PIC does not return cluster indices for neighbor IDs that are not present in the ID column.


