Posted to user@spark.apache.org by Donni Khan <pr...@googlemail.com> on 2017/11/27 12:27:21 UTC

Cosine Similarity between documents - Rows

I have a Spark job to compute the similarity between text documents:

RowMatrix rowMatrix = new RowMatrix(vectorsRDD.rdd());
CoordinateMatrix rowsimilarity = rowMatrix.columnSimilarities(0.5);
JavaRDD<MatrixEntry> entries = rowsimilarity.entries().toJavaRDD();
List<MatrixEntry> list = entries.collect();
for (MatrixEntry s : list) System.out.println(s);

The MatrixEntry(i, j, value) represents the similarity between
columns i and j (let's say the features of the documents).
But how can I show the similarity between rows?
Suppose I have five documents, Doc1, ..., Doc5. We would like to show the
similarity between all of those documents.
How do I get that? Any help?
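
To make the quantity being asked for concrete: columnSimilarities compares columns, so if each document were a column instead of a row (that is, if the matrix were transposed), the same call would produce document-to-document scores. In Spark that would probably mean going through CoordinateMatrix.transpose() and toRowMatrix() before calling columnSimilarities, but that route is an assumption and is not confirmed anywhere in this thread. The sketch below is plain Java with no Spark dependency; it only illustrates cosine similarity between every pair of rows on a small made-up matrix. The class name, method names, and data are all hypothetical.

```java
import java.util.Arrays;

// Illustrative only: plain Java, no Spark. All names and data here are hypothetical.
class RowCosineSketch {

    // Cosine similarity between two equal-length vectors; returns 0.0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Full symmetric matrix: sim[i][j] is the similarity between row i and row j.
    static double[][] rowSimilarities(double[][] rows) {
        int n = rows.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = cosine(rows[i], rows[j]);
                sim[i][j] = s;
                sim[j][i] = s;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Each row stands for one document, each column for one feature.
        double[][] docs = {
            {1.0, 0.0, 2.0},
            {2.0, 0.0, 4.0},
            {0.0, 3.0, 0.0}
        };
        for (double[] row : rowSimilarities(docs)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The first two rows point in the same direction, so their score is close to 1, while the third row shares no non-zero feature with them and scores 0. This all-pairs loop is quadratic in the number of documents, so it is only meant to show the definition, not to replace a distributed computation.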

Thank you
Donni

Re: Cosine Similarity between documents - Rows

Posted by "Ge, Yao (Y.)" <yg...@ford.com>.
You are essentially doing document clustering. K-means will do it, but you do have to specify the number of clusters up front.
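
As a rough illustration of the k-means suggestion, here is a minimal sketch of the basic assign-and-update loop in plain Java. It does not use Spark's own KMeans implementation; the class name, method names, the seeding from the first k points, and the sample data are all assumptions made for the example. The number of clusters k is passed in by the caller, which is the "up front" choice mentioned above.

```java
import java.util.Arrays;

// Illustrative only: plain Java, no Spark. All names and data here are hypothetical.
class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the cluster index assigned to each point. Requires 1 <= k <= points.length.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        // Simplistic, deterministic seeding: the first k points become the initial centroids.
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();

        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) break; // assignments are stable, so stop early

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid where it is
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Five made-up two-feature document vectors.
        double[][] docs = {
            {1.0, 0.0}, {0.0, 1.0}, {0.9, 0.1}, {0.1, 0.9}, {1.0, 0.1}
        };
        System.out.println(Arrays.toString(cluster(docs, 2, 20)));
    }
}
```

Note that this produces one group label per document rather than a score for each pair of documents, so it answers a somewhat different question from the pairwise similarity matrix asked about in the original post.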


