Posted to user@mahout.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2012/05/31 04:22:19 UTC
RowSimilarityJob
What is the value created to describe similarity by RowSimilarityJob? The
paper which describes how the algorithm is implemented doesn't describe
the various similarity values returned by Mahout. It seems to focus on
cooccurrences.
For SIMILARITY_COSINE is the value = cosine or 1 - cosine?
Is the value calculated independently, after cooccurrence has already
determined which docs are similar?
The code is very difficult to read so a little help would be appreciated.
Re: RowSimilarityJob
Posted by Suneel Marthi <su...@yahoo.com>.
To answer your question, Pat: for SIMILARITY_COSINE the value returned = cosine (not 1 - cosine).
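For readers who want to see what "value = cosine" implies numerically, here is a minimal sketch in plain Python. It is not Mahout's code; the function name cosine_similarity and the sample vectors are made up for illustration, and rows are assumed to be sparse vectors stored as {index: weight} dicts.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as {index: weight} dicts."""
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(w * b[i] for i, w in a.items() if i in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: an all-zero row is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical rows: doc_b points in the same direction as doc_a, doc_c shares no terms.
doc_a = {0: 1.0, 3: 2.0, 7: 0.5}
doc_b = {0: 2.0, 3: 4.0, 7: 1.0}
doc_c = {1: 1.0, 2: 3.0}

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Rows that point in the same direction score close to 1 and rows with no shared terms score 0, so with a cosine value a higher number means more similar.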
________________________________
From: Suneel Marthi <su...@yahoo.com>
To: "user@mahout.apache.org" <us...@mahout.apache.org>
Sent: Wednesday, May 30, 2012 11:22 PM
Subject: Re: RowSimilarityJob
Pat,
Here is an example from the output of the rowsimilarity job for a corpus I am working with (using Cosine Similarity).
Key: 25: Value: {27433:0.9999999999999994}
What this means is that Document# 25 is similar to Document# 27433 with a cosine similarity of about 0.999 (effectively 1.0).
Since Distance = (1 - Similarity), this means that the distance between documents 25 and 27433 above is approximately 0 (= 1 - 0.9999999999999994), or in other words they are very similar.
Hope that clarifies.
Suneel
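As a rough sketch of the arithmetic above, the snippet below parses the one output line quoted in this message and converts each similarity to a distance. The helper name parse_similarity_line is hypothetical, and the text layout is assumed from that single line only (in particular, comma-separated entries inside the braces are an assumption); it is not a Mahout API.

```python
import re

def parse_similarity_line(line):
    """Parse a line shaped like 'Key: 25: Value: {27433:0.99}' into (row, {col: similarity})."""
    match = re.match(r"Key:\s*(\d+):\s*Value:\s*\{(.*)\}\s*$", line)
    if match is None:
        raise ValueError("unrecognised line: %r" % line)
    row = int(match.group(1))
    sims = {}
    # Assumption: several similar rows would appear as comma-separated col:value pairs.
    for entry in match.group(2).split(","):
        if entry.strip():
            col, value = entry.split(":")
            sims[int(col)] = float(value)
    return row, sims

row, sims = parse_similarity_line("Key: 25: Value: {27433:0.9999999999999994}")

# Distance = 1 - Similarity, as stated in the message.
distances = {col: 1.0 - s for col, s in sims.items()}

print(row, sims, distances)
```

The key is the row (document) id and each entry in the value is another row id paired with its cosine similarity, so the distance for 27433 comes out as a tiny positive number, effectively 0.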