Posted to user@spark.apache.org by Arthur Chan <ar...@gmail.com> on 2015/10/15 18:57:53 UTC

word2vec cosineSimilarity

Hi,

I am trying the sample word2vec code from
http://spark.apache.org/docs/latest/mllib-feature-extraction.html#example

Following are my test results:

scala> for((synonym, cosineSimilarity) <- synonyms) {
     |   println(s"$synonym $cosineSimilarity")
     | }
taiwan 2.0518918365726297
japan 1.8960962308732054
korea 1.8789320149319788
thailand 1.7549218525671182
mongolia 1.7375501108635814


The cosineSimilarity values I get are all greater than 1. Shouldn't
cosine similarity be a value between 0 and 1?

How can I get similarity values in the range 0 to 1?
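
For reference, here is a minimal sketch of what I understand cosine similarity to be: the dot product divided by the product of both vector norms, which keeps the result within -1 to 1. The helper name and the plain Array[Double] inputs are my own assumptions for illustration; this is not the MLlib implementation.

```scala
// Hypothetical helper, not part of the MLlib API.
// Cosine similarity = dot(a, b) / (||a|| * ||b||).
def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same dimension")
  val dot = a.zip(b).map { case (x, y) => x * y }.sum
  val normA = math.sqrt(a.map(x => x * x).sum)
  val normB = math.sqrt(b.map(x => x * x).sum)
  // Assumption: treat similarity with a zero vector as 0.0 instead of NaN.
  if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
}
```

If the raw word vectors can be pulled out of the model (I believe Word2VecModel.getVectors returns a Map[String, Array[Float]], but I have not verified this for my Spark version), they could be converted with .map(_.toDouble) and passed to this helper to compare against the values printed above.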

Regards