Posted to issues@spark.apache.org by "Yun Ni (JIRA)" <ji...@apache.org> on 2016/11/07 21:16:59 UTC

[jira] [Created] (SPARK-18334) MinHash should use binary hash distance

Yun Ni created SPARK-18334:
------------------------------

             Summary: MinHash should use binary hash distance
                 Key: SPARK-18334
                 URL: https://issues.apache.org/jira/browse/SPARK-18334
             Project: Spark
          Issue Type: Bug
            Reporter: Yun Ni
            Priority: Trivial


MinHash currently uses the same `hashDistance` function as RandomProjection. This does not make sense for MinHash, because the Jaccard distance between two sets is unrelated to the absolute difference between their hash bucket indices. For MinHash, what matters is only whether two hash values are equal (the same bucket) or not.

This bug could affect the accuracy of multi-probe nearest-neighbor (NN) search for MinHash.
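
To illustrate the difference, here is a minimal Python sketch. It is not Spark's code: the function names `absolute_hash_distance` and `binary_hash_distance`, the use of plain lists for hash vectors, and the choice to aggregate over hash positions with `min` are all assumptions made for this example.

```python
def absolute_hash_distance(x, y):
    """Hypothetical RandomProjection-style distance: smallest absolute
    difference between bucket indices at the same hash position."""
    return min(abs(a - b) for a, b in zip(x, y))


def binary_hash_distance(x, y):
    """Hypothetical MinHash-style distance: 0 if any hash position
    matches exactly, 1 otherwise. Index magnitude is ignored."""
    return min(0 if a == b else 1 for a, b in zip(x, y))


# Example hash vectors (made-up values, two hash functions each).
query = [3, 10]
near_index = [4, 11]    # bucket indices numerically close, none equal
far_index = [100, 200]  # bucket indices numerically far, none equal
shared = [3, 900]       # first hash value equal to the query's

# The absolute distance ranks near_index ahead of far_index ...
print(absolute_hash_distance(query, near_index))  # 1
print(absolute_hash_distance(query, far_index))   # 97

# ... while the binary distance treats both as "no shared bucket".
print(binary_hash_distance(query, near_index))    # 1
print(binary_hash_distance(query, far_index))     # 1
print(binary_hash_distance(query, shared))        # 0
```

With the absolute distance, `near_index` looks like a much better probe candidate than `far_index`, even though neither shares a MinHash value with the query. The binary distance only rewards exact matches, which is the property the summary of this issue asks for.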


