Posted to issues@spark.apache.org by "Özgür Demir (JIRA)" <ji...@apache.org> on 2015/09/23 15:24:04 UTC

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

    [ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904500#comment-14904500 ] 

Özgür Demir commented on SPARK-5992:
------------------------------------

Hi, we just open-sourced an LSH top-k implementation for Spark. It supports cosine similarity only; for search and recommendation tasks this is the most widely used similarity metric, because it normalizes away popularity (the length of an item's vector).
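The comment doesn't say which hash family is used. A common LSH family for cosine similarity is random-hyperplane hashing (sign random projections): each bit records which side of a random hyperplane a vector falls on, and two vectors at angle θ disagree on a bit with probability θ/π. The plain-Python sketch below (no Spark) illustrates that family under our own assumptions; the function names are hypothetical and it is not taken from the linked project. It also shows the popularity point: scaling a vector up leaves its signature unchanged, just as it leaves cosine similarity unchanged.

```python
import math
import random


def random_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits hyperplane normals with i.i.d. Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, planes):
    """One bit per hyperplane: 1 if the vector is on its non-negative side."""
    return [1 if sum(p * x for p, x in zip(plane, vector)) >= 0 else 0
            for plane in planes]


def estimated_cosine(sig_a, sig_b):
    """Bits disagree with probability theta / pi, so invert that estimate."""
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    theta = math.pi * hamming / len(sig_a)
    return math.cos(theta)


def exact_cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


planes = random_hyperplanes(dim=4, num_bits=1024, seed=42)
u = [1.0, 0.0, 0.0, 0.0]
v = [0.0, 1.0, 0.0, 0.0]
w = [3.0, 0.0, 0.0, 0.0]  # u scaled up, e.g. a hypothetical more "popular" item

print(signature(u, planes) == signature(w, planes))
print(estimated_cosine(signature(u, planes), signature(v, planes)), exact_cosine(u, v))
```

More bits give a tighter estimate of the angle at the cost of longer signatures.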

Influenced by the DIMSUM implementation, it takes as input a row-based matrix in which each row represents one item, and it outputs a similarity matrix. This let us switch easily between the two implementations.
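An all-pairs (join) version can keep that row-per-item input and emit a sparse top-k similarity matrix. One standard way to do it, sketched here under our own assumptions rather than as the linked project's algorithm, is banding: split each random-hyperplane signature into bands, treat items that share any band bucket as candidates, score the candidates with exact cosine, and keep the k best per item. The function `lsh_topk_join` and its parameters `num_bands`, `rows_per_band` and `k` are hypothetical; plain Python lists stand in for a Spark matrix, and a Spark version would distribute the bucketing and scoring steps over the rows.

```python
import math
import random
from collections import defaultdict
from heapq import nlargest


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lsh_topk_join(rows, num_bands=8, rows_per_band=4, k=2, seed=0):
    """rows: list of (item_id, vector), one row per item.

    Returns {item_id: [(other_id, cosine), ...]} with at most k entries per item,
    i.e. the non-zero entries of a sparse top-k similarity matrix.
    """
    dim = len(rows[0][1])
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_bands * rows_per_band)]

    # Hash each item to a random-hyperplane signature and file it under every band.
    buckets = defaultdict(set)
    for item_id, vec in rows:
        bits = tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
                     for plane in planes)
        for band in range(num_bands):
            chunk = bits[band * rows_per_band:(band + 1) * rows_per_band]
            buckets[(band, chunk)].add(item_id)

    # Items sharing at least one band bucket become candidate pairs.
    candidates = defaultdict(set)
    for members in buckets.values():
        for item_id in members:
            candidates[item_id].update(m for m in members if m != item_id)

    # Score candidates exactly and keep only the k most similar per item.
    vectors = dict(rows)
    return {
        item_id: nlargest(k, ((other, cosine(vectors[item_id], vectors[other]))
                              for other in candidates[item_id]),
                          key=lambda pair: pair[1])
        for item_id, _ in rows
    }


items = [("a", [1.0, 0.0, 0.0]), ("a2", [2.0, 0.1, 0.0]), ("b", [0.0, 0.0, 1.0])]
result = lsh_topk_join(items, k=2, seed=7)
for item_id, neighbours in result.items():
    print(item_id, neighbours)
```

Raising `num_bands` increases recall (more candidate pairs), while raising `rows_per_band` makes each bucket more selective, trading recall for fewer exact cosine computations.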

Could this serve as the basic interface for other LSH join (all-pairs) implementations?

You'll find the code here:

https://github.com/soundcloud/cosine-lsh-join-spark

Cheers

Demir

> Locality Sensitive Hashing (LSH) for MLlib
> ------------------------------------------
>
>                 Key: SPARK-5992
>                 URL: https://issues.apache.org/jira/browse/SPARK-5992
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML.  It would be great to discuss some possible algorithms here, choose an API, and make a PR for an initial algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)