Posted to issues@spark.apache.org by "Maruf Aytekin (JIRA)" <ji...@apache.org> on 2015/07/22 20:14:05 UTC
[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637337#comment-14637337 ]
Maruf Aytekin commented on SPARK-5992:
--------------------------------------
In addition to Charikar's scheme for cosine similarity that [~karlhigley] pointed out, LSH schemes for the other well-known similarity/distance measures are as follows:
1. Hamming norm:
A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. In Proc. of the 25th Intl. Conf. on Very Large Data Bases, VLDB (1999).
http://www.cs.princeton.edu/courses/archive/spring13/cos598C/Gionis.pdf
2. Lp norms:
M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In Proc. of the 20th ACM Annual
http://www.cs.princeton.edu/courses/archive/spring05/cos598E/bib/p253-datar.pdf
http://people.csail.mit.edu/indyk/nips-nn.ps
3. Jaccard distance:
Mining of Massive Datasets, Chapter 3: http://infolab.stanford.edu/~ullman/mmds/ch3.pdf
4. Cosine distance and Earth mover's distance (EMD):
M. Charikar. Similarity Estimation Techniques from Rounding Algorithms. In Proc. of the 34th Annual ACM Symposium on Theory of Computing, STOC (2002).
http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf
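For discussion purposes, here is a rough sketch of one hash function from each of the first four families above, in plain Python. It is not a proposed MLlib API: all function names and parameters (dim, w, prime) are hypothetical, the p-stable example uses a Gaussian and so covers only the L2 case, and the MinHash example approximates a random permutation with a linear hash modulo a prime, which assumes set elements are non-negative integers smaller than that prime. EMD is left out.

```python
import math
import random


def bit_sampling_hash(dim, rng):
    """Hamming distance: return the value of one randomly chosen coordinate."""
    i = rng.randrange(dim)
    return lambda bits: bits[i]


def p_stable_hash(dim, w, rng):
    """L2 norm: h(x) = floor((a . x + b) / w), a ~ N(0, 1)^dim, b uniform in [0, w]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w)


def minhash(prime, rng):
    """Jaccard: minimum of a random linear hash (a*x + b) mod prime over the set."""
    a = rng.randrange(1, prime)
    b = rng.randrange(0, prime)
    return lambda items: min((a * x + b) % prime for x in items)


def sign_random_projection(dim, rng):
    """Cosine: which side of a random hyperplane through the origin x falls on."""
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return lambda x: 1 if sum(ri * xi for ri, xi in zip(r, x)) >= 0 else 0


def collision_rate(make_hash, x, y, trials, rng):
    """Fraction of independently drawn hash functions on which x and y collide."""
    hits = 0
    for _ in range(trials):
        h = make_hash(rng)
        if h(x) == h(y):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(42)
    s1, s2 = {1, 2, 3, 4}, {3, 4, 5, 6}
    # The MinHash collision rate should land near the Jaccard similarity of
    # s1 and s2 (2 shared elements out of 6), up to sampling noise.
    print(collision_rate(lambda r: minhash(2147483647, r), s1, s2, 2000, rng))
```

In each family, the probability that two items collide under a randomly drawn function increases with their similarity, so a common API could take a hash-family factory plus the number of functions and tables to build.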
> Locality Sensitive Hashing (LSH) for MLlib
> ------------------------------------------
>
> Key: SPARK-5992
> URL: https://issues.apache.org/jira/browse/SPARK-5992
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 1.4.0
> Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML. It would be great to discuss some possible algorithms here, choose an API, and make a PR for an initial algorithm.