Posted to issues@spark.apache.org by "Nick Pentreath (JIRA)" <ji...@apache.org> on 2017/02/21 08:01:44 UTC

[jira] [Comment Edited] (SPARK-18454) Changes to improve Nearest Neighbor Search for LSH

    [ https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875552#comment-15875552 ] 

Nick Pentreath edited comment on SPARK-18454 at 2/21/17 8:00 AM:
-----------------------------------------------------------------

Can you also comment on http://mail-archives.apache.org/mod_mbox/spark-user/201702.mbox/%3CCANxMKZU0iVd9Ff4TrWjtdk%3DkEyXAeoXGLEgmVW5vbE5puobE6Q%40mail.gmail.com%3E? It would be good to understand why we're seeing poor performance compared with an alternative implementation in Spark Packages, and whether we can take some ideas from it to improve performance.

Admittedly, that implementation does not support similarity join, but we should still investigate.


was (Author: mlnick):
Can you also comment on http://mail-archives.apache.org/mod_mbox/spark-user/201702.mbox/%3CCANxMKZU0iVd9Ff4TrWjtdk%3DkEyXAeoXGLEgmVW5vbE5puobE6Q%40mail.gmail.com%3E? It would be good to understand why we're seeing poor performance compared with an alternative implementation in Spark Packages, and whether we can take some ideas from it to improve performance.

> Changes to improve Nearest Neighbor Search for LSH
> --------------------------------------------------
>
>                 Key: SPARK-18454
>                 URL: https://issues.apache.org/jira/browse/SPARK-18454
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yun Ni
>
> We all agree on the following improvement to Multi-Probe NN Search:
> (1) Use approxQuantile to get the {{hashDistance}} threshold instead of doing a full sort on the whole dataset
> We are still discussing the following:
> (1) What {{hashDistance}} (or Probing Sequence) we should use for {{MinHash}}
> (2) What the issues are with the current Nearest Neighbor implementation and how we should change it


