
Posted to issues@spark.apache.org by "Ashutosh Trivedi (JIRA)" <ji...@apache.org> on 2014/10/27 06:15:33 UTC

[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib

    [ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184848#comment-14184848 ] 

Ashutosh Trivedi commented on SPARK-2336:
-----------------------------------------

Is anybody already working on this? I can take up this task. We could also implement KNN joins, which would be a nice utility for data mining.

Here is the link for KNN joins:
http://ww2.cs.fsu.edu/~czhang/knnjedbt/
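
To make the idea concrete: a KNN join pairs every point in one set R with its k nearest points in another set S. Below is a minimal brute-force sketch in plain Python, only to illustrate the operation. The function name knn_join and the choice of Euclidean distance are my assumptions; this is not an existing MLlib API and not the method from the linked page.

```python
import heapq
import math


def knn_join(r_points, s_points, k):
    """Hypothetical helper: map each index in r_points to its k nearest s_points.

    Brute force, O(|R| * |S|) distance computations, Euclidean distance assumed.
    """
    result = {}
    for i, r in enumerate(r_points):
        # nsmallest returns the k closest S points, nearest first
        result[i] = heapq.nsmallest(k, s_points, key=lambda s: math.dist(r, s))
    return result
```

A distributed version would need to avoid the full cross product, which is the part that would need real design work.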

> Approximate k-NN Models for MLLib
> ---------------------------------
>
>                 Key: SPARK-2336
>                 URL: https://issues.apache.org/jira/browse/SPARK-2336
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Brian Gawalt
>            Priority: Minor
>              Labels: features, newbie
>
> After tackling the general k-Nearest Neighbor model as per https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to also offer approximate k-Nearest Neighbor. A promising approach would involve building a kd-tree variant within each partition, a la
> http://www.autonlab.org/autonweb/14714.html?branch=1&language=2
> This could offer a simple non-linear ML model that can label new data with much lower latency than the plain-vanilla kNN versions.
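
As a rough illustration of the per-partition idea in the description above, here is a small sketch in plain Python (no Spark dependency). It builds one kd-tree per partition, asks each tree for its local k nearest points, and merges the local results. All names (build_kdtree, search, knn_partitioned) are hypothetical, and the partitions are simulated as plain lists. Note that this sketch still backtracks into the far branch when needed, so it returns exact neighbors; an approximate variant might, for example, skip or cap that backtracking to trade accuracy for latency.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Build a kd-tree (nested dicts) by splitting on the median, cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def search(node, query, k, heap):
    """Collect the k nearest points in `heap` as (-distance, point) entries."""
    if node is None:
        return
    d = math.dist(query, node["point"])
    if len(heap) < k:
        heapq.heappush(heap, (-d, node["point"]))
    elif d < -heap[0][0]:
        # Closer than the current worst candidate: replace it
        heapq.heapreplace(heap, (-d, node["point"]))
    diff = query[node["axis"]] - node["point"][node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    search(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current worst candidate (or fewer than k candidates were found so far)
    if len(heap) < k or abs(diff) < -heap[0][0]:
        search(far, query, k, heap)


def knn_partitioned(partitions, query, k):
    """Query one kd-tree per partition and merge the local top-k lists."""
    candidates = []
    for part in partitions:
        tree = build_kdtree(list(part))
        heap = []
        search(tree, query, k, heap)
        candidates.extend((-neg_d, pt) for neg_d, pt in heap)
    # Global top-k across all partitions, nearest first
    return [pt for _, pt in heapq.nsmallest(k, candidates)]
```

In a real setting the trees would be built once per partition and reused across queries; rebuilding them inside knn_partitioned is only done here to keep the sketch short.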



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
