Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/02/06 20:34:00 UTC

[jira] [Comment Edited] (MADLIB-1061) Additional computation methods for k-NN - kd tree

    [ https://issues.apache.org/jira/browse/MADLIB-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758517#comment-16758517 ] 

Frank McQuillan edited comment on MADLIB-1061 at 2/6/19 8:33 PM:
-----------------------------------------------------------------

Here are the proposed changes to the knn interface:

{code}
knn( point_source,           -- mandatory params
     point_column_name,
     point_id,
     label_column_name,
     test_source,
     test_column_name,
     test_id,
     output_table,

                              -- optional params
     k,
     output_neighbors,
     fn_dist,
     weighted_avg,
     algorithm,
     algorithm_params
   )


   algorithm (optional)
     TEXT, default: 'brute_force'.  Name of the algorithm used to 
compute the nearest neighbors.  Options are 'brute_force' or 'kd_tree'.
Can also use partial strings like 'b' or 'kd' to select the algorithm.

   algorithm_params (optional)
     TEXT, default: 'depth=3, leaf_nodes=2'.  Comma-delimited list 
of name-value pairs for the kd-tree algorithm.  Specifies the depth 
of the kd-tree and the number of leaf nodes to use, at the specified depth, 
when searching for nearest neighbors for each point.  Both parameters
must be > 0.  The defaults 'depth=3, leaf_nodes=2' mean that the kd-tree
has 8 leaf nodes in total (2^3), and that the search for each point covers
the leaf node containing the point plus the one other nearest leaf node.
{code}
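
To make the two new parameters concrete, here is a minimal Python sketch of how the 'algorithm' and 'algorithm_params' strings described above could be interpreted. It is not the MADlib implementation: the helper names resolve_algorithm and parse_algorithm_params are hypothetical, and treating a partial string as a prefix of the option name is an assumption about how 'b' and 'kd' would be matched.

```python
# Illustrative sketch only; helper names are hypothetical, not MADlib API.

def resolve_algorithm(name="brute_force"):
    """Map a full or partial name such as 'b' or 'kd' to a supported option."""
    options = ("brute_force", "kd_tree")
    key = name.strip().lower()
    # Assumption: a partial string selects the option it is a prefix of.
    matches = [opt for opt in options if key and opt.startswith(key)]
    if len(matches) != 1:
        raise ValueError("unknown algorithm: %r" % name)
    return matches[0]


def parse_algorithm_params(params="depth=3, leaf_nodes=2"):
    """Parse a comma-delimited list of name=value pairs into a dict of ints."""
    values = {"depth": 3, "leaf_nodes": 2}  # defaults taken from the proposal
    for pair in params.split(","):
        if not pair.strip():
            continue  # tolerate empty entries such as a trailing comma
        name, sep, value = pair.partition("=")
        name = name.strip().lower()
        if not sep or name not in values:
            raise ValueError("bad name-value pair: %r" % pair)
        values[name] = int(value)
        if values[name] <= 0:
            raise ValueError("%s must be > 0" % name)
    return values


params = parse_algorithm_params("depth=3, leaf_nodes=2")
total_leaves = 2 ** params["depth"]  # a binary tree of depth 3 has 2^3 leaves
print(resolve_algorithm("kd"), params, total_leaves)
```

With the defaults, total_leaves is 8 and leaf_nodes=2 of those 8 leaves would be searched for each test point, which is the trade-off between speed and accuracy that the two parameters control.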


was (Author: fmcquillan):
Here are the proposed changes to the knn interface:

{code}
knn( point_source,           -- mandatory params
     point_column_name,
     point_id,
     label_column_name,
     test_source,
     test_column_name,
     test_id,
     output_table,

                              -- optional params
     k,
     output_neighbors,
     fn_dist,
     weighted_avg,
     algorithm,
     algorithm_params
   )


   algorithm (optional)
     TEXT, default: 'brute_force'.  Name of the algorithm used to 
compute the nearest neighbors.  Options are 'brute_force' or 'kd_tree'.

   algorithm_params (optional)
     TEXT, default: 'depth=3, leaf_nodes=2'.  Comma-delimited list 
of name-value pairs for the kd-tree algorithm.  Specifies the depth 
of the kd-tree and the number of leaf nodes to use, at the specified depth, 
when searching for nearest neighbors.
{code}

> Additional computation methods for k-NN - kd tree
> -------------------------------------------------
>
>                 Key: MADLIB-1061
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1061
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: k-NN
>            Reporter: Frank McQuillan
>            Assignee: Orhan Kislal
>            Priority: Major
>              Labels: starter
>             Fix For: v1.16
>
>         Attachments: KNN-chart-data.pdf, KNN-charts.pdf, KNN-raw.pdf, KNN-w-KD-tree-leaf-node-only.pdf, Sheet1-KNN-perf-num-features.pdf, Sheet2-KNN-tree-construction.pdf, Sheet3-KNN-tree-depth.pdf
>
>
> Follow on to
> https://issues.apache.org/jira/browse/MADLIB-927
> which uses brute force.
> Determine other k-NN algos to implement.  From 
> http://scikit-learn.org/stable/modules/neighbors.html
> candidates are:
> * K-D Tree
> * Ball Tree
> * Other?
> This JIRA is to implement K-D tree.


