Posted to issues@madlib.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/01/09 22:18:58 UTC
[jira] [Commented] (MADLIB-927) Initial implementation of k-NN
[ https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813020#comment-15813020 ]
ASF GitHub Bot commented on MADLIB-927:
---------------------------------------
Github user orhankislal commented on the issue:
https://github.com/apache/incubator-madlib/pull/81
When I run the install-check, I get the following error:
```
select * from madlib.knn('knn_train_data','data','label','knn_test_data','data','id','madlib_knn_result_classification','c',3);
psql:/tmp/madlib.vuBBXN/knn/test/knn.sql_in.tmp:137: ERROR: mode is not an ordered-set aggregate, so it cannot have WITHIN GROUP
LINE 3: select test_id as id, data, mode() within group(order by la...
^
QUERY:
CREATE TABLE madlib_knn_result_classification AS
select test_id as id, data, mode() within group(order by label) as predLabel from pg_temp.madlib_knn_interm join knn_test_data on test_id=id group by test_id, data
CONTEXT: PL/pgSQL function knn(character varying,character varying,character varying,character varying,character varying,character varying,character varying,character varying,integer) line 44 at EXECUTE statement
```
Have you encountered this before? I am using PostgreSQL 9.4.
> Initial implementation of k-NN
> ------------------------------
>
> Key: MADLIB-927
> URL: https://issues.apache.org/jira/browse/MADLIB-927
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Rahul Iyer
> Labels: gsoc2016, starter
>
> k-Nearest Neighbors is a simple algorithm based on finding the nearest neighbors of data points in a metric feature space according to a specified distance function. It is considered one of the canonical algorithms of data science. It is a nonparametric method, which makes it applicable to many real-world problems where the data doesn’t satisfy particular distribution assumptions. It can also be implemented as a lazy algorithm, which means there is no training phase where information in the data is condensed into coefficients, but there is a costly testing phase where all data (or some subset) is used to make predictions.
> This JIRA involves implementing the naïve approach, i.e., computing the k nearest neighbors by going through all points.
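For illustration only, the naïve approach described in the issue can be sketched in a few lines of Python. This is not the MADlib implementation (which is written in SQL/PL/pgSQL); the function name `knn_classify`, its parameters, and the choice of squared Euclidean distance are assumptions made for this sketch. The majority vote over the k nearest labels plays the role that `mode()` plays in the failing query above.

```python
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Hypothetical brute-force k-NN classifier (illustrative sketch only)."""
    if k < 1 or k > len(train_points):
        raise ValueError("k must be between 1 and the number of training points")

    # Naive step: compute the distance from the query to every training point.
    # Squared Euclidean distance is assumed; it preserves the neighbor ordering.
    distances = []
    for point, label in zip(train_points, train_labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, query))
        distances.append((sq_dist, label))

    # Keep the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]

    # Majority vote: the most frequent label among the k neighbors.
    # On a tie, Counter returns the tied label encountered first,
    # i.e. the one belonging to the nearer neighbor.
    return Counter(nearest_labels).most_common(1)[0][0]


train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train, labels, (0.2, 0.1), k=3))
```

Every prediction scans all training points, which is the costly testing phase mentioned in the description; there is no training step.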
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)