Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2017/10/17 00:51:00 UTC

[jira] [Commented] (MADLIB-1129) Additional output information for k-NN

    [ https://issues.apache.org/jira/browse/MADLIB-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206846#comment-16206846 ] 

Frank McQuillan commented on MADLIB-1129:
-----------------------------------------

[~hpandey][~okislal]

Looking at

https://github.com/apache/madlib/commit/a32c01c0b827f23c3955d30210a152ac21773c87
&
https://github.com/apache/madlib/commit/0a7efca73bc7d38a60d92b2d5c196d7c449d9525

If I run the classification example I get

{code}
madlib=# SELECT * from madlib_knn_result_classification ORDER BY id;
 id |  data   | prediction | k_nearest_neighbours 
----+---------+------------+----------------------
  1 | {2,1}   |          1 | {3,1,2}
  2 | {2,6}   |          1 | {3,4,5}
  3 | {15,40} |          0 | {5,6,7}
  4 | {12,1}  |          1 | {3,5,4}
  5 | {2,90}  |          0 | {9,6,7}
  6 | {50,45} |          0 | {6,7,8}
(6 rows)
{code}

but I think the order of the nearest neighbors may be incorrect,
e.g., in the first row.

The user docs in knn.sql_in are different; they show:

{code}
 id |  data   | prediction | k_nearest_neighbours 
----+---------+------------+----------------------
  1 | {2,1}   |          1 | {1,2,3}
  2 | {2,6}   |          1 | {5,4,3}
  3 | {15,40} |          0 | {7,6,5}
  4 | {12,1}  |          1 | {4,5,3}
  5 | {2,90}  |          0 | {9,6,7}
  6 | {50,45} |          0 | {6,7,8}
(6 rows)
{code}

which looks better, but I have not checked every row.

Testing on Greenplum: with simple tests the sort order looks OK, but not for the example above.
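
One way to check each row is to recompute the distances and confirm the neighbor list is in ascending order. The following is a minimal sketch in Python, not MADlib code. The training coordinates are placeholder values assumed for illustration (the training table is not shown in this comment), the helper names are hypothetical, and squared Euclidean distance is assumed as the distance function.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_sorted_by_distance(point, train, neighbour_ids):
    """True if neighbour_ids is in ascending order of distance to point."""
    d = [squared_distance(point, train[i]) for i in neighbour_ids]
    return all(a <= b for a, b in zip(d, d[1:]))


def sort_neighbours(point, train, neighbour_ids):
    """Reorder neighbour_ids by ascending distance, breaking ties by id."""
    return sorted(neighbour_ids,
                  key=lambda i: (squared_distance(point, train[i]), i))


# Placeholder training points keyed by id (assumed, not from the comment).
train = {1: (1, 1), 2: (2, 2), 3: (3, 3)}
test_point = (2, 1)  # the data value from the first result row

print(is_sorted_by_distance(test_point, train, [3, 1, 2]))
print(is_sorted_by_distance(test_point, train, [1, 2, 3]))
print(sort_neighbours(test_point, train, [3, 1, 2]))
```

With these placeholder points, id 3 is farther from {2,1} than ids 1 and 2, so a list that starts with 3 fails the check. Ids 1 and 2 are equidistant, so either order between them would be acceptable; the sketch breaks the tie by id only to make the output deterministic.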

> Additional output information for k-NN
> --------------------------------------
>
>                 Key: MADLIB-1129
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1129
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: k-NN
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Minor
>              Labels: starter
>             Fix For: v1.13
>
>
> Follow on to
> https://issues.apache.org/jira/browse/MADLIB-927
> List the k-nearest neighbors that were used in the voting/averaging, sorted in ASC order according to the distance function used.  This could be added to the current output table.


