Posted to issues@madlib.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/01 00:49:51 UTC

[jira] [Commented] (MADLIB-927) Initial implementation of k-NN

    [ https://issues.apache.org/jira/browse/MADLIB-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847815#comment-15847815 ] 

ASF GitHub Bot commented on MADLIB-927:
---------------------------------------

Github user auonhaidar commented on the issue:

    https://github.com/apache/incubator-madlib/pull/81
  
    Hi NJ, Orhan
    I am done with adding the following validation cases (see the sketch after this list):
    
    - Check that the train and test tables are valid
    - Check that the specified columns are present in these tables
    - Check that k > 0
    - Check that k <= the number of rows in the train table
    - Check that the feature columns are of array type
    - Check that the feature columns contain no NULL values
    - Check that the id column of the test table is an integer type
    - Check that the label column has a valid type (float, integer, or boolean)
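    
    A minimal, illustrative sketch of how a few of these checks could be written in a
    MADlib-style PL/Python helper; this is not the actual patch, and the function name
    and argument names below are hypothetical:
    
        import plpy
        
        def validate_knn_args(point_source, point_column_name, test_source,
                              id_column_name, k):
            # k must be a positive integer.
            if k is None or k <= 0:
                plpy.error("kNN error: k must be a positive integer, got {0}".format(k))
        
            # k must not exceed the number of rows in the training table.
            n_train = plpy.execute(
                "SELECT count(*) AS c FROM {0}".format(point_source))[0]["c"]
            if k > n_train:
                plpy.error("kNN error: k={0} exceeds the {1} rows in {2}".format(
                    k, n_train, point_source))
        
            # The feature column must not contain NULL values.
            n_null = plpy.execute(
                "SELECT count(*) AS c FROM {0} WHERE {1} IS NULL".format(
                    point_source, point_column_name))[0]["c"]
            if n_null > 0:
                plpy.error("kNN error: NULL values in feature column {0}".format(
                    point_column_name))
        
            # The id column of the test table must be an integer type.
            id_type = plpy.execute(
                "SELECT pg_typeof({0})::text AS t FROM {1} LIMIT 1".format(
                    id_column_name, test_source))[0]["t"]
            if id_type not in ("smallint", "integer", "bigint"):
                plpy.error("kNN error: id column {0} must be an integer type".format(
                    id_column_name))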
    
    
    I will be committing these changes tomorrow.
    Please let me know if I am missing anything.
    
    
    
    Auon
    
    



> Initial implementation of k-NN
> ------------------------------
>
>                 Key: MADLIB-927
>                 URL: https://issues.apache.org/jira/browse/MADLIB-927
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Rahul Iyer
>              Labels: starter
>             Fix For: v1.10
>
>
> k-Nearest Neighbors is a simple algorithm based on finding the nearest neighbors of data points in a metric feature space according to a specified distance function. It is considered one of the canonical algorithms of data science. It is a nonparametric method, which makes it applicable to many real-world problems where the data doesn't satisfy particular distribution assumptions. It can also be implemented as a lazy algorithm, which means there is no training phase where information in the data is condensed into coefficients, but there is a costly prediction phase where all data (or some subset) is used to make predictions.
> This JIRA involves implementing the naïve approach - i.e. computing the k nearest neighbors by going through all points.
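
For reference, a minimal, self-contained sketch of the naïve approach described above: for each test point, scan every training point, keep the k closest, and take a majority vote over their labels. This is plain Python for illustration only; the MADlib module would more naturally express the same scan as SQL over the train and test tables.

    def knn_predict(train, test_point, k):
        """train: list of (feature_vector, label) pairs; test_point: feature list."""
        # Squared Euclidean distance to every training point -- the full scan
        # that makes this the naive approach.
        dists = [(sum((a - b) ** 2 for a, b in zip(feats, test_point)), label)
                 for feats, label in train]
        # Keep the k closest points and take a majority vote over their labels.
        nearest_labels = [label for _, label in sorted(dists)[:k]]
        return max(set(nearest_labels), key=nearest_labels.count)
    
    # Example: three 2-d training points; predict the label of (1.0, 1.2) with k=2.
    train = [([0.0, 0.0], 0), ([1.0, 1.0], 1), ([1.5, 1.5], 1)]
    print(knn_predict(train, [1.0, 1.2], k=2))   # -> 1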



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)