You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by auonhaidar <gi...@git.apache.org> on 2016/12/13 02:27:51 UTC

[GitHub] incubator-madlib pull request #80: KNN Added

GitHub user auonhaidar opened a pull request:

    https://github.com/apache/incubator-madlib/pull/80

    KNN Added

    I have added implementation of K-Nearest Neighbours. The function takes following arguments in order:
    _training table name
    training table's column name having vectors
    training table's column name having target values(labels): assumed float
    testing table name
    testing table's column name having vectors
    testing table's column name having row ids (used for final result's table creation)
    Operation ('c' for classification, 'r' for regression)
    K_
    
    
    **sample run:** 
    select * from madlib.knn('training_knn','points','label','testing_knn','points','pid','c',3);
    
    **Output gets saved in knn_final table as:**
    
    _pid, points, predicted_target_value_
    
    
    Nearest neighbours are calculated using madlib's _squared norm_ metric.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/auonhaidar/incubator-madlib master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/80.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #80
    
----
commit 23b418cf9155aa74d388d67fcec184902570869c
Author: auonhaidar <ka...@gmail.com>
Date:   2016-12-13T02:09:12Z

    KNN Added

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] incubator-madlib pull request #80: KNN Added

Posted by "Kazmi,Auon H" <ak...@ufl.edu>.
Hi NJ,

Thanks for your inputs!

I have created a PR after I made some suggested changes in the existing KNN code.




Regards,

Auon

________________________________
From: Nandish Jayaram <nj...@pivotal.io>
Sent: Monday, December 19, 2016 12:58:59 PM
To: dev@madlib.incubator.apache.org
Subject: Re: [GitHub] incubator-madlib pull request #80: KNN Added

Hi Auon,

You don't have to add the JIRA, it is already there:
https://issues.apache.org/jira/browse/MADLIB-927

It's just good practice to mention the JIRA ID (JIRA: MADLIB-927) in your
commit message.

NJ

On Sun, Dec 18, 2016 at 8:13 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> Hi NJ,
>
> I am ready to add changes made in KNN to my repo. Please tell where to add
> JIRA.
>
>
>
>
> Regards,
>
> Auon
>
> ________________________________
> From: auonhaidar <gi...@git.apache.org>
> Sent: Saturday, December 17, 2016 1:07:48 AM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib pull request #80: KNN Added
>
> Github user auonhaidar closed the pull request at:
>
>     https://github.com/apache/incubator-madlib/pull/80
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

Re: [GitHub] incubator-madlib pull request #80: KNN Added

Posted by Nandish Jayaram <nj...@pivotal.io>.
Hi Auon,

You don't have to add the JIRA, it is already there:
https://issues.apache.org/jira/browse/MADLIB-927

It's just good practice to mention the JIRA ID (JIRA: MADLIB-927) in your
commit message.

NJ

On Sun, Dec 18, 2016 at 8:13 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> Hi NJ,
>
> I am ready to add changes made in KNN to my repo. Please tell where to add
> JIRA.
>
>
>
>
> Regards,
>
> Auon
>
> ________________________________
> From: auonhaidar <gi...@git.apache.org>
> Sent: Saturday, December 17, 2016 1:07:48 AM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib pull request #80: KNN Added
>
> Github user auonhaidar closed the pull request at:
>
>     https://github.com/apache/incubator-madlib/pull/80
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

Re: [GitHub] incubator-madlib pull request #80: KNN Added

Posted by "Kazmi,Auon H" <ak...@ufl.edu>.
Hi NJ,

I am ready to add changes made in KNN to my repo. Please tell where to add JIRA.




Regards,

Auon

________________________________
From: auonhaidar <gi...@git.apache.org>
Sent: Saturday, December 17, 2016 1:07:48 AM
To: dev@madlib.incubator.apache.org
Subject: [GitHub] incubator-madlib pull request #80: KNN Added

Github user auonhaidar closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/80


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by auonhaidar <gi...@git.apache.org>.
Github user auonhaidar closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/80


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/80#discussion_r92718644
  
    --- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
    @@ -0,0 +1,126 @@
    +/* ----------------------------------------------------------------------- *//**
    + *
    + * @file knn.sql_in
    + *
    + * @brief Set of functions for k-nearest neighbors.
    + *
    + *
    + *//* ----------------------------------------------------------------------- */
    +
    +m4_include(`SQLCommon.m4')
    +
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.knn_result CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.knn_result AS (
    +    prediction float
    +);
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.test_table_spec CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.test_table_spec AS (
    +    id integer,
    +    vector DOUBLE PRECISION[]
    +);
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__knn_validate_src(
    +rel_source VARCHAR
    +) RETURNS VOID AS $$
    +    PythonFunction(knn, knn, knn_validate_src)
    +$$ LANGUAGE plpythonu
    +m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
    +
    --- End diff --
    
    I believe this is just the first version of this code. But
    it is a good idea to have some documentation that
    would be helpful for users. Try to include a function
    that would show the help messages when the user just
    calls `select madlib.knn()` or `select madlib.knn('help')`.
    
    It'll also be awesome to include some small example data
    that can be used. For instance, include a small train and
    test tables. Functions that display these help messages
    are typically written in python. Check out other modules
    to get some ideas for the same.
    
    Fyi, I created the following train and test tables locally
    when I tried to run this code:
    ```sql
    drop table if exists knn_train_data;
    create table knn_train_data (
        id  integer,
        data    integer[],
        label   float);
    copy knn_train_data (id, data, label) from stdin delimiter '|';
    1|{1,1}|1.0
    2|{2,2}|1.0
    3|{3,3}|1.0
    4|{4,4}|1.0
    5|{4,5}|1.0
    6|{20,50}|0.0
    7|{10,31}|0.0
    8|{81,13}|0.0
    9|{1,111}|0.0
    \.
    
    drop table if exists knn_test_data;
    create table knn_test_data (
        id  integer,
        data    integer[]);
    copy knn_test_data (id, data) from stdin delimiter '|';
    1|{2,1}
    2|{2,6}
    3|{15,40}
    4|{12,1}
    5|{2,90}
    6|{50,45}
    \.
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/80#discussion_r92717819
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -0,0 +1,36 @@
    +# coding=utf-8
    +m4_changequote(`<!', `!>')
    +
    +"""
    +@file knn.py_in
    +
    +@brief knn: Driver functions
    +
    +@namespace knn
    +
    +@brief knn: Driver functions
    +"""
    +
    +import plpy
    +
    +#from utilities.control import IterationController2D
    +#from utilities.control_composite import IterationControllerComposite
    +from utilities.validate_args import table_exists
    +from utilities.validate_args import table_is_empty
    +
    +STATE_IN_MEM = m4_ifdef(<!__HAWQ__!>, <!True!>, <!False!>)
    +HAS_FUNCTION_PROPERTIES = m4_ifdef(<!__HAS_FUNCTION_PROPERTIES__!>, <!True!>, <!False!>)
    +UDF_ON_SEGMENT_NOT_ALLOWED = m4_ifdef(<!__UDF_ON_SEGMENT_NOT_ALLOWED__!>, <!True!>, <!False!>)
    +# ----------------------------------------------------------------------
    +
    +
    +def knn_validate_src(schema_madlib, rel_source, **kwargs):
    +    if rel_source is None or rel_source.strip().lower() in ('null', ''):
    --- End diff --
    
    I guess you can replace this with `if not rel_source:`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by Wang ChenLiang <hi...@msn.com>.
Hi NJ,

  Your instructions is great for a beginning developer. Is it possible
to add these or an extended edition including examples and details to
the guide for developers?

Best,
Chenliang Wang

On 12/17/2016 09:32 AM, Nandish Jayaram wrote:
> Hi Auon,
> 
> Hope your exams went well.
> 
> You can do whatever ends up being a better git-learning experience for you.
> Since you just started contributing to MADlib, the easier way to get going
> might be to do what you mentioned. But a better, though a longer way, would
> be to just mess around with branches as a learning experience. For instance
> (be warned, this might not be the best approach and it might sound
> daunting), you can do the following:
> - Create a new local branch (say the branch name is temp-features/knn)
> while on your current master branch (which already has the knn code changes
> in it).
> useful commands: git checkout -b temp-features/knn
> - Go back to your master branch and reset it to the commit SHA before you
> made changes for knn (look at git log command to find the appropriate
> commit SHA).
> useful commands: git log, git reset --hard <commit SHA> (be careful while
> using the --hard flag in general).
> - You essentially want to reach a state where the new branch features/knn
> has the code changes you have made so far for the knn feature, and your
> master branch must be in sync with apache/incubator-madlib's master branch.
> You ideally want your local master to be in sync with your repo master,
> which in turn must be in sync with origin master (apache/incubator-madlib).
> - You might also want to push your master (with --force option) to your
> remote repo, to undo the changes you have made to your repo master branch
> with the previous PR.
> useful commands: git push --force <your repo>
> - Now create a new branch off your master (say branch name features/knn).
> Rebase this new branch with the temp-features/knn branch. You will get the
> knn related changes back on this branch now.
> useful commands: git checkout -b features/knn, git rebase temp-features/knn
> - Address the comments on this PR, and then push the features/knn branch to
> your repo and open a new PR on the branch. Read about git rebase (and try
> using it) before pushing the branch.
> useful commands: (on master branch), git pull --ff-only, (on features/knn
> branch) git rebase -i master
> 
> The useful commands I have mentioned might not do the needful for each
> step. They are just pointers for you. There might be a much more simpler
> way to accomplish all this, and I have no idea if this way would actually
> work correctly. :) But you can (almost) always recover from any mistake you
> make on git.
> 
> NJ
> 
> On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
> 
>> HI NJ,
>>
>> Thanks for your input!
>>
>> Sorry, I was busy with my semester-end exams.
>>
>> I am reading on Git. Should I repeat the process of checking out madlib
>> repo and then again making changes in a local branch?
>>
>>
>>
>> Regards,
>>
>> Auon
>>
>> ________________________________
>> From: njayaram2 <gi...@git.apache.org>
>> Sent: Thursday, December 15, 2016 6:24:08 PM
>> To: dev@madlib.incubator.apache.org
>> Subject: [GitHub] incubator-madlib issue #80: KNN Added
>>
>> Github user njayaram2 commented on the issue:
>>
>>     https://github.com/apache/incubator-madlib/pull/80
>>
>>     This is a great start!
>>     I will provide some github-specific feedback here, and more
>> knn-specific
>>     comments in the code.
>>     Git can be daunting to use at first, but it's great once you get a
>> hang of it.
>>     I would recommend you go through the following wonderful book if you
>>     have not already done so:
>>     https://git-scm.com/book/en/v2
>>
>>     When you work on a feature/bug, it is best if you create a branch
>> locally
>>     and make all changes for that feature there. You can then push that
>> branch
>>     into your github repo and open a pull request. This way you won't mess
>> with
>>     your local master branch, which should ideally be in sync with the
>> origin's
>>     (apache/incubator-madlib in this case) master branch. More information
>> on
>>     how to work with branches can be found in the following chapter:
>>     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
>>     (especially section 3.5)
>>
>>     One other minor feedback is to try including the corresponding JIRA id
>>     with the commit message. The JIRA associated with this feature is:
>>     https://issues.apache.org/jira/browse/MADLIB-927
>>
>>
>> ---
>> If your project is set up for it, you can reply to this email and have your
>> reply appear on GitHub as well. If your project does not have this feature
>> enabled and wishes so, or if the feature is enabled but not working, please
>> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>> with INFRA.
>> ---
>>
> 

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by Rahul Iyer <ra...@gmail.com>.
Hi Auon,

It looks like the PR was closed from your end. If you didn't close it on
github, then it could have been closed if that branch was deleted.
In any case, it's good that you have all the changes you need to make. If
you need, you can still access the PR with the comments on github
<https://github.com/apache/incubator-madlib/pull/80>.

Best,
Rahul

On Fri, Dec 16, 2016 at 10:52 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> Hi NJ,
>
> I guess I was able to play around with branching and other stuff but my PR
> got deleted from madlib's repo. But that's okay as I have got the comments
> you made, in e-mails. I will work on them from tomorrow.
>
>
> Thanks for your help!
>
>
> Thanks,
>
> Auon
>
> ________________________________
> From: Kazmi,Auon H <ak...@ufl.edu>
> Sent: Friday, December 16, 2016 11:09:11 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi NJ,
>
> Thanks for your detailed reply!
>
> I will try to do the said things.
>
>
>
> Thanks,
>
> Auon
>
> ________________________________
> From: Nandish Jayaram <nj...@pivotal.io>
> Sent: Friday, December 16, 2016 8:32:52 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi Auon,
>
> Hope your exams went well.
>
> You can do whatever ends up being a better git-learning experience for you.
> Since you just started contributing to MADlib, the easier way to get going
> might be to do what you mentioned. But a better, though a longer way, would
> be to just mess around with branches as a learning experience. For instance
> (be warned, this might not be the best approach and it might sound
> daunting), you can do the following:
> - Create a new local branch (say the branch name is temp-features/knn)
> while on your current master branch (which already has the knn code changes
> in it).
> useful commands: git checkout -b temp-features/knn
> - Go back to your master branch and reset it to the commit SHA before you
> made changes for knn (look at git log command to find the appropriate
> commit SHA).
> useful commands: git log, git reset --hard <commit SHA> (be careful while
> using the --hard flag in general).
> - You essentially want to reach a state where the new branch features/knn
> has the code changes you have made so far for the knn feature, and your
> master branch must be in sync with apache/incubator-madlib's master branch.
> You ideally want your local master to be in sync with your repo master,
> which in turn must be in sync with origin master (apache/incubator-madlib).
> - You might also want to push your master (with --force option) to your
> remote repo, to undo the changes you have made to your repo master branch
> with the previous PR.
> useful commands: git push --force <your repo>
> - Now create a new branch off your master (say branch name features/knn).
> Rebase this new branch with the temp-features/knn branch. You will get the
> knn related changes back on this branch now.
> useful commands: git checkout -b features/knn, git rebase temp-features/knn
> - Address the comments on this PR, and then push the features/knn branch to
> your repo and open a new PR on the branch. Read about git rebase (and try
> using it) before pushing the branch.
> useful commands: (on master branch), git pull --ff-only, (on features/knn
> branch) git rebase -i master
>
> The useful commands I have mentioned might not do the needful for each
> step. They are just pointers for you. There might be a much more simpler
> way to accomplish all this, and I have no idea if this way would actually
> work correctly. :) But you can (almost) always recover from any mistake you
> make on git.
>
> NJ
>
> On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
>
> > HI NJ,
> >
> > Thanks for your input!
> >
> > Sorry, I was busy with my semester-end exams.
> >
> > I am reading on Git. Should I repeat the process of checking out madlib
> > repo and then again making changes in a local branch?
> >
> >
> >
> > Regards,
> >
> > Auon
> >
> > ________________________________
> > From: njayaram2 <gi...@git.apache.org>
> > Sent: Thursday, December 15, 2016 6:24:08 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: [GitHub] incubator-madlib issue #80: KNN Added
> >
> > Github user njayaram2 commented on the issue:
> >
> >     https://github.com/apache/incubator-madlib/pull/80
> >
> >     This is a great start!
> >     I will provide some github-specific feedback here, and more
> > knn-specific
> >     comments in the code.
> >     Git can be daunting to use at first, but it's great once you get a
> > hang of it.
> >     I would recommend you go through the following wonderful book if you
> >     have not already done so:
> >     https://git-scm.com/book/en/v2
> >
> >     When you work on a feature/bug, it is best if you create a branch
> > locally
> >     and make all changes for that feature there. You can then push that
> > branch
> >     into your github repo and open a pull request. This way you won't
> mess
> > with
> >     your local master branch, which should ideally be in sync with the
> > origin's
> >     (apache/incubator-madlib in this case) master branch. More
> information
> > on
> >     how to work with branches can be found in the following chapter:
> >     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
> >     (especially section 3.5)
> >
> >     One other minor feedback is to try including the corresponding JIRA
> id
> >     with the commit message. The JIRA associated with this feature is:
> >     https://issues.apache.org/jira/browse/MADLIB-927
> >
> >
> > ---
> > If your project is set up for it, you can reply to this email and have
> your
> > reply appear on GitHub as well. If your project does not have this
> feature
> > enabled and wishes so, or if the feature is enabled but not working,
> please
> > contact infrastructure at infrastructure@apache.org or file a JIRA
> ticket
> > with INFRA.
> > ---
> >
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by Nandish Jayaram <nj...@pivotal.io>.
Thank you for the input Auon!

NJ

On Mon, Dec 19, 2016 at 7:23 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> Sure! I would be happy to contribute. Please let me know how can I help
> with the writeup.
>
> I mostly started off by reading Madlib research papers. Then, I visited
> its website to understand its architecture in detail and to gauge how much
> do I have to learn in order to contribute to it. I know Python but I had to
> go through some tutorials of postgres in order to understand the flow of
> code in any of the  Madlib's module. Doing the whole setup of Madlib was
> another thing. I made an install-readme for myself. Apart from all this, I
> had help from one of my colleague in college and the madlib's community.
>
>
>
>
>
> Regards,
>
> Auon
>
> ________________________________
> From: Nandish Jayaram <nj...@pivotal.io>
> Sent: Monday, December 19, 2016 12:51:18 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Great! Let us know if you have any other questions.
>
> NJ
>
> On Fri, Dec 16, 2016 at 10:52 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
>
> > Hi NJ,
> >
> > I guess I was able to play around with branching and other stuff but my
> PR
> > got deleted from madlib's repo. But that's okay as I have got the
> comments
> > you made, in e-mails. I will work on them from tomorrow.
> >
> >
> > Thanks for your help!
> >
> >
> > Thanks,
> >
> > Auon
> >
> > ________________________________
> > From: Kazmi,Auon H <ak...@ufl.edu>
> > Sent: Friday, December 16, 2016 11:09:11 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
> >
> > Hi NJ,
> >
> > Thanks for your detailed reply!
> >
> > I will try to do the said things.
> >
> >
> >
> > Thanks,
> >
> > Auon
> >
> > ________________________________
> > From: Nandish Jayaram <nj...@pivotal.io>
> > Sent: Friday, December 16, 2016 8:32:52 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
> >
> > Hi Auon,
> >
> > Hope your exams went well.
> >
> > You can do whatever ends up being a better git-learning experience for
> you.
> > Since you just started contributing to MADlib, the easier way to get
> going
> > might be to do what you mentioned. But a better, though a longer way,
> would
> > be to just mess around with branches as a learning experience. For
> instance
> > (be warned, this might not be the best approach and it might sound
> > daunting), you can do the following:
> > - Create a new local branch (say the branch name is temp-features/knn)
> > while on your current master branch (which already has the knn code
> changes
> > in it).
> > useful commands: git checkout -b temp-features/knn
> > - Go back to your master branch and reset it to the commit SHA before you
> > made changes for knn (look at git log command to find the appropriate
> > commit SHA).
> > useful commands: git log, git reset --hard <commit SHA> (be careful while
> > using the --hard flag in general).
> > - You essentially want to reach a state where the new branch features/knn
> > has the code changes you have made so far for the knn feature, and your
> > master branch must be in sync with apache/incubator-madlib's master
> branch.
> > You ideally want your local master to be in sync with your repo master,
> > which in turn must be in sync with origin master
> (apache/incubator-madlib).
> > - You might also want to push your master (with --force option) to your
> > remote repo, to undo the changes you have made to your repo master branch
> > with the previous PR.
> > useful commands: git push --force <your repo>
> > - Now create a new branch off your master (say branch name features/knn).
> > Rebase this new branch with the temp-features/knn branch. You will get
> the
> > knn related changes back on this branch now.
> > useful commands: git checkout -b features/knn, git rebase
> temp-features/knn
> > - Address the comments on this PR, and then push the features/knn branch
> to
> > your repo and open a new PR on the branch. Read about git rebase (and try
> > using it) before pushing the branch.
> > useful commands: (on master branch), git pull --ff-only, (on features/knn
> > branch) git rebase -i master
> >
> > The useful commands I have mentioned might not do the needful for each
> > step. They are just pointers for you. There might be a much more simpler
> > way to accomplish all this, and I have no idea if this way would actually
> > work correctly. :) But you can (almost) always recover from any mistake
> you
> > make on git.
> >
> > NJ
> >
> > On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
> >
> > > HI NJ,
> > >
> > > Thanks for your input!
> > >
> > > Sorry, I was busy with my semester-end exams.
> > >
> > > I am reading on Git. Should I repeat the process of checking out madlib
> > > repo and then again making changes in a local branch?
> > >
> > >
> > >
> > > Regards,
> > >
> > > Auon
> > >
> > > ________________________________
> > > From: njayaram2 <gi...@git.apache.org>
> > > Sent: Thursday, December 15, 2016 6:24:08 PM
> > > To: dev@madlib.incubator.apache.org
> > > Subject: [GitHub] incubator-madlib issue #80: KNN Added
> > >
> > > Github user njayaram2 commented on the issue:
> > >
> > >     https://github.com/apache/incubator-madlib/pull/80
> > >
> > >     This is a great start!
> > >     I will provide some github-specific feedback here, and more
> > > knn-specific
> > >     comments in the code.
> > >     Git can be daunting to use at first, but it's great once you get a
> > > hang of it.
> > >     I would recommend you go through the following wonderful book if
> you
> > >     have not already done so:
> > >     https://git-scm.com/book/en/v2
> > >
> > >     When you work on a feature/bug, it is best if you create a branch
> > > locally
> > >     and make all changes for that feature there. You can then push that
> > > branch
> > >     into your github repo and open a pull request. This way you won't
> > mess
> > > with
> > >     your local master branch, which should ideally be in sync with the
> > > origin's
> > >     (apache/incubator-madlib in this case) master branch. More
> > information
> > > on
> > >     how to work with branches can be found in the following chapter:
> > >     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-
> a-Nutshell
> > >     (especially section 3.5)
> > >
> > >     One other minor feedback is to try including the corresponding JIRA
> > id
> > >     with the commit message. The JIRA associated with this feature is:
> > >     https://issues.apache.org/jira/browse/MADLIB-927
> > >
> > >
> > > ---
> > > If your project is set up for it, you can reply to this email and have
> > your
> > > reply appear on GitHub as well. If your project does not have this
> > feature
> > > enabled and wishes so, or if the feature is enabled but not working,
> > please
> > > contact infrastructure at infrastructure@apache.org or file a JIRA
> > ticket
> > > with INFRA.
> > > ---
> > >
> >
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by "Kazmi,Auon H" <ak...@ufl.edu>.
Sure! I would be happy to contribute. Please let me know how can I help with the writeup.

I mostly started off by reading Madlib research papers. Then, I visited its website to understand its architecture in detail and to gauge how much do I have to learn in order to contribute to it. I know Python but I had to go through some tutorials of postgres in order to understand the flow of code in any of the  Madlib's module. Doing the whole setup of Madlib was another thing. I made an install-readme for myself. Apart from all this, I had help from one of my colleague in college and the madlib's community.





Regards,

Auon

________________________________
From: Nandish Jayaram <nj...@pivotal.io>
Sent: Monday, December 19, 2016 12:51:18 PM
To: dev@madlib.incubator.apache.org
Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added

Great! Let us know if you have any other questions.

NJ

On Fri, Dec 16, 2016 at 10:52 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> Hi NJ,
>
> I guess I was able to play around with branching and other stuff but my PR
> got deleted from madlib's repo. But that's okay as I have got the comments
> you made, in e-mails. I will work on them from tomorrow.
>
>
> Thanks for your help!
>
>
> Thanks,
>
> Auon
>
> ________________________________
> From: Kazmi,Auon H <ak...@ufl.edu>
> Sent: Friday, December 16, 2016 11:09:11 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi NJ,
>
> Thanks for your detailed reply!
>
> I will try to do the said things.
>
>
>
> Thanks,
>
> Auon
>
> ________________________________
> From: Nandish Jayaram <nj...@pivotal.io>
> Sent: Friday, December 16, 2016 8:32:52 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi Auon,
>
> Hope your exams went well.
>
> You can do whatever ends up being a better git-learning experience for you.
> Since you just started contributing to MADlib, the easier way to get going
> might be to do what you mentioned. But a better, though a longer way, would
> be to just mess around with branches as a learning experience. For instance
> (be warned, this might not be the best approach and it might sound
> daunting), you can do the following:
> - Create a new local branch (say the branch name is temp-features/knn)
> while on your current master branch (which already has the knn code changes
> in it).
> useful commands: git checkout -b temp-features/knn
> - Go back to your master branch and reset it to the commit SHA before you
> made changes for knn (look at git log command to find the appropriate
> commit SHA).
> useful commands: git log, git reset --hard <commit SHA> (be careful while
> using the --hard flag in general).
> - You essentially want to reach a state where the new branch features/knn
> has the code changes you have made so far for the knn feature, and your
> master branch must be in sync with apache/incubator-madlib's master branch.
> You ideally want your local master to be in sync with your repo master,
> which in turn must be in sync with origin master (apache/incubator-madlib).
> - You might also want to push your master (with --force option) to your
> remote repo, to undo the changes you have made to your repo master branch
> with the previous PR.
> useful commands: git push --force <your repo>
> - Now create a new branch off your master (say branch name features/knn).
> Rebase this new branch with the temp-features/knn branch. You will get the
> knn related changes back on this branch now.
> useful commands: git checkout -b features/knn, git rebase temp-features/knn
> - Address the comments on this PR, and then push the features/knn branch to
> your repo and open a new PR on the branch. Read about git rebase (and try
> using it) before pushing the branch.
> useful commands: (on master branch), git pull --ff-only, (on features/knn
> branch) git rebase -i master
>
> The useful commands I have mentioned might not do the needful for each
> step. They are just pointers for you. There might be a much more simpler
> way to accomplish all this, and I have no idea if this way would actually
> work correctly. :) But you can (almost) always recover from any mistake you
> make on git.
>
> NJ
>
> On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
>
> > HI NJ,
> >
> > Thanks for your input!
> >
> > Sorry, I was busy with my semester-end exams.
> >
> > I am reading on Git. Should I repeat the process of checking out madlib
> > repo and then again making changes in a local branch?
> >
> >
> >
> > Regards,
> >
> > Auon
> >
> > ________________________________
> > From: njayaram2 <gi...@git.apache.org>
> > Sent: Thursday, December 15, 2016 6:24:08 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: [GitHub] incubator-madlib issue #80: KNN Added
> >
> > Github user njayaram2 commented on the issue:
> >
> >     https://github.com/apache/incubator-madlib/pull/80
> >
> >     This is a great start!
> >     I will provide some github-specific feedback here, and more
> > knn-specific
> >     comments in the code.
> >     Git can be daunting to use at first, but it's great once you get a
> > hang of it.
> >     I would recommend you go through the following wonderful book if you
> >     have not already done so:
> >     https://git-scm.com/book/en/v2
> >
> >     When you work on a feature/bug, it is best if you create a branch
> > locally
> >     and make all changes for that feature there. You can then push that
> > branch
> >     into your github repo and open a pull request. This way you won't
> mess
> > with
> >     your local master branch, which should ideally be in sync with the
> > origin's
> >     (apache/incubator-madlib in this case) master branch. More
> information
> > on
> >     how to work with branches can be found in the following chapter:
> >     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
> >     (especially section 3.5)
> >
> >     One other minor feedback is to try including the corresponding JIRA
> id
> >     with the commit message. The JIRA associated with this feature is:
> >     https://issues.apache.org/jira/browse/MADLIB-927
> >
> >
> > ---
> > If your project is set up for it, you can reply to this email and have
> your
> > reply appear on GitHub as well. If your project does not have this
> feature
> > enabled and wishes so, or if the feature is enabled but not working,
> please
> > contact infrastructure at infrastructure@apache.org or file a JIRA
> ticket
> > with INFRA.
> > ---
> >
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by Nandish Jayaram <nj...@pivotal.io>.
Thank you Chenliang, I think it's a great idea to have such a writeup. It
will be great if the community can suggest some topics to include in the
writeup.
Auon, you might be able to suggest some good ideas for the topics since you
just went through the process and you would know where you struggled the
most!

NJ

On Mon, Dec 19, 2016 at 9:51 AM, Nandish Jayaram <nj...@pivotal.io>
wrote:

> Great! Let us know if you have any other questions.
>
> NJ
>
> On Fri, Dec 16, 2016 at 10:52 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
>
>> Hi NJ,
>>
>> I guess I was able to play around with branching and other stuff but my
>> PR got deleted from madlib's repo. But that's okay as I have got the
>> comments you made, in e-mails. I will work on them from tomorrow.
>>
>>
>> Thanks for your help!
>>
>>
>> Thanks,
>>
>> Auon
>>
>> ________________________________
>> From: Kazmi,Auon H <ak...@ufl.edu>
>> Sent: Friday, December 16, 2016 11:09:11 PM
>> To: dev@madlib.incubator.apache.org
>> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>>
>> Hi NJ,
>>
>> Thanks for your detailed reply!
>>
>> I will try to do the said things.
>>
>>
>>
>> Thanks,
>>
>> Auon
>>
>> ________________________________
>> From: Nandish Jayaram <nj...@pivotal.io>
>> Sent: Friday, December 16, 2016 8:32:52 PM
>> To: dev@madlib.incubator.apache.org
>> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>>
>> Hi Auon,
>>
>> Hope your exams went well.
>>
>> You can do whatever ends up being a better git-learning experience for
>> you.
>> Since you just started contributing to MADlib, the easier way to get going
>> might be to do what you mentioned. But a better, though a longer way,
>> would
>> be to just mess around with branches as a learning experience. For
>> instance
>> (be warned, this might not be the best approach and it might sound
>> daunting), you can do the following:
>> - Create a new local branch (say the branch name is temp-features/knn)
>> while on your current master branch (which already has the knn code
>> changes
>> in it).
>> useful commands: git checkout -b temp-features/knn
>> - Go back to your master branch and reset it to the commit SHA before you
>> made changes for knn (look at git log command to find the appropriate
>> commit SHA).
>> useful commands: git log, git reset --hard <commit SHA> (be careful while
>> using the --hard flag in general).
>> - You essentially want to reach a state where the new branch features/knn
>> has the code changes you have made so far for the knn feature, and your
>> master branch must be in sync with apache/incubator-madlib's master
>> branch.
>> You ideally want your local master to be in sync with your repo master,
>> which in turn must be in sync with origin master
>> (apache/incubator-madlib).
>> - You might also want to push your master (with --force option) to your
>> remote repo, to undo the changes you have made to your repo master branch
>> with the previous PR.
>> useful commands: git push --force <your repo>
>> - Now create a new branch off your master (say branch name features/knn).
>> Rebase this new branch with the temp-features/knn branch. You will get the
>> knn related changes back on this branch now.
>> useful commands: git checkout -b features/knn, git rebase
>> temp-features/knn
>> - Address the comments on this PR, and then push the features/knn branch
>> to
>> your repo and open a new PR on the branch. Read about git rebase (and try
>> using it) before pushing the branch.
>> useful commands: (on master branch), git pull --ff-only, (on features/knn
>> branch) git rebase -i master
>>
>> The useful commands I have mentioned might not do the needful for each
>> step. They are just pointers for you. There might be a much more simpler
>> way to accomplish all this, and I have no idea if this way would actually
>> work correctly. :) But you can (almost) always recover from any mistake
>> you
>> make on git.
>>
>> NJ
>>
>> On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
>>
>> > HI NJ,
>> >
>> > Thanks for your input!
>> >
>> > Sorry, I was busy with my semester-end exams.
>> >
>> > I am reading on Git. Should I repeat the process of checking out madlib
>> > repo and then again making changes in a local branch?
>> >
>> >
>> >
>> > Regards,
>> >
>> > Auon
>> >
>> > ________________________________
>> > From: njayaram2 <gi...@git.apache.org>
>> > Sent: Thursday, December 15, 2016 6:24:08 PM
>> > To: dev@madlib.incubator.apache.org
>> > Subject: [GitHub] incubator-madlib issue #80: KNN Added
>> >
>> > Github user njayaram2 commented on the issue:
>> >
>> >     https://github.com/apache/incubator-madlib/pull/80
>> >
>> >     This is a great start!
>> >     I will provide some github-specific feedback here, and more
>> > knn-specific
>> >     comments in the code.
>> >     Git can be daunting to use at first, but it's great once you get a
>> > hang of it.
>> >     I would recommend you go through the following wonderful book if you
>> >     have not already done so:
>> >     https://git-scm.com/book/en/v2
>> >
>> >     When you work on a feature/bug, it is best if you create a branch
>> > locally
>> >     and make all changes for that feature there. You can then push that
>> > branch
>> >     into your github repo and open a pull request. This way you won't
>> mess
>> > with
>> >     your local master branch, which should ideally be in sync with the
>> > origin's
>> >     (apache/incubator-madlib in this case) master branch. More
>> information
>> > on
>> >     how to work with branches can be found in the following chapter:
>> >     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
>> >     (especially section 3.5)
>> >
>> >     One other minor feedback is to try including the corresponding JIRA
>> id
>> >     with the commit message. The JIRA associated with this feature is:
>> >     https://issues.apache.org/jira/browse/MADLIB-927
>> >
>> >
>> > ---
>> > If your project is set up for it, you can reply to this email and have
>> your
>> > reply appear on GitHub as well. If your project does not have this
>> feature
>> > enabled and wishes so, or if the feature is enabled but not working,
>> please
>> > contact infrastructure at infrastructure@apache.org or file a JIRA
>> ticket
>> > with INFRA.
>> > ---
>> >
>>
>
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by Nandish Jayaram <nj...@pivotal.io>.
Great! Let us know if you have any other questions.

NJ

On Fri, Dec 16, 2016 at 10:52 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> Hi NJ,
>
> I guess I was able to play around with branching and other stuff but my PR
> got deleted from madlib's repo. But that's okay as I have got the comments
> you made, in e-mails. I will work on them from tomorrow.
>
>
> Thanks for your help!
>
>
> Thanks,
>
> Auon
>
> ________________________________
> From: Kazmi,Auon H <ak...@ufl.edu>
> Sent: Friday, December 16, 2016 11:09:11 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi NJ,
>
> Thanks for your detailed reply!
>
> I will try to do the said things.
>
>
>
> Thanks,
>
> Auon
>
> ________________________________
> From: Nandish Jayaram <nj...@pivotal.io>
> Sent: Friday, December 16, 2016 8:32:52 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi Auon,
>
> Hope your exams went well.
>
> You can do whatever ends up being a better git-learning experience for you.
> Since you just started contributing to MADlib, the easier way to get going
> might be to do what you mentioned. But a better, though a longer way, would
> be to just mess around with branches as a learning experience. For instance
> (be warned, this might not be the best approach and it might sound
> daunting), you can do the following:
> - Create a new local branch (say the branch name is temp-features/knn)
> while on your current master branch (which already has the knn code changes
> in it).
> useful commands: git checkout -b temp-features/knn
> - Go back to your master branch and reset it to the commit SHA before you
> made changes for knn (look at git log command to find the appropriate
> commit SHA).
> useful commands: git log, git reset --hard <commit SHA> (be careful while
> using the --hard flag in general).
> - You essentially want to reach a state where the new branch features/knn
> has the code changes you have made so far for the knn feature, and your
> master branch must be in sync with apache/incubator-madlib's master branch.
> You ideally want your local master to be in sync with your repo master,
> which in turn must be in sync with origin master (apache/incubator-madlib).
> - You might also want to push your master (with --force option) to your
> remote repo, to undo the changes you have made to your repo master branch
> with the previous PR.
> useful commands: git push --force <your repo>
> - Now create a new branch off your master (say branch name features/knn).
> Rebase this new branch with the temp-features/knn branch. You will get the
> knn related changes back on this branch now.
> useful commands: git checkout -b features/knn, git rebase temp-features/knn
> - Address the comments on this PR, and then push the features/knn branch to
> your repo and open a new PR on the branch. Read about git rebase (and try
> using it) before pushing the branch.
> useful commands: (on master branch), git pull --ff-only, (on features/knn
> branch) git rebase -i master
>
> The useful commands I have mentioned might not do the needful for each
> step. They are just pointers for you. There might be a much more simpler
> way to accomplish all this, and I have no idea if this way would actually
> work correctly. :) But you can (almost) always recover from any mistake you
> make on git.
>
> NJ
>
> On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:
>
> > HI NJ,
> >
> > Thanks for your input!
> >
> > Sorry, I was busy with my semester-end exams.
> >
> > I am reading on Git. Should I repeat the process of checking out madlib
> > repo and then again making changes in a local branch?
> >
> >
> >
> > Regards,
> >
> > Auon
> >
> > ________________________________
> > From: njayaram2 <gi...@git.apache.org>
> > Sent: Thursday, December 15, 2016 6:24:08 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: [GitHub] incubator-madlib issue #80: KNN Added
> >
> > Github user njayaram2 commented on the issue:
> >
> >     https://github.com/apache/incubator-madlib/pull/80
> >
> >     This is a great start!
> >     I will provide some github-specific feedback here, and more
> > knn-specific
> >     comments in the code.
> >     Git can be daunting to use at first, but it's great once you get a
> > hang of it.
> >     I would recommend you go through the following wonderful book if you
> >     have not already done so:
> >     https://git-scm.com/book/en/v2
> >
> >     When you work on a feature/bug, it is best if you create a branch
> > locally
> >     and make all changes for that feature there. You can then push that
> > branch
> >     into your github repo and open a pull request. This way you won't
> mess
> > with
> >     your local master branch, which should ideally be in sync with the
> > origin's
> >     (apache/incubator-madlib in this case) master branch. More
> information
> > on
> >     how to work with branches can be found in the following chapter:
> >     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
> >     (especially section 3.5)
> >
> >     One other minor feedback is to try including the corresponding JIRA
> id
> >     with the commit message. The JIRA associated with this feature is:
> >     https://issues.apache.org/jira/browse/MADLIB-927
> >
> >
> > ---
> > If your project is set up for it, you can reply to this email and have
> your
> > reply appear on GitHub as well. If your project does not have this
> feature
> > enabled and wishes so, or if the feature is enabled but not working,
> please
> > contact infrastructure at infrastructure@apache.org or file a JIRA
> ticket
> > with INFRA.
> > ---
> >
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by "Kazmi,Auon H" <ak...@ufl.edu>.
Hi NJ,

I guess I was able to play around with branching and other stuff but my PR got deleted from madlib's repo. But that's okay as I have got the comments you made, in e-mails. I will work on them from tomorrow.


Thanks for your help!


Thanks,

Auon

________________________________
From: Kazmi,Auon H <ak...@ufl.edu>
Sent: Friday, December 16, 2016 11:09:11 PM
To: dev@madlib.incubator.apache.org
Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added

Hi NJ,

Thanks for your detailed reply!

I will try to do the said things.



Thanks,

Auon

________________________________
From: Nandish Jayaram <nj...@pivotal.io>
Sent: Friday, December 16, 2016 8:32:52 PM
To: dev@madlib.incubator.apache.org
Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added

Hi Auon,

Hope your exams went well.

You can do whatever ends up being a better git-learning experience for you.
Since you just started contributing to MADlib, the easier way to get going
might be to do what you mentioned. But a better, though a longer way, would
be to just mess around with branches as a learning experience. For instance
(be warned, this might not be the best approach and it might sound
daunting), you can do the following:
- Create a new local branch (say the branch name is temp-features/knn)
while on your current master branch (which already has the knn code changes
in it).
useful commands: git checkout -b temp-features/knn
- Go back to your master branch and reset it to the commit SHA before you
made changes for knn (look at git log command to find the appropriate
commit SHA).
useful commands: git log, git reset --hard <commit SHA> (be careful while
using the --hard flag in general).
- You essentially want to reach a state where the new branch features/knn
has the code changes you have made so far for the knn feature, and your
master branch must be in sync with apache/incubator-madlib's master branch.
You ideally want your local master to be in sync with your repo master,
which in turn must be in sync with origin master (apache/incubator-madlib).
- You might also want to push your master (with --force option) to your
remote repo, to undo the changes you have made to your repo master branch
with the previous PR.
useful commands: git push --force <your repo>
- Now create a new branch off your master (say branch name features/knn).
Rebase this new branch with the temp-features/knn branch. You will get the
knn related changes back on this branch now.
useful commands: git checkout -b features/knn, git rebase temp-features/knn
- Address the comments on this PR, and then push the features/knn branch to
your repo and open a new PR on the branch. Read about git rebase (and try
using it) before pushing the branch.
useful commands: (on master branch), git pull --ff-only, (on features/knn
branch) git rebase -i master

The useful commands I have mentioned might not do the needful for each
step. They are just pointers for you. There might be a much more simpler
way to accomplish all this, and I have no idea if this way would actually
work correctly. :) But you can (almost) always recover from any mistake you
make on git.

NJ

On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> HI NJ,
>
> Thanks for your input!
>
> Sorry, I was busy with my semester-end exams.
>
> I am reading on Git. Should I repeat the process of checking out madlib
> repo and then again making changes in a local branch?
>
>
>
> Regards,
>
> Auon
>
> ________________________________
> From: njayaram2 <gi...@git.apache.org>
> Sent: Thursday, December 15, 2016 6:24:08 PM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib issue #80: KNN Added
>
> Github user njayaram2 commented on the issue:
>
>     https://github.com/apache/incubator-madlib/pull/80
>
>     This is a great start!
>     I will provide some github-specific feedback here, and more
> knn-specific
>     comments in the code.
>     Git can be daunting to use at first, but it's great once you get a
> hang of it.
>     I would recommend you go through the following wonderful book if you
>     have not already done so:
>     https://git-scm.com/book/en/v2
>
>     When you work on a feature/bug, it is best if you create a branch
> locally
>     and make all changes for that feature there. You can then push that
> branch
>     into your github repo and open a pull request. This way you won't mess
> with
>     your local master branch, which should ideally be in sync with the
> origin's
>     (apache/incubator-madlib in this case) master branch. More information
> on
>     how to work with branches can be found in the following chapter:
>     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
>     (especially section 3.5)
>
>     One other minor feedback is to try including the corresponding JIRA id
>     with the commit message. The JIRA associated with this feature is:
>     https://issues.apache.org/jira/browse/MADLIB-927
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by "Kazmi,Auon H" <ak...@ufl.edu>.
Hi NJ,

Thanks for your detailed reply!

I will try to do the said things.



Thanks,

Auon

________________________________
From: Nandish Jayaram <nj...@pivotal.io>
Sent: Friday, December 16, 2016 8:32:52 PM
To: dev@madlib.incubator.apache.org
Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added

Hi Auon,

Hope your exams went well.

You can do whatever ends up being a better git-learning experience for you.
Since you just started contributing to MADlib, the easier way to get going
might be to do what you mentioned. But a better, though a longer way, would
be to just mess around with branches as a learning experience. For instance
(be warned, this might not be the best approach and it might sound
daunting), you can do the following:
- Create a new local branch (say the branch name is temp-features/knn)
while on your current master branch (which already has the knn code changes
in it).
useful commands: git checkout -b temp-features/knn
- Go back to your master branch and reset it to the commit SHA before you
made changes for knn (look at git log command to find the appropriate
commit SHA).
useful commands: git log, git reset --hard <commit SHA> (be careful while
using the --hard flag in general).
- You essentially want to reach a state where the new branch features/knn
has the code changes you have made so far for the knn feature, and your
master branch must be in sync with apache/incubator-madlib's master branch.
You ideally want your local master to be in sync with your repo master,
which in turn must be in sync with origin master (apache/incubator-madlib).
- You might also want to push your master (with --force option) to your
remote repo, to undo the changes you have made to your repo master branch
with the previous PR.
useful commands: git push --force <your repo>
- Now create a new branch off your master (say branch name features/knn).
Rebase this new branch with the temp-features/knn branch. You will get the
knn related changes back on this branch now.
useful commands: git checkout -b features/knn, git rebase temp-features/knn
- Address the comments on this PR, and then push the features/knn branch to
your repo and open a new PR on the branch. Read about git rebase (and try
using it) before pushing the branch.
useful commands: (on master branch), git pull --ff-only, (on features/knn
branch) git rebase -i master

The useful commands I have mentioned might not do the needful for each
step. They are just pointers for you. There might be a much more simpler
way to accomplish all this, and I have no idea if this way would actually
work correctly. :) But you can (almost) always recover from any mistake you
make on git.

NJ

On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> HI NJ,
>
> Thanks for your input!
>
> Sorry, I was busy with my semester-end exams.
>
> I am reading on Git. Should I repeat the process of checking out madlib
> repo and then again making changes in a local branch?
>
>
>
> Regards,
>
> Auon
>
> ________________________________
> From: njayaram2 <gi...@git.apache.org>
> Sent: Thursday, December 15, 2016 6:24:08 PM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib issue #80: KNN Added
>
> Github user njayaram2 commented on the issue:
>
>     https://github.com/apache/incubator-madlib/pull/80
>
>     This is a great start!
>     I will provide some github-specific feedback here, and more
> knn-specific
>     comments in the code.
>     Git can be daunting to use at first, but it's great once you get a
> hang of it.
>     I would recommend you go through the following wonderful book if you
>     have not already done so:
>     https://git-scm.com/book/en/v2
>
>     When you work on a feature/bug, it is best if you create a branch
> locally
>     and make all changes for that feature there. You can then push that
> branch
>     into your github repo and open a pull request. This way you won't mess
> with
>     your local master branch, which should ideally be in sync with the
> origin's
>     (apache/incubator-madlib in this case) master branch. More information
> on
>     how to work with branches can be found in the following chapter:
>     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
>     (especially section 3.5)
>
>     One other minor feedback is to try including the corresponding JIRA id
>     with the commit message. The JIRA associated with this feature is:
>     https://issues.apache.org/jira/browse/MADLIB-927
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by Nandish Jayaram <nj...@pivotal.io>.
Hi Auon,

Hope your exams went well.

You can do whatever ends up being a better git-learning experience for you.
Since you just started contributing to MADlib, the easier way to get going
might be to do what you mentioned. But a better, though a longer way, would
be to just mess around with branches as a learning experience. For instance
(be warned, this might not be the best approach and it might sound
daunting), you can do the following:
- Create a new local branch (say the branch name is temp-features/knn)
while on your current master branch (which already has the knn code changes
in it).
useful commands: git checkout -b temp-features/knn
- Go back to your master branch and reset it to the commit SHA before you
made changes for knn (look at git log command to find the appropriate
commit SHA).
useful commands: git log, git reset --hard <commit SHA> (be careful while
using the --hard flag in general).
- You essentially want to reach a state where the new branch features/knn
has the code changes you have made so far for the knn feature, and your
master branch must be in sync with apache/incubator-madlib's master branch.
You ideally want your local master to be in sync with your repo master,
which in turn must be in sync with origin master (apache/incubator-madlib).
- You might also want to push your master (with --force option) to your
remote repo, to undo the changes you have made to your repo master branch
with the previous PR.
useful commands: git push --force <your repo>
- Now create a new branch off your master (say branch name features/knn).
Rebase this new branch with the temp-features/knn branch. You will get the
knn related changes back on this branch now.
useful commands: git checkout -b features/knn, git rebase temp-features/knn
- Address the comments on this PR, and then push the features/knn branch to
your repo and open a new PR on the branch. Read about git rebase (and try
using it) before pushing the branch.
useful commands: (on master branch), git pull --ff-only, (on features/knn
branch) git rebase -i master

The useful commands I have mentioned might not do the needful for each
step. They are just pointers for you. There might be a much more simpler
way to accomplish all this, and I have no idea if this way would actually
work correctly. :) But you can (almost) always recover from any mistake you
make on git.

NJ

On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <ak...@ufl.edu> wrote:

> HI NJ,
>
> Thanks for your input!
>
> Sorry, I was busy with my semester-end exams.
>
> I am reading on Git. Should I repeat the process of checking out madlib
> repo and then again making changes in a local branch?
>
>
>
> Regards,
>
> Auon
>
> ________________________________
> From: njayaram2 <gi...@git.apache.org>
> Sent: Thursday, December 15, 2016 6:24:08 PM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib issue #80: KNN Added
>
> Github user njayaram2 commented on the issue:
>
>     https://github.com/apache/incubator-madlib/pull/80
>
>     This is a great start!
>     I will provide some github-specific feedback here, and more
> knn-specific
>     comments in the code.
>     Git can be daunting to use at first, but it's great once you get a
> hang of it.
>     I would recommend you go through the following wonderful book if you
>     have not already done so:
>     https://git-scm.com/book/en/v2
>
>     When you work on a feature/bug, it is best if you create a branch
> locally
>     and make all changes for that feature there. You can then push that
> branch
>     into your github repo and open a pull request. This way you won't mess
> with
>     your local master branch, which should ideally be in sync with the
> origin's
>     (apache/incubator-madlib in this case) master branch. More information
> on
>     how to work with branches can be found in the following chapter:
>     https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
>     (especially section 3.5)
>
>     One other minor feedback is to try including the corresponding JIRA id
>     with the commit message. The JIRA associated with this feature is:
>     https://issues.apache.org/jira/browse/MADLIB-927
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastructure@apache.org or file a JIRA ticket
> with INFRA.
> ---
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

Posted by "Kazmi,Auon H" <ak...@ufl.edu>.
HI NJ,

Thanks for your input!

Sorry, I was busy with my semester-end exams.

I am reading on Git. Should I repeat the process of checking out madlib repo and then again making changes in a local branch?



Regards,

Auon

________________________________
From: njayaram2 <gi...@git.apache.org>
Sent: Thursday, December 15, 2016 6:24:08 PM
To: dev@madlib.incubator.apache.org
Subject: [GitHub] incubator-madlib issue #80: KNN Added

Github user njayaram2 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/80

    This is a great start!
    I will provide some github-specific feedback here, and more knn-specific
    comments in the code.
    Git can be daunting to use at first, but it's great once you get a hang of it.
    I would recommend you go through the following wonderful book if you
    have not already done so:
    https://git-scm.com/book/en/v2

    When you work on a feature/bug, it is best if you create a branch locally
    and make all changes for that feature there. You can then push that branch
    into your github repo and open a pull request. This way you won't mess with
    your local master branch, which should ideally be in sync with the origin's
    (apache/incubator-madlib in this case) master branch. More information on
    how to work with branches can be found in the following chapter:
    https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
    (especially section 3.5)

    One other minor feedback is to try including the corresponding JIRA id
    with the commit message. The JIRA associated with this feature is:
    https://issues.apache.org/jira/browse/MADLIB-927


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib issue #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/80
  
    This is a great start! 
    I will provide some github-specific feedback here, and more knn-specific
    comments in the code.
    Git can be daunting to use at first, but it's great once you get a hang of it.
    I would recommend you go through the following wonderful book if you
    have not already done so:
    https://git-scm.com/book/en/v2
    
    When you work on a feature/bug, it is best if you create a branch locally
    and make all changes for that feature there. You can then push that branch
    into your github repo and open a pull request. This way you won't mess with
    your local master branch, which should ideally be in sync with the origin's
    (apache/incubator-madlib in this case) master branch. More information on
    how to work with branches can be found in the following chapter:
    https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell 
    (especially section 3.5)
    
    One other minor feedback is to try including the corresponding JIRA id 
    with the commit message. The JIRA associated with this feature is:
    https://issues.apache.org/jira/browse/MADLIB-927


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/80#discussion_r92720659
  
    --- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
    @@ -0,0 +1,126 @@
    +/* ----------------------------------------------------------------------- *//**
    + *
    + * @file knn.sql_in
    + *
    + * @brief Set of functions for k-nearest neighbors.
    + *
    + *
    + *//* ----------------------------------------------------------------------- */
    +
    +m4_include(`SQLCommon.m4')
    +
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.knn_result CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.knn_result AS (
    +    prediction float
    +);
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.test_table_spec CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.test_table_spec AS (
    +    id integer,
    +    vector DOUBLE PRECISION[]
    +);
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__knn_validate_src(
    +rel_source VARCHAR
    +) RETURNS VOID AS $$
    +    PythonFunction(knn, knn, knn_validate_src)
    +$$ LANGUAGE plpythonu
    +m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
    +
    +
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
    +    point_source VARCHAR,
    +    point_column_name VARCHAR,
    +    label_column_name VARCHAR,
    +    test_source VARCHAR,
    +    test_column_name VARCHAR,
    +    id_column_name VARCHAR,
    +    operation VARCHAR,
    +    k INTEGER
    +) RETURNS VARCHAR AS $$
    +DECLARE
    +    class_test_source REGCLASS;
    +    class_point_source REGCLASS;
    +    l FLOAT;
    +    id INTEGER;
    +    vector DOUBLE PRECISION[];
    +    cur_pid integer;
    +    theResult MADLIB_SCHEMA.knn_result;
    +    r MADLIB_SCHEMA.test_table_spec;
    +    oldClientMinMessages VARCHAR;
    +    returnstring VARCHAR;
    +BEGIN
    +    oldClientMinMessages :=
    +        (SELECT setting FROM pg_settings WHERE name = 'client_min_messages');
    +    EXECUTE 'SET client_min_messages TO warning';
    +    PERFORM MADLIB_SCHEMA.__knn_validate_src(test_source);
    +    PERFORM MADLIB_SCHEMA.__knn_validate_src(point_source);
    +    class_test_source := test_source;
    +    class_point_source := point_source;
    +    --checks
    +    IF (k <= 0) THEN
    +        RAISE EXCEPTION 'KNN error: Number of neighbors k must be a positive integer.';
    +    END IF;
    +    IF (operation != 'c' AND operation != 'r') THEN
    +        RAISE EXCEPTION 'KNN error: put r for regression OR c for classification.';
    +    END IF;
    +    PERFORM MADLIB_SCHEMA.create_schema_pg_temp();
    +    EXECUTE
    +        $sql$
    +    	DROP TABLE IF EXISTS pg_temp.knn_label;
    --- End diff --
    
    I think this can be a temp table you create within the
    madlib schema. Drop it before exiting.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/80#discussion_r92722081
  
    --- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
    @@ -0,0 +1,126 @@
    +/* ----------------------------------------------------------------------- *//**
    + *
    + * @file knn.sql_in
    + *
    + * @brief Set of functions for k-nearest neighbors.
    + *
    + *
    + *//* ----------------------------------------------------------------------- */
    +
    +m4_include(`SQLCommon.m4')
    +
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.knn_result CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.knn_result AS (
    +    prediction float
    +);
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.test_table_spec CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.test_table_spec AS (
    +    id integer,
    +    vector DOUBLE PRECISION[]
    +);
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__knn_validate_src(
    +rel_source VARCHAR
    +) RETURNS VOID AS $$
    +    PythonFunction(knn, knn, knn_validate_src)
    +$$ LANGUAGE plpythonu
    +m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
    +
    +
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
    +    point_source VARCHAR,
    +    point_column_name VARCHAR,
    +    label_column_name VARCHAR,
    +    test_source VARCHAR,
    +    test_column_name VARCHAR,
    +    id_column_name VARCHAR,
    +    operation VARCHAR,
    +    k INTEGER
    +) RETURNS VARCHAR AS $$
    +DECLARE
    --- End diff --
    
    We can also overload this function. At the moment it looks
    like only `k` can be an optional parameter. So we should be
    able to run knn with some default value of `k`, if not provided.
    
    Showing help messages when we call `knn()` or `knn('help')`
    are examples of overloading this function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib issue #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on the issue:

    https://github.com/apache/incubator-madlib/pull/80
  
    Overall, this looks great and the code seems to be working correctly.
    We can certainly make various modifications to it. For instance:
    - Try to use other distance measures. If we can indeed use other distance
    measures, we should have that as an optional parameter.
    - We can try to improve the performance using parallel processing,
    say on a distributed database like Greenplum. We can use
    UDAs (with C++ code: Chapter 1 in http://madlib.incubator.apache.org/design.pdf
    I had mentioned earlier) that can help in finding the k-nearest neighbors in parallel.
    I guess that is something we can look at once you work on the comments
    made on this version. We can discuss that when we get there!
    - Incorporate other changes the community might suggest. There are
    several postgres experts in the community who might be able to provide
    suggestions to make the existing code more performant!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/80#discussion_r92721716
  
    --- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
    @@ -0,0 +1,126 @@
    +/* ----------------------------------------------------------------------- *//**
    + *
    + * @file knn.sql_in
    + *
    + * @brief Set of functions for k-nearest neighbors.
    + *
    + *
    + *//* ----------------------------------------------------------------------- */
    +
    +m4_include(`SQLCommon.m4')
    +
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.knn_result CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.knn_result AS (
    +    prediction float
    +);
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.test_table_spec CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.test_table_spec AS (
    +    id integer,
    +    vector DOUBLE PRECISION[]
    +);
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__knn_validate_src(
    +rel_source VARCHAR
    +) RETURNS VOID AS $$
    +    PythonFunction(knn, knn, knn_validate_src)
    +$$ LANGUAGE plpythonu
    +m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
    +
    +
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
    +    point_source VARCHAR,
    +    point_column_name VARCHAR,
    +    label_column_name VARCHAR,
    +    test_source VARCHAR,
    +    test_column_name VARCHAR,
    +    id_column_name VARCHAR,
    +    operation VARCHAR,
    +    k INTEGER
    +) RETURNS VARCHAR AS $$
    +DECLARE
    +    class_test_source REGCLASS;
    +    class_point_source REGCLASS;
    +    l FLOAT;
    +    id INTEGER;
    +    vector DOUBLE PRECISION[];
    +    cur_pid integer;
    +    theResult MADLIB_SCHEMA.knn_result;
    +    r MADLIB_SCHEMA.test_table_spec;
    +    oldClientMinMessages VARCHAR;
    +    returnstring VARCHAR;
    +BEGIN
    +    oldClientMinMessages :=
    +        (SELECT setting FROM pg_settings WHERE name = 'client_min_messages');
    +    EXECUTE 'SET client_min_messages TO warning';
    +    PERFORM MADLIB_SCHEMA.__knn_validate_src(test_source);
    +    PERFORM MADLIB_SCHEMA.__knn_validate_src(point_source);
    +    class_test_source := test_source;
    +    class_point_source := point_source;
    +    --checks
    +    IF (k <= 0) THEN
    +        RAISE EXCEPTION 'KNN error: Number of neighbors k must be a positive integer.';
    +    END IF;
    +    IF (operation != 'c' AND operation != 'r') THEN
    +        RAISE EXCEPTION 'KNN error: put r for regression OR c for classification.';
    +    END IF;
    +    PERFORM MADLIB_SCHEMA.create_schema_pg_temp();
    +    EXECUTE
    +        $sql$
    +    	DROP TABLE IF EXISTS pg_temp.knn_label;
    +    	CREATE TABLE pg_temp.knn_label(pid integer, predlabel float);
    +	$sql$;
    +    
    +    --FOR r IN EXECUTE format('SELECT * FROM %I', test_source)
    +    FOR r IN EXECUTE format('SELECT %I,%I FROM %I', id_column_name, test_column_name, test_source)
    +    LOOP
    +        
    +	--RAISE NOTICE 'Original: %',r.pid;
    +	--RAISE NOTICE 'Original: %',r.p;
    +	cur_pid := r.id;
    +	vector := r.vector;
    +	EXECUTE
    +        $sql$
    +	DROP TABLE IF EXISTS pg_temp.knn_vector;
    +        CREATE TABLE pg_temp.knn_vector(vec DOUBLE PRECISION[]);
    +	$sql$;
    +	EXECUTE 'INSERT INTO pg_temp.knn_vector values($1)'
    +	USING vector;
    +	EXECUTE
    +        $sql$
    +	DROP TABLE IF EXISTS pg_temp.knn_interm;
    +        --CREATE TABLE pg_temp.knn_interm(dist DOUBLE PRECISION, lable integer );
    +	CREATE TABLE pg_temp.knn_interm AS
    +	
    +        SELECT madlib.squared_dist_norm2($sql$ || point_column_name || $sql$, vec) as dist, $sql$ || label_column_name || $sql$ FROM $sql$ || textin(regclassout(point_source)) || $sql$, knn_vector order by dist limit $sql$ || k;
    +	IF (operation = 'c') THEN
    +    	EXECUTE
    +        $sql$
    +        SELECT mode() within group (order by $sql$ || label_column_name || $sql$) FROM  pg_temp.knn_interm $sql$
    +    	INTO l;
    +        ELSE
    +        EXECUTE
    +        $sql$
    +        SELECT avg( $sql$ || label_column_name || $sql$ ) FROM  pg_temp.knn_interm $sql$
    +        INTO l;
    +        END IF;
    +	EXECUTE 'INSERT INTO pg_temp.knn_label values($1,$2)'
    +	USING cur_pid, l;
    +    END LOOP;
    +    
    +    EXECUTE
    +        $sql$
    +	DROP TABLE IF EXISTS public.knn_final;
    +        CREATE TABLE public.knn_final AS
    --- End diff --
    
    Why do we need both `pg_temp.knn_label` and `public.knn_final`?
    Can't we just use the latter directly instead of knn_label?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/80#discussion_r92720295
  
    --- Diff: src/ports/postgres/modules/knn/knn.sql_in ---
    @@ -0,0 +1,126 @@
    +/* ----------------------------------------------------------------------- *//**
    + *
    + * @file knn.sql_in
    + *
    + * @brief Set of functions for k-nearest neighbors.
    + *
    + *
    + *//* ----------------------------------------------------------------------- */
    +
    +m4_include(`SQLCommon.m4')
    +
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.knn_result CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.knn_result AS (
    +    prediction float
    +);
    +DROP TYPE IF EXISTS MADLIB_SCHEMA.test_table_spec CASCADE;
    +CREATE TYPE MADLIB_SCHEMA.test_table_spec AS (
    +    id integer,
    +    vector DOUBLE PRECISION[]
    +);
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.__knn_validate_src(
    +rel_source VARCHAR
    +) RETURNS VOID AS $$
    +    PythonFunction(knn, knn, knn_validate_src)
    +$$ LANGUAGE plpythonu
    +m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `READS SQL DATA', `');
    +
    +
    +
    +CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.knn(
    +    point_source VARCHAR,
    +    point_column_name VARCHAR,
    +    label_column_name VARCHAR,
    +    test_source VARCHAR,
    +    test_column_name VARCHAR,
    +    id_column_name VARCHAR,
    +    operation VARCHAR,
    +    k INTEGER
    +) RETURNS VARCHAR AS $$
    +DECLARE
    +    class_test_source REGCLASS;
    +    class_point_source REGCLASS;
    +    l FLOAT;
    +    id INTEGER;
    +    vector DOUBLE PRECISION[];
    +    cur_pid integer;
    +    theResult MADLIB_SCHEMA.knn_result;
    +    r MADLIB_SCHEMA.test_table_spec;
    +    oldClientMinMessages VARCHAR;
    +    returnstring VARCHAR;
    +BEGIN
    +    oldClientMinMessages :=
    +        (SELECT setting FROM pg_settings WHERE name = 'client_min_messages');
    +    EXECUTE 'SET client_min_messages TO warning';
    +    PERFORM MADLIB_SCHEMA.__knn_validate_src(test_source);
    +    PERFORM MADLIB_SCHEMA.__knn_validate_src(point_source);
    +    class_test_source := test_source;
    +    class_point_source := point_source;
    +    --checks
    +    IF (k <= 0) THEN
    +        RAISE EXCEPTION 'KNN error: Number of neighbors k must be a positive integer.';
    +    END IF;
    +    IF (operation != 'c' AND operation != 'r') THEN
    +        RAISE EXCEPTION 'KNN error: put r for regression OR c for classification.';
    +    END IF;
    +    PERFORM MADLIB_SCHEMA.create_schema_pg_temp();
    +    EXECUTE
    +        $sql$
    +    	DROP TABLE IF EXISTS pg_temp.knn_label;
    +    	CREATE TABLE pg_temp.knn_label(pid integer, predlabel float);
    +	$sql$;
    +    
    +    --FOR r IN EXECUTE format('SELECT * FROM %I', test_source)
    +    FOR r IN EXECUTE format('SELECT %I,%I FROM %I', id_column_name, test_column_name, test_source)
    +    LOOP
    +        
    +	--RAISE NOTICE 'Original: %',r.pid;
    +	--RAISE NOTICE 'Original: %',r.p;
    +	cur_pid := r.id;
    +	vector := r.vector;
    +	EXECUTE
    +        $sql$
    +	DROP TABLE IF EXISTS pg_temp.knn_vector;
    +        CREATE TABLE pg_temp.knn_vector(vec DOUBLE PRECISION[]);
    +	$sql$;
    +	EXECUTE 'INSERT INTO pg_temp.knn_vector values($1)'
    +	USING vector;
    +	EXECUTE
    +        $sql$
    +	DROP TABLE IF EXISTS pg_temp.knn_interm;
    +        --CREATE TABLE pg_temp.knn_interm(dist DOUBLE PRECISION, lable integer );
    +	CREATE TABLE pg_temp.knn_interm AS
    +	
    +        SELECT madlib.squared_dist_norm2($sql$ || point_column_name || $sql$, vec) as dist, $sql$ || label_column_name || $sql$ FROM $sql$ || textin(regclassout(point_source)) || $sql$, knn_vector order by dist limit $sql$ || k;
    +	IF (operation = 'c') THEN
    +    	EXECUTE
    +        $sql$
    +        SELECT mode() within group (order by $sql$ || label_column_name || $sql$) FROM  pg_temp.knn_interm $sql$
    +    	INTO l;
    +        ELSE
    +        EXECUTE
    +        $sql$
    +        SELECT avg( $sql$ || label_column_name || $sql$ ) FROM  pg_temp.knn_interm $sql$
    +        INTO l;
    +        END IF;
    +	EXECUTE 'INSERT INTO pg_temp.knn_label values($1,$2)'
    +	USING cur_pid, l;
    +    END LOOP;
    +    
    +    EXECUTE
    +        $sql$
    +	DROP TABLE IF EXISTS public.knn_final;
    --- End diff --
    
    It might be better not to drop an existing output table. Instead,
    we can either take the name of the output table as a parameter,
    or we can append the string `"_output"` to the `test_source`
    parameter. Of course, we still shouldn't drop the table, if the name
    we end up constructing with that append results in a table name
    that already exists. The function should throw an error saying the
    output table already exists. This behavior will be in line with most
    other MADlib functions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #80: KNN Added

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/80#discussion_r92721150
  
    --- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
    @@ -0,0 +1,6 @@
    +m4_include(`SQLCommon.m4')
    +/* -----------------------------------------------------------------------------
    + * Test knn.
    + *
    + * FIXME: Verify results
    + * -------------------------------------------------------------------------- */
    --- End diff --
    
    Please go ahead and add few test cases, that would
    check if knn is actually running as expected. Use small toy
    data sets to do the same. We can run install checks to
    perform some basic sanity checks of the module.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---