Posted to dev@madlib.apache.org by hpandeycodeit <gi...@git.apache.org> on 2017/08/16 21:53:54 UTC

[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

GitHub user hpandeycodeit opened a pull request:

    https://github.com/apache/incubator-madlib/pull/168

    Code refactoring for KNN

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hpandeycodeit/incubator-madlib knn_refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/168.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #168
    
----
commit dbb7199ee461683f432d22b8fdb87d519ca83f13
Author: hpandeycodeit <hp...@pivotal.io>
Date:   2017-08-16T21:48:36Z

    Code refactoring for KNN

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/168



[GitHub] incubator-madlib issue #168: Code refactoring for KNN

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/incubator-madlib/pull/168
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/160/




[GitHub] incubator-madlib issue #168: Code refactoring for KNN

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/incubator-madlib/pull/168
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/179/




[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133628535
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    --- End diff --
    
    Missing details for `operation`
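
A minimal sketch of how the missing `operation` entry could look in the docstring. The wording is an assumption inferred from the `operation == 'c'` branch in this diff (classification for `'c'`, the regression branch otherwise); the function body is a placeholder so the snippet runs on its own and is not the PR's code.

```python
def knn(schema_madlib, point_source, point_column_name, label_column_name,
        test_source, test_column_name, id_column_name, output_table,
        operation, k):
    """
    KNN function to find the K nearest neighbours.

    Args:
        @param output_table   Name of the table to store final results.
        @param operation      'c' for classification; any other value takes
                              the regression branch in this diff (assumed
                              wording, inferred from the code).
        @param k              default: 1. Number of nearest neighbors to
                              consider.
    """
    # Placeholder body: the real implementation lives in knn.py_in.
    return None
```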



[GitHub] incubator-madlib issue #168: Code refactoring for KNN

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on the issue:

    https://github.com/apache/incubator-madlib/pull/168
  
    jenkins ok to test



[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r134070772
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    --- End diff --
    
    Added operation details.



[GitHub] incubator-madlib issue #168: Code refactoring for KNN

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/incubator-madlib/pull/168
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/193/




[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r134070784
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    +            @param k                   default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns: 
    +            VARCHAR                     Name of the output table.             
    +    """                                
    +
    +  
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    --- End diff --
    
    This is also done. 



[GitHub] incubator-madlib issue #168: Code refactoring for KNN

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/incubator-madlib/pull/168
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/184/




[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r134070999
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    +            @param k                   default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns: 
    +            VARCHAR                     Name of the output table.             
    +    """                                
    +
    +  
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    +
    +    plpy.execute("SET client_min_messages TO warning");
    +
    + 
    +    k_val = knn_validate_src(schema_madlib, point_source, point_column_name, 
    +                label_column_name, test_source, 
    +                test_column_name, id_column_name, 
    +                output_table, operation, k) 
    +
    +
    +    plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
    + 
    +    x_temp_table = unique_string(desp='x_temp_table') 
    +    y_temp_table = unique_string(desp='y_temp_table') 
    +    label_column_name_unique = unique_string(desp='label_column_name_unique')  
    +    test_id = unique_string(desp='test_id')  
    +
    +    convert_boolean_to_int = '';
    +    if operation == 'c':
    +        convert_boolean_to_int = '::INTEGER';
    +    
    +    madlib_knn_interm = unique_string(desp='madlib_knn_interm')
    +
    +    plpy.execute("""DROP TABLE IF EXISTS pg_temp.{madlib_knn_interm}""".format(**locals()));
    +    plpy.execute(
    +    """
    +    CREATE TEMP TABLE pg_temp.{madlib_knn_interm} AS
    +    SELECT *
    +    FROM
    +        (
    +        SELECT row_number() over (partition by {test_id}  order by dist) AS r , {x_temp_table}.*
    +        FROM
    +            (
    +                SELECT test.{id_column_name} AS  {test_id} , {schema_madlib}.squared_dist_norm2(train.{point_column_name} ,test.{test_column_name}) AS dist, train.{label_column_name} {convert_boolean_to_int} AS {label_column_name_unique}
    +                FROM  {point_source} AS train, {test_source}  AS test
    +            ) {x_temp_table}
    +        ){y_temp_table}
    +    WHERE {y_temp_table}.r <= {k_val}""".format(**locals()));
    +
    +    if operation == 'c':
    +        plpy.execute(
    +        """
    +        CREATE TABLE {output_table} AS
    +        SELECT {test_id} AS id, {test_column_name} , {schema_madlib}.mode({label_column_name_unique}) AS prediction
    +        FROM pg_temp.{madlib_knn_interm} join  {test_source}  ON  {test_id} = {id_column_name}  
    +        GROUP BY {test_id}  ,  {test_column_name}""".format(**locals()))
    +        
    +        
    +    else:
    +        plpy.execute(
    +        """ 
    +        CREATE TABLE  {output_table} AS
    +        SELECT  {test_id}   AS id, {test_column_name} , avg( {label_column_name_unique}  ) AS prediction
    +        FROM
    +            pg_temp.{madlib_knn_interm} join {test_source}  on {test_id}  ={id_column_name} 
    +        GROUP BY {test_id} ,  {test_column_name} 
    +        ORDER BY {test_id}""".format(**locals()))   
    +   
    +
    +    plpy.execute("SET client_min_messages TO "+ oldClientMinMessages)
    +
    +    if operation == 'c':
    +        returnstring = 'The classification results have been written to output table '+ output_table;
    +    else:
    +        returnstring = 'The regression results have been written to output table '+ output_table;
    +
    +    plpy.execute("""DROP TABLE pg_temp.{madlib_knn_interm}""".format(**locals()));    
    +
    +    return returnstring;
    +
    +
    --- End diff --
    
    I made some changes as per the style guide. I will fix all of it in a day or two.



[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133629865
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    +            @param k                   default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns: 
    +            VARCHAR                     Name of the output table.             
    +    """                                
    +
    +  
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    +
    +    plpy.execute("SET client_min_messages TO warning");
    +
    + 
    +    k_val = knn_validate_src(schema_madlib, point_source, point_column_name, 
    +                label_column_name, test_source, 
    +                test_column_name, id_column_name, 
    +                output_table, operation, k) 
    +
    +
    +    plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
    + 
    +    x_temp_table = unique_string(desp='x_temp_table') 
    +    y_temp_table = unique_string(desp='y_temp_table') 
    +    label_column_name_unique = unique_string(desp='label_column_name_unique')  
    +    test_id = unique_string(desp='test_id')  
    +
    +    convert_boolean_to_int = '';
    +    if operation == 'c':
    +        convert_boolean_to_int = '::INTEGER';
    +    
    +    madlib_knn_interm = unique_string(desp='madlib_knn_interm')
    +
    +    plpy.execute("""DROP TABLE IF EXISTS pg_temp.{madlib_knn_interm}""".format(**locals()));
    +    plpy.execute(
    +    """
    +    CREATE TEMP TABLE pg_temp.{madlib_knn_interm} AS
    +    SELECT *
    +    FROM
    +        (
    +        SELECT row_number() over (partition by {test_id}  order by dist) AS r , {x_temp_table}.*
    +        FROM
    +            (
    +                SELECT test.{id_column_name} AS  {test_id} , {schema_madlib}.squared_dist_norm2(train.{point_column_name} ,test.{test_column_name}) AS dist, train.{label_column_name} {convert_boolean_to_int} AS {label_column_name_unique}
    +                FROM  {point_source} AS train, {test_source}  AS test
    +            ) {x_temp_table}
    +        ){y_temp_table}
    +    WHERE {y_temp_table}.r <= {k_val}""".format(**locals()));
    +
    +    if operation == 'c':
    +        plpy.execute(
    +        """
    +        CREATE TABLE {output_table} AS
    +        SELECT {test_id} AS id, {test_column_name} , {schema_madlib}.mode({label_column_name_unique}) AS prediction
    +        FROM pg_temp.{madlib_knn_interm} join  {test_source}  ON  {test_id} = {id_column_name}  
    +        GROUP BY {test_id}  ,  {test_column_name}""".format(**locals()))
    +        
    +        
    +    else:
    +        plpy.execute(
    +        """ 
    +        CREATE TABLE  {output_table} AS
    +        SELECT  {test_id}   AS id, {test_column_name} , avg( {label_column_name_unique}  ) AS prediction
    +        FROM
    +            pg_temp.{madlib_knn_interm} join {test_source}  on {test_id}  ={id_column_name} 
    +        GROUP BY {test_id} ,  {test_column_name} 
    +        ORDER BY {test_id}""".format(**locals()))   
    +   
    +
    +    plpy.execute("SET client_min_messages TO "+ oldClientMinMessages)
    +
    +    if operation == 'c':
    +        returnstring = 'The classification results have been written to output table '+ output_table;
    +    else:
    +        returnstring = 'The regression results have been written to output table '+ output_table;
    +
    +    plpy.execute("""DROP TABLE pg_temp.{madlib_knn_interm}""".format(**locals()));    
    +
    +    return returnstring;
    +
    +
    --- End diff --
    
    In general, the file contains multiple style-guide and PEP8 issues. Please refer to the [page in wiki for appropriate style](https://cwiki.apache.org/confluence/display/MADLIB/Python+Style+Guide).
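
To make the PEP8 part concrete, here is a sketch of two statements from the diff with purely mechanical fixes: no trailing semicolons, no spaces around `=` in keyword arguments, and wrapped lines. The helper names `build_create_pg_temp_sql` and `build_drop_interm_sql` are hypothetical, and passing named arguments instead of `format(**locals())` is an illustrative choice rather than something the review asked for. The MADlib style guide may require more than this.

```python
def build_create_pg_temp_sql(schema_madlib):
    # PEP8: no spaces around '=' in keyword arguments, no trailing ';'.
    return "SELECT {schema_madlib}.create_schema_pg_temp()".format(
        schema_madlib=schema_madlib)


def build_drop_interm_sql(interm_table):
    # Explicit named argument instead of format(**locals()); this makes the
    # inputs of the statement visible at the call site.
    return "DROP TABLE IF EXISTS pg_temp.{interm_table}".format(
        interm_table=interm_table)
```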



[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133628699
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    +            @param k                   default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns: 
    +            VARCHAR                     Name of the output table.             
    +    """                                
    +
    +  
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    +
    +    plpy.execute("SET client_min_messages TO warning");
    +
    + 
    +    k_val = knn_validate_src(schema_madlib, point_source, point_column_name, 
    +                label_column_name, test_source, 
    +                test_column_name, id_column_name, 
    +                output_table, operation, k) 
    +
    +
    +    plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
    + 
    +    x_temp_table = unique_string(desp='x_temp_table') 
    +    y_temp_table = unique_string(desp='y_temp_table') 
    +    label_column_name_unique = unique_string(desp='label_column_name_unique')  
    +    test_id = unique_string(desp='test_id')  
    +
    +    convert_boolean_to_int = '';
    +    if operation == 'c':
    --- End diff --
    
    Since this comparison is used multiple times, it would be better to store its result in a boolean flag and reuse that flag.
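
A sketch of that suggestion, assuming a hypothetical helper `plan_knn_output` and flag name `is_classification` (neither appears in the PR). It evaluates `operation == 'c'` once and derives the three things the diff currently branches on separately: the label cast, the aggregate, and the return message.

```python
def plan_knn_output(operation, output_table):
    # Evaluate the comparison once and reuse the flag everywhere.
    is_classification = (operation == 'c')

    # Labels are cast to INTEGER only for classification, as in the diff.
    label_cast = '::INTEGER' if is_classification else ''
    # The diff uses the schema-qualified mode() for classification and
    # plain avg() otherwise; only the aggregate name is returned here.
    aggregate = 'mode' if is_classification else 'avg'
    kind = 'classification' if is_classification else 'regression'
    message = ('The {0} results have been written to output table {1}'
               .format(kind, output_table))
    return label_cast, aggregate, message
```

With this shape, a new operation type would only need changes next to the flag instead of at three separate `if` statements.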



[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r134070881
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    +            @param k                   default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns: 
    +            VARCHAR                     Name of the output table.             
    +    """                                
    +
    +  
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    +
    +    plpy.execute("SET client_min_messages TO warning");
    +
    + 
    +    k_val = knn_validate_src(schema_madlib, point_source, point_column_name, 
    +                label_column_name, test_source, 
    +                test_column_name, id_column_name, 
    +                output_table, operation, k) 
    +
    +
    +    plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
    + 
    +    x_temp_table = unique_string(desp='x_temp_table') 
    +    y_temp_table = unique_string(desp='y_temp_table') 
    +    label_column_name_unique = unique_string(desp='label_column_name_unique')  
    +    test_id = unique_string(desp='test_id')  
    +
    +    convert_boolean_to_int = '';
    +    if operation == 'c':
    --- End diff --
    
    Yes, I created a boolean flag for this comparison.



[GitHub] incubator-madlib pull request #168: Code refactoring for KNN

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/168#discussion_r133628425
  
    --- Diff: src/ports/postgres/modules/knn/knn.py_in ---
    @@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
                         "Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
         return k
     
    -# ----------------------------------------------------------------------
    -m4_changequote(<!`!>, <!'!>)
    +
    +
    +
    +
    +def knn(schema_madlib, point_source, point_column_name, label_column_name,
    +    test_source, test_column_name, id_column_name, output_table, operation, k):
    +
    +    """
    +        KNN function to find the K Nearest neighbours
    +        Args:
    +            @param schema_madlib       Name of the Madlib Schema
    +            @param point_source        Training data table 
    +            @param point_column_name   Name of the column with training data points.
    +            @param label_column_name   Name of the column with labels/values of training data points.
    +            @param test_source         Name of the table containing the test data points.
    +            @param test_column_name    Name of the column with testing data points.
    +            @param id_column_name      Name of the column having ids of data points in test data table.
    +            @param output_table        Name of the table to store final results.
    +            @param k                   default: 1. Number of nearest neighbors to consider
    +
    +
    +        Returns: 
    +            VARCHAR                     Name of the output table.             
    +    """                                
    +
    +  
    +    oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
    --- End diff --
    
    Better to use the context manager: `with MinWarning('warning'):`
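
The reason a context manager helps here is that the manual save/`SET`/restore sequence in the diff leaves `client_min_messages` changed if anything in between raises. The sketch below shows the general pattern only. `MinMessages` and `FakeExecutor` are hypothetical stand-ins written for this example, not MADlib's `MinWarning` and not `plpy`; the executor is passed in so the snippet runs without a database.

```python
class MinMessages(object):
    """Temporarily set client_min_messages and restore it on exit."""

    def __init__(self, execute, level):
        self.execute = execute      # callable that runs one SQL statement
        self.level = level
        self.old_level = None

    def __enter__(self):
        rows = self.execute(
            "SELECT setting FROM pg_settings "
            "WHERE name = 'client_min_messages'")
        self.old_level = rows[0]['setting']
        self.execute("SET client_min_messages TO {0}".format(self.level))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even when the body raises, so the old level is restored.
        self.execute(
            "SET client_min_messages TO {0}".format(self.old_level))
        return False  # do not swallow exceptions


class FakeExecutor(object):
    """Records statements instead of running them (illustration only)."""

    def __init__(self):
        self.statements = []

    def __call__(self, sql):
        self.statements.append(sql)
        if sql.startswith("SELECT setting"):
            return [{'setting': 'notice'}]
        return []


db = FakeExecutor()
with MinMessages(db, 'warning'):
    db("CREATE TABLE t AS SELECT 1")
```

After the `with` block, `db.statements` holds the lookup, the `SET ... TO warning`, the body statement, and the restoring `SET ... TO notice`, in that order.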



[GitHub] incubator-madlib issue #168: Code refactoring for KNN

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/incubator-madlib/pull/168
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/161/


