Posted to dev@madlib.apache.org by hpandeycodeit <gi...@git.apache.org> on 2017/08/16 21:53:54 UTC
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
GitHub user hpandeycodeit opened a pull request:
https://github.com/apache/incubator-madlib/pull/168
Code refactoring for KNN
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hpandeycodeit/incubator-madlib knn_refactor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/168.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #168
----
commit dbb7199ee461683f432d22b8fdb87d519ca83f13
Author: hpandeycodeit <hp...@pivotal.io>
Date: 2017-08-16T21:48:36Z
Code refactoring for KNN
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-madlib/pull/168
---
[GitHub] incubator-madlib issue #168: Code refactoring for KNN
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/incubator-madlib/pull/168
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/160/
---
[GitHub] incubator-madlib issue #168: Code refactoring for KNN
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/incubator-madlib/pull/168
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/179/
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r133628535
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
--- End diff --
The docstring is missing details for the `operation` parameter.
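For illustration, the missing entry could look like the sketch below. The function name `knn_doc_sketch` is hypothetical and its body is omitted; the wording of the `operation` entry is an assumption based only on the diff, where `operation == 'c'` selects classification and any other value takes the regression branch.

```python
def knn_doc_sketch(schema_madlib, point_source, point_column_name,
                   label_column_name, test_source, test_column_name,
                   id_column_name, output_table, operation, k):
    """
    KNN function to find the K nearest neighbours.

    Args:
        @param output_table  Name of the table to store final results.
        @param operation     'c' selects classification; in the diff any
                             other value takes the regression branch.
        @param k             Number of nearest neighbours to consider.
    """
    # Hypothetical stub: only the docstring matters for this sketch.
    return None
```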
---
[GitHub] incubator-madlib issue #168: Code refactoring for KNN
Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on the issue:
https://github.com/apache/incubator-madlib/pull/168
jenkins ok to test
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r134070772
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
--- End diff --
Added the details for `operation`.
---
[GitHub] incubator-madlib issue #168: Code refactoring for KNN
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/incubator-madlib/pull/168
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/193/
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r134070784
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
+ @param k default: 1. Number of nearest neighbors to consider
+
+
+ Returns:
+ VARCHAR Name of the output table.
+ """
+
+
+ oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
--- End diff --
This is also done.
---
[GitHub] incubator-madlib issue #168: Code refactoring for KNN
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/incubator-madlib/pull/168
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/184/
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r134070999
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
+ @param k default: 1. Number of nearest neighbors to consider
+
+
+ Returns:
+ VARCHAR Name of the output table.
+ """
+
+
+ oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
+
+ plpy.execute("SET client_min_messages TO warning");
+
+
+ k_val = knn_validate_src(schema_madlib, point_source, point_column_name,
+ label_column_name, test_source,
+ test_column_name, id_column_name,
+ output_table, operation, k)
+
+
+ plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
+
+ x_temp_table = unique_string(desp='x_temp_table')
+ y_temp_table = unique_string(desp='y_temp_table')
+ label_column_name_unique = unique_string(desp='label_column_name_unique')
+ test_id = unique_string(desp='test_id')
+
+ convert_boolean_to_int = '';
+ if operation == 'c':
+ convert_boolean_to_int = '::INTEGER';
+
+ madlib_knn_interm = unique_string(desp='madlib_knn_interm')
+
+ plpy.execute("""DROP TABLE IF EXISTS pg_temp.{madlib_knn_interm}""".format(**locals()));
+ plpy.execute(
+ """
+ CREATE TEMP TABLE pg_temp.{madlib_knn_interm} AS
+ SELECT *
+ FROM
+ (
+ SELECT row_number() over (partition by {test_id} order by dist) AS r , {x_temp_table}.*
+ FROM
+ (
+ SELECT test.{id_column_name} AS {test_id} , {schema_madlib}.squared_dist_norm2(train.{point_column_name} ,test.{test_column_name}) AS dist, train.{label_column_name} {convert_boolean_to_int} AS {label_column_name_unique}
+ FROM {point_source} AS train, {test_source} AS test
+ ) {x_temp_table}
+ ){y_temp_table}
+ WHERE {y_temp_table}.r <= {k_val}""".format(**locals()));
+
+ if operation == 'c':
+ plpy.execute(
+ """
+ CREATE TABLE {output_table} AS
+ SELECT {test_id} AS id, {test_column_name} , {schema_madlib}.mode({label_column_name_unique}) AS prediction
+ FROM pg_temp.{madlib_knn_interm} join {test_source} ON {test_id} = {id_column_name}
+ GROUP BY {test_id} , {test_column_name}""".format(**locals()))
+
+
+ else:
+ plpy.execute(
+ """
+ CREATE TABLE {output_table} AS
+ SELECT {test_id} AS id, {test_column_name} , avg( {label_column_name_unique} ) AS prediction
+ FROM
+ pg_temp.{madlib_knn_interm} join {test_source} on {test_id} ={id_column_name}
+ GROUP BY {test_id} , {test_column_name}
+ ORDER BY {test_id}""".format(**locals()))
+
+
+ plpy.execute("SET client_min_messages TO "+ oldClientMinMessages)
+
+ if operation == 'c':
+ returnstring = 'The classification results have been written to output table '+ output_table;
+ else:
+ returnstring = 'The regression results have been written to output table '+ output_table;
+
+ plpy.execute("""DROP TABLE pg_temp.{madlib_knn_interm}""".format(**locals()));
+
+ return returnstring;
+
+
--- End diff --
I made some changes per the style guide and will fix the rest in a day or two.
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r133629865
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
+ @param k default: 1. Number of nearest neighbors to consider
+
+
+ Returns:
+ VARCHAR Name of the output table.
+ """
+
+
+ oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
+
+ plpy.execute("SET client_min_messages TO warning");
+
+
+ k_val = knn_validate_src(schema_madlib, point_source, point_column_name,
+ label_column_name, test_source,
+ test_column_name, id_column_name,
+ output_table, operation, k)
+
+
+ plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
+
+ x_temp_table = unique_string(desp='x_temp_table')
+ y_temp_table = unique_string(desp='y_temp_table')
+ label_column_name_unique = unique_string(desp='label_column_name_unique')
+ test_id = unique_string(desp='test_id')
+
+ convert_boolean_to_int = '';
+ if operation == 'c':
+ convert_boolean_to_int = '::INTEGER';
+
+ madlib_knn_interm = unique_string(desp='madlib_knn_interm')
+
+ plpy.execute("""DROP TABLE IF EXISTS pg_temp.{madlib_knn_interm}""".format(**locals()));
+ plpy.execute(
+ """
+ CREATE TEMP TABLE pg_temp.{madlib_knn_interm} AS
+ SELECT *
+ FROM
+ (
+ SELECT row_number() over (partition by {test_id} order by dist) AS r , {x_temp_table}.*
+ FROM
+ (
+ SELECT test.{id_column_name} AS {test_id} , {schema_madlib}.squared_dist_norm2(train.{point_column_name} ,test.{test_column_name}) AS dist, train.{label_column_name} {convert_boolean_to_int} AS {label_column_name_unique}
+ FROM {point_source} AS train, {test_source} AS test
+ ) {x_temp_table}
+ ){y_temp_table}
+ WHERE {y_temp_table}.r <= {k_val}""".format(**locals()));
+
+ if operation == 'c':
+ plpy.execute(
+ """
+ CREATE TABLE {output_table} AS
+ SELECT {test_id} AS id, {test_column_name} , {schema_madlib}.mode({label_column_name_unique}) AS prediction
+ FROM pg_temp.{madlib_knn_interm} join {test_source} ON {test_id} = {id_column_name}
+ GROUP BY {test_id} , {test_column_name}""".format(**locals()))
+
+
+ else:
+ plpy.execute(
+ """
+ CREATE TABLE {output_table} AS
+ SELECT {test_id} AS id, {test_column_name} , avg( {label_column_name_unique} ) AS prediction
+ FROM
+ pg_temp.{madlib_knn_interm} join {test_source} on {test_id} ={id_column_name}
+ GROUP BY {test_id} , {test_column_name}
+ ORDER BY {test_id}""".format(**locals()))
+
+
+ plpy.execute("SET client_min_messages TO "+ oldClientMinMessages)
+
+ if operation == 'c':
+ returnstring = 'The classification results have been written to output table '+ output_table;
+ else:
+ returnstring = 'The regression results have been written to output table '+ output_table;
+
+ plpy.execute("""DROP TABLE pg_temp.{madlib_knn_interm}""".format(**locals()));
+
+ return returnstring;
+
+
--- End diff --
In general, the file contains multiple style-guide and PEP 8 issues. Please refer to the [Python style guide page in the wiki](https://cwiki.apache.org/confluence/display/MADLIB/Python+Style+Guide).
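As a rough illustration of the PEP 8 points visible in the diff (trailing semicolons, spaces around `=` in keyword arguments, camelCase variable names), two of the statements could be written as below. The helper names `create_temp_schema_sql` and `restore_min_messages_sql` are hypothetical and only wrap the SQL strings so the sketch runs without a database.

```python
def create_temp_schema_sql(schema_madlib):
    # No spaces around '=' in keyword arguments, no trailing semicolon.
    return "SELECT {schema_madlib}.create_schema_pg_temp()".format(
        schema_madlib=schema_madlib)


def restore_min_messages_sql(old_client_min_messages):
    # snake_case replaces the camelCase name oldClientMinMessages.
    return "SET client_min_messages TO {0}".format(old_client_min_messages)
```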
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r133628699
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
+ @param k default: 1. Number of nearest neighbors to consider
+
+
+ Returns:
+ VARCHAR Name of the output table.
+ """
+
+
+ oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
+
+ plpy.execute("SET client_min_messages TO warning");
+
+
+ k_val = knn_validate_src(schema_madlib, point_source, point_column_name,
+ label_column_name, test_source,
+ test_column_name, id_column_name,
+ output_table, operation, k)
+
+
+ plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
+
+ x_temp_table = unique_string(desp='x_temp_table')
+ y_temp_table = unique_string(desp='y_temp_table')
+ label_column_name_unique = unique_string(desp='label_column_name_unique')
+ test_id = unique_string(desp='test_id')
+
+ convert_boolean_to_int = '';
+ if operation == 'c':
--- End diff --
Since this comparison is used multiple times, it is better to create a boolean flag that holds its result.
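A minimal sketch of that suggestion, assuming a hypothetical helper `knn_query_parts` and flag name `is_classification` (neither appears in the pull request): the comparison is evaluated once and every later branch reads the flag.

```python
def knn_query_parts(operation):
    # Evaluate the comparison once and reuse the flag everywhere.
    is_classification = (operation == 'c')
    return {
        # Cast applied to the label column in the intermediate table.
        'label_cast': '::INTEGER' if is_classification else '',
        # Aggregate used for the prediction column.
        'aggregate': 'mode' if is_classification else 'avg',
        # Word used in the message returned to the caller.
        'kind': 'classification' if is_classification else 'regression',
    }
```

In the diff, `mode` is schema-qualified as `{schema_madlib}.mode` while `avg` is the built-in aggregate, so a real refactoring would still need to add the schema prefix for the classification case.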
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by hpandeycodeit <gi...@git.apache.org>.
Github user hpandeycodeit commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r134070881
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
+ @param k default: 1. Number of nearest neighbors to consider
+
+
+ Returns:
+ VARCHAR Name of the output table.
+ """
+
+
+ oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
+
+ plpy.execute("SET client_min_messages TO warning");
+
+
+ k_val = knn_validate_src(schema_madlib, point_source, point_column_name,
+ label_column_name, test_source,
+ test_column_name, id_column_name,
+ output_table, operation, k)
+
+
+ plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".format(schema_madlib = schema_madlib));
+
+ x_temp_table = unique_string(desp='x_temp_table')
+ y_temp_table = unique_string(desp='y_temp_table')
+ label_column_name_unique = unique_string(desp='label_column_name_unique')
+ test_id = unique_string(desp='test_id')
+
+ convert_boolean_to_int = '';
+ if operation == 'c':
--- End diff --
Yes, I created a boolean flag for this comparison.
---
[GitHub] incubator-madlib pull request #168: Code refactoring for KNN
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/168#discussion_r133628425
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -127,5 +124,102 @@ def knn_validate_src(schema_madlib, point_source, point_column_name, label_colum
"Data type '{0}' is not a valid type for column '{1}' in table '{2}'.".format(colType, id_column_name, test_source))
return k
-# ----------------------------------------------------------------------
-m4_changequote(<!`!>, <!'!>)
+
+
+
+
+def knn(schema_madlib, point_source, point_column_name, label_column_name,
+ test_source, test_column_name, id_column_name, output_table, operation, k):
+
+ """
+ KNN function to find the K Nearest neighbours
+ Args:
+ @param schema_madlib Name of the Madlib Schema
+ @param point_source Training data table
+ @param point_column_name Name of the column with training data points.
+ @param label_column_name Name of the column with labels/values of training data points.
+ @param test_source Name of the table containing the test data points.
+ @param test_column_name Name of the column with testing data points.
+ @param id_column_name Name of the column having ids of data points in test data table.
+ @param output_table Name of the table to store final results.
+ @param k default: 1. Number of nearest neighbors to consider
+
+
+ Returns:
+ VARCHAR Name of the output table.
+ """
+
+
+ oldClientMinMessages = plpy.execute("SELECT setting FROM pg_settings WHERE name = 'client_min_messages'")[0]['setting'];
--- End diff --
Better to use the context manager: `with MinWarning('warning'):`
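To show why a context manager helps here, the sketch below saves and restores `client_min_messages` around a block, including when the block raises. It is not MADlib's `MinWarning` implementation: `MinMessages` and `FakePlpy` are hypothetical names, and `FakePlpy` is a stand-in for PL/Python's `plpy` module so the example runs outside the database.

```python
class FakePlpy(object):
    """Hypothetical stand-in for plpy; tracks the message level in memory."""
    PREFIX = "SET client_min_messages TO "

    def __init__(self, level='notice'):
        self.level = level

    def execute(self, sql):
        if sql.startswith(self.PREFIX):
            self.level = sql[len(self.PREFIX):]
            return []
        if "client_min_messages" in sql:
            return [{'setting': self.level}]
        return []


class MinMessages(object):
    """Illustrative context manager, assumed to mirror the idea of MinWarning."""

    def __init__(self, plpy, level):
        self.plpy = plpy
        self.level = level
        self.old_level = None

    def __enter__(self):
        # Remember the current level before changing it.
        self.old_level = self.plpy.execute(
            "SELECT setting FROM pg_settings "
            "WHERE name = 'client_min_messages'")[0]['setting']
        self.plpy.execute("SET client_min_messages TO " + self.level)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions, so the level is restored.
        self.plpy.execute("SET client_min_messages TO " + self.old_level)
        return False  # do not swallow exceptions
```

With this shape, the manual save at the top of `knn` and the restore near the bottom collapse into a single `with` block, and an error in any query no longer leaves the session at the `warning` level.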
---
[GitHub] incubator-madlib issue #168: Code refactoring for KNN
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:
https://github.com/apache/incubator-madlib/pull/168
Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/161/
---