Posted to commits@madlib.apache.org by nj...@apache.org on 2018/11/16 00:08:08 UTC

madlib git commit: Minibatch Preprocessor: Update online doc

Repository: madlib
Updated Branches:
  refs/heads/master bc8aeeb11 -> 25d716328


Minibatch Preprocessor: Update online doc

The online doc is outdated. This commit adds two new parameters that
have been introduced since the last time the doc was edited.

Closes #334


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/25d71632
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/25d71632
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/25d71632

Branch: refs/heads/master
Commit: 25d71632816a8630aeeeff614747527346b891f3
Parents: bc8aeeb
Author: Nandish Jayaram <nj...@apache.org>
Authored: Tue Oct 23 10:35:02 2018 -0700
Committer: Nandish Jayaram <nj...@apache.org>
Committed: Thu Nov 15 16:07:34 2018 -0800

----------------------------------------------------------------------
 .../utilities/minibatch_preprocessing.py_in     | 24 +++++++++++++++-----
 .../utilities/minibatch_preprocessing.sql_in    |  2 +-
 2 files changed, 19 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/madlib/blob/25d71632/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in b/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
index 0762a06..88433c9 100644
--- a/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
+++ b/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
         ----------------------------------------------------------------
                             SUMMARY
         ----------------------------------------------------------------
-        MiniBatch Preprocessor is a utility function to pre process the input
-        data for use with models that support mini-batching as an optimization
+        The mini-batch preprocessor is a utility that prepares input data for
+        use by models that support mini-batch as an optimization option. (This
+        is currently only the case for Neural Networks.) It is effectively a
+        packing operation that builds arrays of dependent and independent
+        variables from the source data table.
 
-        #TODO add more here
+        The advantage of using mini-batching is that it can perform better than
+        stochastic gradient descent (default MADlib optimizer) because it uses
+        more than one training example at a time, typically resulting in faster
+        and smoother convergence.
 
         For more details on function usage:
         SELECT {schema_madlib}.{method}('usage')
@@ -508,8 +514,13 @@ class MiniBatchDocumentation:
             dependent_varname,     -- TEXT. Name of the dependent variable column
             independent_varname,   -- TEXT. Name of the independent variable
                                       column
-            buffer_size            -- INTEGER. Number of source input rows to
-                                      pack into batch
+            grouping_col           -- TEXT. Default NULL. An expression list used
+                                      to group the input dataset into discrete groups
+            buffer_size            -- INTEGER. Default computed automatically.
+                                      Number of source input rows to pack into a buffer
+            one_hot_encode_int_dep_var -- BOOLEAN. Default FALSE. Flag to one-hot
+                                          encode dependent variables that are
+                                          scalar integers
         );
 
 
@@ -519,10 +530,11 @@ class MiniBatchDocumentation:
         The output table produced by MiniBatch Preprocessor contains the
         following columns:
 
-        id					    -- INTEGER.  Unique id for packed table.
+        __id__				    -- INTEGER.  Unique id for packed table.
         dependent_varname 		-- FLOAT8[]. Packed array of dependent variables.
         independent_varname		-- FLOAT8[]. Packed array of independent
                                    variables.
+        grouping_cols           -- TEXT. Name of grouping columns.
 
         ---------------------------------------------------------------------------
         The algorithm also creates a summary table named <output_table>_summary

http://git-wip-us.apache.org/repos/asf/madlib/blob/25d71632/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in b/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
index 1ac00fb..58668a1 100644
--- a/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
+++ b/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
@@ -46,7 +46,7 @@ arrays of dependent and independent variables from the source data table.
 The advantage of using mini-batching is that it can perform better than
 stochastic gradient descent (default MADlib optimizer) because it
 uses more than one training
-example at a time, typically resulting faster and smoother convergence [1].
+example at a time, typically resulting in faster and smoother convergence [1].
 
 @brief
 Utility that prepares input data for use by models that support
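
The updated help text describes the preprocessor as "effectively a packing operation that builds arrays of dependent and independent variables from the source data table". The following is a minimal Python sketch of that idea only; it is not the MADlib implementation. The function names `pack_rows` and `one_hot` are hypothetical, the in-memory list of tuples stands in for the source table, and the sketch does not attempt to reproduce any other processing the real utility may perform (grouping, or automatic computation of the buffer size). The parameter names `buffer_size` and `one_hot_encode_int_dep_var` are borrowed from the usage text in the diff.

```python
from typing import List, Sequence, Tuple

# One source row: (dependent value, list of independent values).
Row = Tuple[int, Sequence[float]]
# One packed row: (id, packed dependent array, packed independent array).
PackedRow = Tuple[int, List[List[float]], List[List[float]]]


def one_hot(label: int, classes: Sequence[int]) -> List[float]:
    """Return a one-hot vector for `label` over the ordered class list."""
    return [1.0 if label == c else 0.0 for c in classes]


def pack_rows(
    rows: Sequence[Row],
    buffer_size: int,
    one_hot_encode_int_dep_var: bool = False,
) -> List[PackedRow]:
    """Pack up to `buffer_size` source rows into each output row.

    Illustrative assumption: the caller always supplies buffer_size
    explicitly; this sketch does not compute a default.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be a positive integer")

    # Class list for one-hot encoding, taken from the distinct labels seen.
    classes = sorted({dep for dep, _ in rows}) if one_hot_encode_int_dep_var else []

    packed: List[PackedRow] = []
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        if one_hot_encode_int_dep_var:
            dep = [one_hot(d, classes) for d, _ in chunk]
        else:
            dep = [[float(d)] for d, _ in chunk]
        indep = [[float(x) for x in xs] for _, xs in chunk]
        # The running index plays the role of the __id__ column.
        packed.append((len(packed), dep, indep))
    return packed
```

With three source rows and `buffer_size=2`, this sketch yields two packed rows: the first holds two source rows and the second holds the remaining one.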