Posted to commits@madlib.apache.org by nj...@apache.org on 2018/11/16 00:08:08 UTC
madlib git commit: Minibatch Preprocessor: Update online doc
Repository: madlib
Updated Branches:
refs/heads/master bc8aeeb11 -> 25d716328
Minibatch Preprocessor: Update online doc
The online doc is outdated. This commit documents two parameters that
have been introduced since the last time the doc was edited.
Closes #334
Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/25d71632
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/25d71632
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/25d71632
Branch: refs/heads/master
Commit: 25d71632816a8630aeeeff614747527346b891f3
Parents: bc8aeeb
Author: Nandish Jayaram <nj...@apache.org>
Authored: Tue Oct 23 10:35:02 2018 -0700
Committer: Nandish Jayaram <nj...@apache.org>
Committed: Thu Nov 15 16:07:34 2018 -0800
----------------------------------------------------------------------
.../utilities/minibatch_preprocessing.py_in | 24 +++++++++++++++-----
.../utilities/minibatch_preprocessing.sql_in | 2 +-
2 files changed, 19 insertions(+), 7 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/madlib/blob/25d71632/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in b/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
index 0762a06..88433c9 100644
--- a/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
+++ b/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
----------------------------------------------------------------
SUMMARY
----------------------------------------------------------------
- MiniBatch Preprocessor is a utility function to pre process the input
- data for use with models that support mini-batching as an optimization
+ The mini-batch preprocessor is a utility that prepares input data for
+ use by models that support mini-batch as an optimization option. (This
+ is currently only the case for Neural Networks.) It is effectively a
+ packing operation that builds arrays of dependent and independent
+ variables from the source data table.
- #TODO add more here
+ The advantage of using mini-batching is that it can perform better than
+ stochastic gradient descent (default MADlib optimizer) because it uses
+ more than one training example at a time, typically resulting in faster
+ and smoother convergence.
For more details on function usage:
SELECT {schema_madlib}.{method}('usage')
@@ -508,8 +514,13 @@ class MiniBatchDocumentation:
dependent_varname, -- TEXT. Name of the dependent variable column
independent_varname, -- TEXT. Name of the independent variable
column
- buffer_size -- INTEGER. Number of source input rows to
- pack into batch
+ grouping_col -- TEXT. Default NULL. An expression list used
+ to group the input dataset into discrete groups
+ buffer_size -- INTEGER. Default computed automatically.
+ Number of source input rows to pack into a buffer
+ one_hot_encode_int_dep_var -- BOOLEAN. Default FALSE. Flag to one-hot
+ encode dependent variables that are
+ scalar integers
);
@@ -519,10 +530,11 @@ class MiniBatchDocumentation:
The output table produced by MiniBatch Preprocessor contains the
following columns:
- id -- INTEGER. Unique id for packed table.
+ __id__ -- INTEGER. Unique id for packed table.
dependent_varname -- FLOAT8[]. Packed array of dependent variables.
independent_varname -- FLOAT8[]. Packed array of independent
variables.
+ grouping_cols -- TEXT. Name of grouping columns.
---------------------------------------------------------------------------
The algorithm also creates a summary table named <output_table>_summary
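The packing operation described in the updated help text can be illustrated with a small Python sketch. This is not MADlib's implementation: `pack_rows` and `one_hot` are hypothetical helpers, the dictionary keys only mirror the output column names listed above, and the sketch ignores `grouping_col` and any other processing the real utility performs. It only shows how `buffer_size` source rows might be packed into one output row, and what one-hot encoding an integer dependent variable means.

```python
from typing import Dict, List, Sequence, Tuple

# One source row: (integer dependent value, independent values).
Row = Tuple[int, Sequence[float]]


def one_hot(value: int, classes: Sequence[int]) -> List[float]:
    """Return a 0/1 vector with a 1 at the position of `value` in `classes`."""
    return [1.0 if value == c else 0.0 for c in classes]


def pack_rows(
    rows: Sequence[Row],
    buffer_size: int,
    one_hot_encode_int_dep_var: bool = False,
) -> List[Dict[str, object]]:
    """Pack consecutive source rows into buffers of at most `buffer_size` rows.

    Hypothetical helper for illustration only; it is not part of MADlib.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be a positive integer")

    # Distinct dependent values, in sorted order, define the one-hot positions.
    classes = sorted({dep for dep, _ in rows})

    packed: List[Dict[str, object]] = []
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        if one_hot_encode_int_dep_var:
            dep = [one_hot(d, classes) for d, _ in chunk]
        else:
            dep = [float(d) for d, _ in chunk]
        indep = [[float(x) for x in xs] for _, xs in chunk]
        packed.append(
            {
                "__id__": len(packed),  # running id of the packed row
                "dependent_varname": dep,
                "independent_varname": indep,
            }
        )
    return packed
```

With five source rows and `buffer_size=2`, this sketch yields three packed rows; the last one holds the single leftover source row.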
http://git-wip-us.apache.org/repos/asf/madlib/blob/25d71632/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in b/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
index 1ac00fb..58668a1 100644
--- a/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
+++ b/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
@@ -46,7 +46,7 @@ arrays of dependent and independent variables from the source data table.
The advantage of using mini-batching is that it can perform better than
stochastic gradient descent (default MADlib optimizer) because it
uses more than one training
-example at a time, typically resulting faster and smoother convergence [1].
+example at a time, typically resulting in faster and smoother convergence [1].
@brief
Utility that prepares input data for use by models that support