Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/01/24 00:14:00 UTC
[jira] [Created] (MADLIB-1200) Pre-processing helper function for mini-batching
Frank McQuillan created MADLIB-1200:
---------------------------------------
Summary: Pre-processing helper function for mini-batching
Key: MADLIB-1200
URL: https://issues.apache.org/jira/browse/MADLIB-1200
Project: Apache MADlib
Issue Type: New Feature
Components: Module: Utilities
Reporter: Frank McQuillan
Fix For: v1.14
Related to
https://issues.apache.org/jira/browse/MADLIB-1037
https://issues.apache.org/jira/browse/MADLIB-1048
Story
{{As a}} data scientist
{{I want to}} pre-process input files for use with mini-batching
{{so that}} the optimization phase of MLP, SVM, etc. runs faster when I do multiple runs (e.g., while tuning parameters), since pre-processing is a one-time operation.
Interface
This function is roughly the inverse of array_unnest_2d_to_1d():
http://madlib.apache.org/docs/latest/array__ops_8sql__in.html#af057b589f2a2cb1095caa99feaeb3d70
The difference is that here we want to persist the packed 2-D arrays in an output table.
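For orientation, the two directions can be illustrated with plain Python lists. This is only a sketch of the semantics, not MADlib code; the names are illustrative:

```python
# array_unnest_2d_to_1d splits one 2-D array into one row per 1-D sub-array:
two_d = [[1, 2, 3], [4, 5, 6]]
as_rows = list(two_d)  # two rows: [1, 2, 3] and [4, 5, 6]

# The proposed helper goes the other way: consecutive rows of 1-D arrays
# are packed into one 2-D array per output row (n_elements rows per pack).
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n_elements = 2
packed = [rows[i:i + n_elements] for i in range(0, len(rows), n_elements)]
print(packed)  # [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9]]]
```

Note the last pack is short when the row count is not a multiple of n_elements.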
Suggested interface:
matrix_nest_1d_to_2d(
    source_table,
    output_table,
    independent_varname,
    dependent_varname,
    n_elements    -- number of elements (rows) to pack into each 2-D array
);
where dependent_varname is a column of 1-D arrays.
Or should it be called array_nest_1d_to_2d()?
Notes
1) A random shuffle of the rows is needed for mini-batching.
2) A naive approach may be OK to start; it is not worth a big investment to make it run 10% or 20% faster.
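The shuffle-then-pack behavior the notes describe can be sketched in Python with NumPy. This is a minimal sketch of the intended semantics, assuming rows are shuffled once and then grouped into chunks of n_elements; the function name and signature are illustrative, not the MADlib API:

```python
import numpy as np

def nest_1d_to_2d(X, y, n_elements, seed=0):
    """Shuffle rows once, then pack every n_elements consecutive rows of
    1-D feature arrays into one 2-D array (one output row per batch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))        # one-time random shuffle
    X, y = X[idx], y[idx]
    batches = []
    for start in range(0, len(X), n_elements):
        batches.append((X[start:start + n_elements],
                        y[start:start + n_elements]))
    return batches

# 10 rows of 3 features each, packed 4 at a time -> batches of 4, 4, and 2 rows
X = np.arange(30.0).reshape(10, 3)
y = np.arange(10)
packed = nest_1d_to_2d(X, y, n_elements=4)
print([b[0].shape for b in packed])  # [(4, 3), (4, 3), (2, 3)]
```

Because the shuffle and packing happen once, repeated training runs over the output table skip that cost entirely, which is the point of the one-time pre-processing step.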
Acceptance
1) Converts data from the standard format to the packed format used for mini-batching
2) Some scale testing is OK (it does not need to be comprehensive)
3) Documented as a helper function in the user docs
4) IC
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)