Posted to issues@madlib.apache.org by "Nandish Jayaram (JIRA)" <ji...@apache.org> on 2019/04/26 21:41:00 UTC

[jira] [Updated] (MADLIB-1333) DL: Add new function for preprocessing images for validation dataset

     [ https://issues.apache.org/jira/browse/MADLIB-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nandish Jayaram updated MADLIB-1333:
------------------------------------
    Description: 
Function to prepare the validation dataset for deep learning with MADlib:
 * Assumes that the preprocessor for the training data has already been run.
 * Mini-batches x and y.
 * One-hot encodes class levels, making sure no class level is missed (the validation dataset by itself may not contain every class value that is in the training dataset). The class levels are read from the summary table output by the training-data preprocessor.
 * Normalizes using the same normalizing constant that was used to create the batched training data, read from its summary table.
 * Renames x and y so that the training and validation tables use the same column names.
 * Applies to fit() and evaluate().
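The key point of the bullets above is that the validation data reuses two pieces of training metadata (class levels and normalizing constant) instead of recomputing them. A minimal Python sketch, assuming the training summary exposes them under the hypothetical keys `class_values` and `normalizing_const` (the actual summary-table columns may be named differently); `encode_validation_rows` is likewise a hypothetical helper, not a MADlib function:

```python
# Hypothetical stand-in for a row of the training preprocessor's summary table.
training_summary = {"class_values": ["bird", "cat", "dog"], "normalizing_const": 255.0}


def encode_validation_rows(pixels, labels, summary):
    """Normalize and one-hot encode validation rows using training metadata."""
    class_values = summary["class_values"]
    position = {value: i for i, value in enumerate(class_values)}
    const = summary["normalizing_const"]
    xs, ys = [], []
    for image, label in zip(pixels, labels):
        # Divide by the training constant so both sets share one scale.
        xs.append([p / const for p in image])
        # Vector width comes from the training class list, not the validation labels.
        one_hot = [0] * len(class_values)
        one_hot[position[label]] = 1  # raises KeyError if the label was unseen in training
        ys.append(one_hot)
    return xs, ys


# The validation set has no "bird", yet each label vector still has 3 slots.
xs, ys = encode_validation_rows(
    [[0, 255, 51, 102], [255, 255, 0, 0]], ["cat", "dog"], training_summary
)
print(xs)  # [[0.0, 1.0, 0.2, 0.4], [1.0, 1.0, 0.0, 0.0]]
print(ys)  # [[0, 1, 0], [0, 0, 1]]
```

Raising on a label that never appeared in training is an assumption of this sketch; the ticket does not specify how such labels should be handled.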

Proposed interface:
 Rename `minibatch_preprocessor_dl` to `training_preprocessor_dl`. The interface is otherwise unchanged from the current master:
{code:sql}
training_preprocessor_dl(
      source_table,          -- training dataset
      output_table,
      dependent_varname,
      independent_varname,
      buffer_size,           -- Optional
      normalizing_const,     -- Optional
      num_classes            -- Optional
)
{code}
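For reference, here is a rough Python sketch of what the training preprocessor is assumed to produce: shuffled rows packed into buffers of `buffer_size` rows, plus a summary holding the class levels and normalizing constant for later reuse. The real function runs in-database and writes tables. The name `training_preprocess`, the dict-based buffers, the summary keys, and the default values here are all hypothetical.

```python
import random


def training_preprocess(rows, normalizing_const=1.0, buffer_size=2, seed=0):
    """Hypothetical sketch of the training-side steps.

    rows: list of (pixels, label) pairs.
    Returns (buffers, summary); each buffer packs several rows' x and y.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    # Class levels are derived from the training data and saved for reuse.
    class_values = sorted({label for _, label in rows})
    position = {value: i for i, value in enumerate(class_values)}
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # training rows are randomized
    buffers = []
    for start in range(0, len(shuffled), buffer_size):
        chunk = shuffled[start:start + buffer_size]
        x = [[p / normalizing_const for p in pixels] for pixels, _ in chunk]
        y = [[1 if i == position[label] else 0 for i in range(len(class_values))]
             for _, label in chunk]
        buffers.append({"x": x, "y": y})
    summary = {"class_values": class_values, "normalizing_const": normalizing_const}
    return buffers, summary


rows = [([0, 255, 0, 255], "cat"), ([255, 0, 255, 0], "dog"), ([0, 0, 0, 0], "cat")]
buffers, summary = training_preprocess(rows, normalizing_const=255.0, buffer_size=2)
print(summary)                         # {'class_values': ['cat', 'dog'], 'normalizing_const': 255.0}
print([len(b["x"]) for b in buffers])  # [2, 1]
```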
New function for preparing validation data for evaluation:
{code:sql}
validation_preprocessor_dl(
      source_table,                 -- validation dataset
      output_table,
      dependent_varname,
      independent_varname,
      training_preprocessor_table,  -- i.e., from training_preprocessor_dl
      buffer_size                   -- Optional
)
{code}
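A minimal Python sketch of the validation side under the same assumptions: it reads the class levels and normalizing constant from the training summary (a plain dict standing in for the summary table), does not shuffle, and rejects a missing training table or a `buffer_size` below 1, matching the acceptance criteria. The function name, the `x`/`y` buffer keys, and the error messages are hypothetical.

```python
def validation_preprocess(rows, training_summary, buffer_size=2):
    """Hypothetical sketch of the validation-side steps.

    rows: list of (pixels, label) pairs, kept in their original order.
    training_summary: class levels and normalizing constant saved by the
    training preprocessor.
    """
    if training_summary is None:
        raise ValueError("training_preprocessor_table must not be NULL")
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    class_values = training_summary["class_values"]
    const = training_summary["normalizing_const"]
    position = {value: i for i, value in enumerate(class_values)}
    buffers = []
    # No shuffling: rows are packed in input order.
    for start in range(0, len(rows), buffer_size):
        chunk = rows[start:start + buffer_size]
        buffers.append({
            # Same keys as the (assumed) training layout, so both can be read alike.
            "x": [[p / const for p in pixels] for pixels, _ in chunk],
            "y": [[1 if i == position[label] else 0 for i in range(len(class_values))]
                  for _, label in chunk],
        })
    return buffers


summary = {"class_values": ["cat", "dog"], "normalizing_const": 255.0}
buffers = validation_preprocess(
    [([255, 0, 0, 0], "dog"), ([0, 0, 0, 0], "dog")], summary, buffer_size=4
)
print(buffers)  # [{'x': [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]], 'y': [[0, 1], [0, 1]]}]

try:
    validation_preprocess([], summary, buffer_size=0)
except ValueError as err:
    print(err)  # buffer_size must be at least 1
```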
Note:
 1. {{validation_preprocessor_dl}} does not need to randomize (shuffle) the input rows.

Acceptance:
 1. Input validation check to ensure `training_preprocessor_table` is not null.
 2. Run `validation_preprocessor_dl` and `training_preprocessor_dl` on toy datasets of 5-10 fake low-resolution images (e.g., 2x2). Manually check that both outputs are normalized identically and one-hot encoded identically, and that every image is present in the output tables. Row order will of course differ, since the training data is randomized and the validation data is not.
 3. Set `buffer_size` in `validation_preprocessor_dl` to a value less than 1 and ensure the function fails with a clear error message.
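Because training output is shuffled and validation output is not, criterion 2 amounts to comparing the two outputs as multisets of rows. A small Python sketch of that check, assuming each output has been fetched as a list of buffers with hypothetical `x` and `y` keys; the three toy 2x2 rows below are illustrative only:

```python
def unpack(buffers):
    """Flatten packed buffers back into individual (x, y) rows."""
    rows = []
    for buf in buffers:
        for x, y in zip(buf["x"], buf["y"]):
            rows.append((tuple(x), tuple(y)))
    return rows


def same_rows_ignoring_order(train_buffers, val_buffers):
    # Sorting removes the ordering difference caused by shuffling the training data;
    # any normalization, encoding, or missing-row mismatch still makes this False.
    return sorted(unpack(train_buffers)) == sorted(unpack(val_buffers))


# Toy outputs for three 2x2 images: training rows shuffled, validation rows in input order.
train_out = [
    {"x": [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]], "y": [[0, 1], [1, 0]]},
    {"x": [[0.0, 1.0, 0.0, 1.0]], "y": [[1, 0]]},
]
val_out = [
    {"x": [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
     "y": [[1, 0], [0, 1], [1, 0]]},
]
print(same_rows_ignoring_order(train_out, val_out))  # True
```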

  was:
Function to prepare the validation dataset for deep learning with MADlib:
 * Assumes that the preprocessor for the training data has already been run.
 * Mini-batches x and y.
 * One-hot encodes class levels, making sure no class level is missed (the validation dataset by itself may not contain every class value that is in the training dataset). The class levels are read from the summary table output by the training-data preprocessor.
 * Normalizes using the same normalizing constant that was used to create the batched training data, read from its summary table.
 * Renames x and y so that the training and validation tables use the same column names.
 * Applies to fit() and evaluate().

Proposed interface:
 Rename `minibatch_preprocessor_dl` to `training_preprocessor_dl`. The interface is otherwise unchanged from the current master:
{code:sql}
training_preprocessor_dl(
      source_table,          -- training dataset
      output_table,
      dependent_varname,
      independent_varname,
      buffer_size,           -- Optional
      normalizing_const,     -- Optional
      num_classes            -- Optional
)
{code}
New function for preparing validation data for evaluation:
{code:sql}
validation_preprocessor_dl(
      source_table,                 -- validation dataset
      output_table,
      dependent_varname,
      independent_varname,
      training_preprocessor_table,  -- i.e., from training_preprocessor_dl
      buffer_size                   -- Optional
)
{code}
Acceptance:
 1. Input validation check to ensure `training_preprocessor_table` is not null.
 2. Run `validation_preprocessor_dl` on exactly the same dataset as `training_preprocessor_dl` and ensure that the respective output tables are identical element by element. This test may only be verifiable if the input table contains exactly one image.
 3. Set `buffer_size` in `validation_preprocessor_dl` to a value less than 1 and ensure the function fails with a clear error message.


> DL: Add new function for preprocessing images for validation dataset
> --------------------------------------------------------------------
>
>                 Key: MADLIB-1333
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1333
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Nandish Jayaram
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)