Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/06/21 22:32:00 UTC

[jira] [Updated] (MADLIB-1364) Misc message and other items for 1.16 release

     [ https://issues.apache.org/jira/browse/MADLIB-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1364:
------------------------------------
    Priority: Minor  (was: Major)

> Misc message and other items for 1.16 release
> ---------------------------------------------
>
>                 Key: MADLIB-1364
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1364
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> (1)
> Input shape checking
> We added input shape checking, which is a good idea in principle, but it seems to be too restrictive. For example, for the MNIST data set, the Keras input shape is:
> {code}
> x_train_lt5.shape
> (30596, 28, 28)
> {code}
> In MADlib, after preprocessing, we get:
> {code}
> id | 2238
> x  | {{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,196,195,12,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,79,159,44,0,0,0,0,39,253,218,10,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,179,0,0,0,0,149,253,169,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,53,0,0,0,12,222,253,123,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,8,226,253,16,0,0,0,25,253,253,56,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,50,253,253,16,0,0,0,41,253,218,7,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,139,253,217,8,0,0,0,126,253,193,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,213,253,114,0,0,0,10,226,253,130,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,39,250,253,223,10,0,0,17,253,253,54,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,173,253,253,253,169,137,83,120,253,221,2,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,52,238,254,254,254,254,254,255,254,254,192,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,115,253,228,84,73,97,154,238,253,253,138,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,40,146,45,0,0,0,0,9,253,250,73,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,132,253,186,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,243,253,102,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,196,254,238,7,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,245,254,186,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,166,251,79,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
> y  | 4
> {code}
> A validation error gets thrown when we run fit():
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: model_keras error: Input shape [28, 28, 1] in the model architecture does not match the input shape [28, 28, None] of column independent_var in table train_lt5_packed. (plpython.c:5038)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit", line 21, in <module>
>     madlib_keras.fit(**globals())
>   PL/Python function "madlib_keras_fit", line 42, in wrapper
>   PL/Python function "madlib_keras_fit", line 102, in fit
>   PL/Python function "madlib_keras_fit", line 300, in validate_input_shapes
>   PL/Python function "madlib_keras_fit", line 86, in _validate_input_shapes
> PL/Python function "madlib_keras_fit"
>  [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- source table\n                               'mnist_model',         -- model output table\n                               'model_arch_library',  -- model arch table\n                                1,                    -- model arch id\n                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params\n                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params\n                                5,                    -- num_iterations\n                                0,                    -- gpus_per_host\n                                'test_lt5_packed',           -- validation table\n                                1                     -- metrics_compute_frequency\n                              );"]
> {code}
> This validation is too restrictive. I suggest we turn MADlib input shape validation off for the time being and let the back end fail or not according to its own rules. This applies to fit, evaluate, and predict.
> (2)
> Confusing error message if the user forgot to preprocess the source table
> {code}
> SELECT madlib.madlib_keras_fit('train_lt5',           -- source table (NOT PREPROCESSED)
>                                'mnist_model',         -- model output table
>                                'model_arch_library',  -- model arch table
>                                 1,                    -- model arch id
>                                 $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
>                                 $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
>                                 5,                    -- num_iterations
>                                 0,                    -- gpus_per_host
>                                 'test_lt5_packed',           -- validation table
>                                 1                     -- metrics_compute_frequency
>                               );
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist (plpython.c:5038)
> {code}
> A better message would be:
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist.  Please ensure that the source table you specify has been preprocessed by the image preprocessor. (plpython.c:5038)
> {code}
> (3)
> Confusing error message if the user forgot to preprocess the validation table
> {code}
> SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- source table (YES PREPROCESSED)
>                                'mnist_model',         -- model output table
>                                'model_arch_library',  -- model arch table
>                                 1,                    -- model arch id
>                                 $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
>                                 $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
>                                 5,                    -- num_iterations
>                                 0,                    -- gpus_per_host
>                                 'test_lt5',           -- validation table  (NOT PREPROCESSED)
>                                 1                     -- metrics_compute_frequency
>                               );
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). (plpython.c:5038)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit", line 21, in <module>
>     madlib_keras.fit(**globals())
>   PL/Python function "madlib_keras_fit", line 42, in wrapper
>   PL/Python function "madlib_keras_fit", line 71, in fit
>   PL/Python function "madlib_keras_fit", line 233, in __init__
>   PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
>   PL/Python function "madlib_keras_fit", line 288, in _validate_validation_table
>   PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
>   PL/Python function "madlib_keras_fit", line 96, in _assert
> PL/Python function "madlib_keras_fit"
>  [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- source table\n                               'mnist_model',         -- model output table\n                               'model_arch_library',  -- model arch table\n                                1,                    -- model arch id\n                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params\n                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params\n                                5,                    -- num_iterations\n                                0,                    -- gpus_per_host\n                                'test_lt5',           -- validation table\n                                1                     -- metrics_compute_frequency\n                              );"]
> {code}
> A better message would be:
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). Please ensure that this table has been preprocessed by the image preprocessor.  (plpython.c:5038)
> {code}
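
To make the mismatch in item (1) concrete, here is a minimal Python sketch. It is not MADlib code: every helper name is hypothetical, and it assumes the trailing None in the reported shape [28, 28, None] stands for a third dimension that the packed 28x28 array simply does not have. It contrasts a strict comparison, which rejects the MNIST case, with one possible relaxed comparison that accepts a missing dimension when the model expects size 1. The issue itself proposes turning validation off rather than relaxing it, so this is only an illustration of why the strict check fails.

```python
# Hypothetical sketch: none of these helpers are MADlib functions.

def array_shape(value):
    """Return the dimensions of a rectangular nested list, e.g. 28x28 -> [28, 28]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def pad_shape(shape, ndims=3):
    """Pad a shape with None up to ndims (assumed origin of the trailing None)."""
    return list(shape) + [None] * (ndims - len(shape))


def strict_match(model_shape, data_shape):
    """Exact comparison: every dimension must be equal."""
    return list(model_shape) == list(data_shape)


def relaxed_match(model_shape, data_shape):
    """Accept a missing (None) data dimension where the model expects size 1."""
    if len(model_shape) != len(data_shape):
        return False
    return all(d == m or (d is None and m == 1)
               for m, d in zip(model_shape, data_shape))


# A 28x28 image with no explicit channel dimension, as in the MNIST example.
image = [[0] * 28 for _ in range(28)]
data_shape = pad_shape(array_shape(image))
model_shape = [28, 28, 1]

print(data_shape)
print(strict_match(model_shape, data_shape))
print(relaxed_match(model_shape, data_shape))
```

Under these assumptions the strict check rejects the pair of shapes from the error message while the relaxed check accepts it; a model expecting three channels would still be rejected by both.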


