Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2020/11/20 18:57:37 UTC

[GitHub] [madlib] fmcquillan99 commented on pull request #525: DL: Model Hopper Refactor

fmcquillan99 commented on pull request #525:
URL: https://github.com/apache/madlib/pull/525#issuecomment-731350733


   (1)
   initial tests for functionality - keras_fit()
   
   ```
   DROP TABLE IF EXISTS cifar_10_model, cifar_10_model_summary;
   SELECT madlib.madlib_keras_fit('cifar_10_train_data_packed_allseg',    -- source table
                                  'cifar_10_model',                -- model output table
                                  'model_arch_library',            -- model arch table
                                   1,                              -- model arch id
                                   $$ loss='categorical_crossentropy', optimizer='rmsprop(lr=0.0001, decay=1e-6)', metrics=['accuracy']$$,  -- compile_params
                                   $$ batch_size=32, epochs=3 $$,  -- fit_params
                                   3,                              -- num_iterations
                                   NULL,                          -- use GPUs
                                   'cifar_10_test_data_packed_allseg',    -- validation dataset 
                                   1                               -- metrics compute frequency 
                                 ); 
   ```
   produces this warning:
   
   ```
   WARNING:  This version of tensorflow does not support XLA auto-cluster JIT optimization.  HINT:  upgrading tensorflow may improve performance.  (seg0 slice1 10.128.0.41:40000 pid=6270)
   CONTEXT:  PL/Python function "fit_transition"
   WARNING:  This version of tensorflow does not support XLA auto-cluster JIT optimization.  HINT:  upgrading tensorflow may improve performance.  (seg1 slice1 10.128.0.41:40001 pid=6271)
   CONTEXT:  PL/Python function "fit_transition"
   ```
   
   What does the user need to do to enable XLA?  I am on TF 1.13.1 currently.
   
   Otherwise this ran, and warm start also seemed to work.
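   
   For reference, a warm start re-run looks roughly like this (a sketch, not the exact command I used; the key points are keeping the existing `cifar_10_model` table and passing `TRUE` for the `warm_start` argument, everything else as in the first call):
   ```
   SELECT madlib.madlib_keras_fit('cifar_10_train_data_packed_allseg',    -- source table
                                  'cifar_10_model',                -- model output table (not dropped)
                                  'model_arch_library',            -- model arch table
                                   1,                              -- model arch id
                                   $$ loss='categorical_crossentropy', optimizer='rmsprop(lr=0.0001, decay=1e-6)', metrics=['accuracy']$$,  -- compile_params
                                   $$ batch_size=32, epochs=3 $$,  -- fit_params
                                   3,                              -- num_iterations
                                   NULL,                           -- use GPUs
                                   'cifar_10_test_data_packed_allseg',    -- validation dataset
                                   1,                              -- metrics compute frequency
                                   TRUE                            -- warm start
                                 );
   ```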
   
   
   (2)
   initial tests for functionality - keras_fit_multiple_model()
   
   first I started with a single segment:
   ```
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_train_data_packed ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               1 | {16667,32,32,3}       | {16667,10}          |         0
               1 | {16666,32,32,3}       | {16666,10}          |         2
               1 | {16667,32,32,3}       | {16667,10}          |         1
   (3 rows)
   
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_test_data_packed ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               1 | {10000,32,32,3}       | {10000,10}          |         0
   (1 row)
   ```
   
   run multi fit:
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed',    -- source_table
                                                 'cifar10_multi_model',     -- model_output_table
                                                 'mst_table',               -- model_selection_table
                                                  3,                       -- num_iterations
                                                  NULL,                     -- use gpus
                                                 'cifar_10_test_data_packed',      -- validation dataset
                                                  1,                         -- metrics compute frequency
                                                  NULL,                      -- warm_start
                                                  'me',
                                                  'this is a test run'
                                                );
   ```
   
   produces this error:
   ```
   ERROR:  plpy.Error: madlib_keras_fit_multiple_model error: No GPUs configured on hosts. (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 23, in <module>
       fit_obj = madlib_keras_fit_multiple_model.FitMultipleModel(**globals())
     PL/Python function "madlib_keras_fit_multiple_model", line 147, in __init__
     PL/Python function "madlib_keras_fit_multiple_model", line 295, in get_accessible_gpus_for_seg
   PL/Python function "madlib_keras_fit_multiple_model"
   
   ```
   
   so it looks like `use gpus=NULL` is now defaulting to `TRUE`, but it should default to `FALSE` (i.e., CPUs), as it did before.
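   
   In the meantime, passing `FALSE` explicitly for that argument should force CPU training (a sketch of the same call with only the `use gpus` argument changed; I have not re-verified the full run this way):
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed',    -- source_table
                                                 'cifar10_multi_model',     -- model_output_table
                                                 'mst_table',               -- model_selection_table
                                                  3,                        -- num_iterations
                                                  FALSE,                    -- use gpus (explicit instead of NULL)
                                                 'cifar_10_test_data_packed',      -- validation dataset
                                                  1,                         -- metrics compute frequency
                                                  NULL,                      -- warm_start
                                                  'me',
                                                  'this is a test run'
                                                );
   ```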
   
   
   (3)
   initial tests for functionality - keras_fit_multiple_model()
   
   next I used 2 segments:
   ```
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_train_data_packed_allseg ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               0 | {12500,32,32,3}       | {12500,10}          |         3
               0 | {12500,32,32,3}       | {12500,10}          |         1
               1 | {12500,32,32,3}       | {12500,10}          |         2
               1 | {12500,32,32,3}       | {12500,10}          |         0
   (4 rows)
   
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_test_data_packed_allseg ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               0 | {5000,32,32,3}        | {5000,10}           |         1
               1 | {5000,32,32,3}        | {5000,10}           |         0
   (2 rows)
   ```
   
   run multi fit:
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed_allseg',    -- source_table
                                                 'cifar10_multi_model',     -- model_output_table
                                                 'mst_table',               -- model_selection_table
                                                  3,                       -- num_iterations
                                                  NULL,                     -- use gpus
                                                 'cifar_10_test_data_packed_allseg',      -- validation dataset
                                                  1,                         -- metrics compute frequency
                                                  NULL,                      -- warm_start
                                                  'me',
                                                  'this is a test run'
                                                );
   ```
   
   which produced this error:
   ```
   ERROR:  plpy.SPIError: PRIMARY KEY and DISTRIBUTED BY definitions incompatible
   HINT:  When there is both a PRIMARY KEY, and a DISTRIBUTED BY clause, the DISTRIBUTED BY clause must be equal to or a left-subset of the PRIMARY KEY
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 24, in <module>
       fit_obj.fit_multiple_model()
     PL/Python function "madlib_keras_fit_multiple_model", line 241, in fit_multiple_model
     PL/Python function "madlib_keras_fit_multiple_model", line 509, in init_model_output_tbl
   PL/Python function "madlib_keras_fit_multiple_model"
   ```
   
   I have also seen this error when running on a single segment.
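   
   For context, the error itself is a general Greenplum restriction: a table with both a PRIMARY KEY and a DISTRIBUTED BY clause can only be created if the distribution key is equal to or a left-subset of the primary key. A minimal standalone repro (hypothetical table and column names, just to illustrate the constraint that `init_model_output_tbl` appears to be hitting):
   ```
   -- Fails with "PRIMARY KEY and DISTRIBUTED BY definitions incompatible"
   -- because the distribution key is not part of the PRIMARY KEY.
   CREATE TABLE pk_distkey_repro (
       mst_key       INTEGER,
       __dist_key__  INTEGER,
       model_weights BYTEA,
       PRIMARY KEY (mst_key)
   ) DISTRIBUTED BY (__dist_key__);
   ```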

