Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2020/12/02 02:13:43 UTC

[GitHub] [madlib] fmcquillan99 commented on pull request #525: DL: Model Hopper Refactor

fmcquillan99 commented on pull request #525:
URL: https://github.com/apache/madlib/pull/525#issuecomment-736941725


   
   re-testing...
   
   (1)
   initial tests for functionality - keras_fit()
   
   After upgrading to TF 1.14 and re-running a test with XLA auto-cluster JIT optimization, I no longer saw the warning message.  I also saw a 30% improvement in training time for the query, from 237 sec/iteration to 166 sec/iteration, which is great.
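
For reference, the 30% figure follows directly from the two timings quoted above. A minimal sketch of the arithmetic; the variable names are illustrative only:

```python
# Per-iteration timings quoted in the comment, in seconds.
before_sec = 237.0
after_sec = 166.0

# Fractional reduction in time per iteration.
reduction = (before_sec - after_sec) / before_sec

# Equivalent speedup factor (old time divided by new time).
speedup = before_sec / after_sec

print("reduction: {0:.1f}%".format(reduction * 100))  # reduction: 30.0%
print("speedup:   {0:.2f}x".format(speedup))          # speedup:   1.43x
```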
   
   
   (2)
   initial tests for functionality - keras_fit_multiple_model()
   
   First I started with a single segment:
   
   This query does not error on GPUs, but it still errors out after several minutes.  Also, please turn off the verbose output:
   ```
   NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "__madlib_temp_model_output49543509_1606874577_33655832___pkey" for table "__madlib_temp_model_output49543509_1606874577_33655832__"
   CONTEXT:  SQL statement "
                                       CREATE TABLE __madlib_temp_model_output49543509_1606874577_33655832__
                                       (mst_key INTEGER,
                                        model_weights BYTEA,
                                        model_arch JSON,
                                        compile_params TEXT,
                                        fit_params TEXT,
                                        object_map BYTEA,
                                        __dist_key__ INTEGER,
                                        PRIMARY KEY (__dist_key__, mst_key)
                                       )
                                       DISTRIBUTED BY (__dist_key__)
                                       "
   PL/Python function "madlib_keras_fit_multiple_model"
   NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "cifar10_multi_model_info_pkey" for table "cifar10_multi_model_info"
   CONTEXT:  SQL statement "
               CREATE TABLE cifar10_multi_model_info (
                   mst_key INTEGER PRIMARY KEY,
                   model_id INTEGER,
                   compile_params TEXT,
                   fit_params TEXT,
                   model_type TEXT,
                   model_size DOUBLE PRECISION,
                   metrics_elapsed_time DOUBLE PRECISION[],
                   metrics_type TEXT[],
                   loss_type TEXT,
                   training_metrics_final DOUBLE PRECISION,
                   training_loss_final DOUBLE PRECISION,
                   training_metrics DOUBLE PRECISION[],
                   training_loss DOUBLE PRECISION[],
                   validation_metrics_final DOUBLE PRECISION,
                   validation_loss_final DOUBLE PRECISION,
                   validation_metrics DOUBLE PRECISION[],
                   validation_loss DOUBLE PRECISION[]
              ) "
   PL/Python function "madlib_keras_fit_multiple_model"
   NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'mst_key' as the Greenplum Database data distribution key for this table.
   HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
   CONTEXT:  SQL statement "
                   CREATE TABLE __madlib_temp_next_schedule42211658_1606874876_919189__ AS
                       SELECT
                           mst_key,
                           model_id,
                           __dist_key__ AS __prev_dist_key__,
                           COALESCE(
                               LEAD(__dist_key__)
                                   OVER(ORDER BY __dist_key__),
                               FIRST_VALUE(__dist_key__)
                                   OVER(ORDER BY __dist_key__)
                           ) AS __dist_key__
                       FROM __madlib_temp_schedule9385959_1606874577_2407209__;
               "
   PL/Python function "madlib_keras_fit_multiple_model"
   ERROR:  plpy.SPIError: AttributeError: '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string' (plpython.c:5038)  (seg0 slice1 10.128.0.41:40000 pid=9278) (plpython.c:5038)
   DETAIL:  Message skipped due to incorrect encoding.
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 24, in <module>
       fit_obj.fit_multiple_model()
     PL/Python function "madlib_keras_fit_multiple_model", line 247, in fit_multiple_model
     PL/Python function "madlib_keras_fit_multiple_model", line 288, in train_multiple_model
     PL/Python function "madlib_keras_fit_multiple_model", line 929, in run_training
   PL/Python function "madlib_keras_fit_multiple_model"
   ```
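
On the verbose output: the NOTICE lines in the log are ordinary PostgreSQL/Greenplum messages, and one common way to keep them from reaching the client is to raise `client_min_messages` around the noisy statements and restore it afterwards. The following is a minimal sketch of that idea, not MADlib's actual code. It assumes an `execute` callable that behaves like `plpy.execute` (takes SQL, returns a list of dict-like rows); `quiet_notices` and `fake_execute` are hypothetical names used only for illustration.

```python
from contextlib import contextmanager


@contextmanager
def quiet_notices(execute, level="warning"):
    """Temporarily raise client_min_messages so NOTICE lines are suppressed."""
    # Remember the caller's current setting so it can be restored later.
    rows = execute("SELECT current_setting('client_min_messages') AS setting")
    previous = rows[0]["setting"]
    execute("SET client_min_messages TO {0}".format(level))
    try:
        yield
    finally:
        # Restore the original level even if the wrapped code raised.
        execute("SET client_min_messages TO {0}".format(previous))


# Stand-in for plpy.execute so the sketch runs outside the database.
issued = []


def fake_execute(sql):
    issued.append(sql)
    if sql.startswith("SELECT current_setting"):
        return [{"setting": "notice"}]
    return []


with quiet_notices(fake_execute):
    fake_execute("CREATE TABLE t (mst_key INTEGER PRIMARY KEY)")
```

Inside a PL/Python function the same wrapper could presumably be used with `plpy.execute` in place of `fake_execute`, so that internal `CREATE TABLE` statements no longer print NOTICE and CONTEXT lines to the user.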
   
   (3)
   initial tests for functionality - keras_fit_multiple_model()
   
   Next I used 2 segments:
   
   This query errors out after several minutes.
   
   
   ```
   NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "__madlib_temp_model_output49543509_1606875004_33656259___pkey" for table "__madlib_temp_model_output49543509_1606875004_33656259__"
   CONTEXT:  SQL statement "
                                       CREATE TABLE __madlib_temp_model_output49543509_1606875004_33656259__
                                       (mst_key INTEGER,
                                        model_weights BYTEA,
                                        model_arch JSON,
                                        compile_params TEXT,
                                        fit_params TEXT,
                                        object_map BYTEA,
                                        __dist_key__ INTEGER,
                                        PRIMARY KEY (__dist_key__, mst_key)
                                       )
                                       DISTRIBUTED BY (__dist_key__)
                                       "
   PL/Python function "madlib_keras_fit_multiple_model"
   NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "cifar10_multi_model_info_pkey" for table "cifar10_multi_model_info"
   CONTEXT:  SQL statement "
               CREATE TABLE cifar10_multi_model_info (
                   mst_key INTEGER PRIMARY KEY,
                   model_id INTEGER,
                   compile_params TEXT,
                   fit_params TEXT,
                   model_type TEXT,
                   model_size DOUBLE PRECISION,
                   metrics_elapsed_time DOUBLE PRECISION[],
                   metrics_type TEXT[],
                   loss_type TEXT,
                   training_metrics_final DOUBLE PRECISION,
                   training_loss_final DOUBLE PRECISION,
                   training_metrics DOUBLE PRECISION[],
                   training_loss DOUBLE PRECISION[],
                   validation_metrics_final DOUBLE PRECISION,
                   validation_loss_final DOUBLE PRECISION,
                   validation_metrics DOUBLE PRECISION[],
                   validation_loss DOUBLE PRECISION[]
              ) "
   PL/Python function "madlib_keras_fit_multiple_model"
   ERROR:  plpy.SPIError: AttributeError: '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string' (plpython.c:5038)  (seg0 slice1 10.128.0.41:40000 pid=9489) (plpython.c:5038)
   DETAIL:  :
   Traceback (most recent call last):
     File "<string>", line 15, in __plpython_procedure_fit_transition_multiple_model_1572293
     File "/home/gpadmin/madlib/build/src/ports/greenplum/5/modules/deep_learning/madlib_keras.py", line 545, in fit_transition
       custom_function_map)
     File "/home/gpadmin/madlib/build/src/ports/greenplum/5/modules/deep_learning/madlib_keras.py", line 90, in get_init_model_and_sess
       segment_model = init_model(model_architecture, compile_params, custom_function_map)
     File "/home/gpadmin/madlib/build/src/ports/greenplum/5/modules/deep_learning/madlib_keras.py", line 497, in init_model
       segment_model = model_from_json(model_architecture)
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/engine/saving.py", line 492, in model_from_json
       return deserialize(config, custom_objects=custom_objects)
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/layers/__init__.py", line 55, in deserialize
       printable_module_name='layer')
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/utils/generic_utils.py", line 145, in deserialize_keras_object
       list(custom_objects.items())))
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/engine/sequential.py", line 301, in from_config
       model.add(layer)
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/engine/sequential.py", line 181, in add
       output_tensor = layer(self.outputs[0])
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/engine/base_layer.py", line 457, in __call__
       output = self.call(inputs, **kwargs)
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/layers/normalization.py", line 185, in call
       epsilon=self.epsilon)
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 1858, in normalize_batch_in_training
       if not _has_nchw_support() and list(reduction_axes) == [0, 2, 3]:
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 291, in _has_nchw_support
       explicitly_on_cpu = _is_current_explicit_device('CPU')
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 266, in _is_current_explicit_device
       device = _get_current_tf_device()
     File "/home/gpadmin/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 247, in _get_current_tf_device
       g._apply_device_functions(op)
     File "/usr/local/greenplum-db/ext/python/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 4581, in _apply_device_functions
       op._set_device_from_string(device_string)
   AttributeError: '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string'
   Traceback (most recent call last):
     PL/Python function "fit_transition_multiple_model", line 20, in <module>
       raise e
   PL/Python function "fit_transition_multiple_model"
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 24, in <module>
       fit_obj.fit_multiple_model()
     PL/Python function "madlib_keras_fit_multiple_model", line 247, in fit_multiple_model
     PL/Python function "madlib_keras_fit_multiple_model", line 288, in train_multiple_model
     PL/Python function "madlib_keras_fit_multiple_model", line 929, in run_training
   PL/Python function "madlib_keras_fit_multiple_model"
   ```
   

