You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2021/01/07 04:34:20 UTC

[GitHub] [madlib] kaknikhil commented on a change in pull request #526: DL: AutoML encapsulation

kaknikhil commented on a change in pull request #526:
URL: https://github.com/apache/madlib/pull/526#discussion_r552291582



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
##########
@@ -79,6 +82,7 @@ class KerasAutoML(object):
         self.model_id_list = sorted(list(set(model_id_list)))
         self.compile_params_grid = compile_params_grid
         self.fit_params_grid = fit_params_grid
+        self.dist_key_col = DISTRIBUTION_KEY_COLNAME

Review comment:
       Why do we need to add the dist_key column for the automl model_output_table ? It might also help to add a description in the commit message about why we need the dist key in the model_output_table. 

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
##########
@@ -172,45 +174,38 @@ class KerasAutoML(object):
         random_state = 'NULL' if self.random_state is None else '$MAD${0}$MAD$'.format(self.random_state)
         validation_table = 'NULL' if self.validation_table is None else '$MAD${0}$MAD$'.format(self.validation_table)
 
-        create_query = plpy.prepare("""
+        create_query = """
                 CREATE TABLE {self.model_summary_table} AS
                 SELECT
                     $MAD${self.source_table}$MAD$::TEXT AS source_table,
                     {validation_table}::TEXT AS validation_table,
                     $MAD${self.model_output_table}$MAD$::TEXT AS model,
                     $MAD${self.model_info_table}$MAD$::TEXT AS model_info,
-                    (SELECT dependent_varname FROM {model_training.model_summary_table})
-                    AS dependent_varname,
-                    (SELECT independent_varname FROM {model_training.model_summary_table})
-                    AS independent_varname,
+                    (SELECT dependent_varname FROM {a.MODEL_SUMMARY_TABLE})

Review comment:
       I am assuming we are creating the automl summary table by querying from the fit multiple summary table. If that's the case, I think it's better to get the fit multiple summary table name from the model_training object since that's the source of truth.
   What do you think ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org