Posted to commits@madlib.apache.org by kh...@apache.org on 2020/10/27 20:18:05 UTC

[madlib] branch master updated (0729a6f -> 33c1a6d)

This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git.


    from 0729a6f  DL: Restrict access to the custom functions table
     new 6fd6d65  DL: Update test sql to reference to tablename created in the same file
     new f2b80f2  DL: [AutoML] Hyperopt implementation
     new 88f2ebc  Add MinWarning to remove extraneous INFO messages
     new 2d6e599  DL: [AutoML] Add new class for Distribution rules
     new c849dd0  DL: [AutoML] Split automl methods to their own files
     new 30db0e6  Split `with` for multiple expressions into nested calls
     new 6976c9f  user docs and examples for automl
     new 33c1a6d  additional user docs updates about installing Dill and Hyperopt

The 8 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/ports/postgres/madpack/SQLCommon.m4_in         |   2 +-
 .../deep_learning/input_data_preprocessor.py_in    |  18 +-
 .../deep_learning/madlib_keras_automl.py_in        | 523 +++---------
 .../deep_learning/madlib_keras_automl.sql_in       | 936 ++++++++++++++++++---
 ...l.py_in => madlib_keras_automl_hyperband.py_in} | 395 +++------
 .../madlib_keras_automl_hyperopt.py_in             | 458 ++++++++++
 .../madlib_keras_custom_function.sql_in            |   6 +-
 .../madlib_keras_fit_multiple_model.py_in          |  24 +-
 .../deep_learning/madlib_keras_gpu_info.sql_in     |   6 +-
 .../deep_learning/madlib_keras_helper.py_in        |  56 +-
 .../madlib_keras_model_selection.py_in             |  63 +-
 .../deep_learning/madlib_keras_validator.py_in     |   3 +-
 .../deep_learning/test/madlib_keras_automl.sql_in  | 459 ++++++----
 .../test/madlib_keras_custom_function.sql_in       |   8 +-
 .../test/unit_tests/test_madlib_keras.py_in        |  24 +-
 .../test/unit_tests/test_madlib_keras_automl.py_in |  84 +-
 .../test_madlib_keras_model_selection_table.py_in  |   9 +-
 src/ports/postgres/modules/lda/lda.sql_in          |  27 +-
 .../modules/utilities/text_utilities.sql_in        |   5 +-
 .../postgres/modules/utilities/utilities.py_in     |   7 +
 20 files changed, 2075 insertions(+), 1038 deletions(-)
 copy src/ports/postgres/modules/deep_learning/{madlib_keras_automl.py_in => madlib_keras_automl_hyperband.py_in} (51%)
 create mode 100644 src/ports/postgres/modules/deep_learning/madlib_keras_automl_hyperopt.py_in


[madlib] 02/08: DL: [AutoML] Hyperopt implementation

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit f2b80f211cf5462782a9238dfac9f41d37332c6a
Author: Advitya Gemawat <ag...@vmware.com>
AuthorDate: Sun Sep 20 21:58:36 2020 -0700

    DL: [AutoML] Hyperopt implementation
    
    JIRA: MADLIB-1453
    
    We expand the AutoML capabilities in MADlib by adding support for
    Hyperopt, another hyperparameter optimization method for minimizing
    losses over awkward search spaces. The user declaratively specifies the
    automl method name along with the related automl params (all other
    argument declarations stay the same across the different AutoML
    methods), and our API handles the exploration and execution, displaying
    the algorithm's workload info to the user.
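
    As a concrete illustration of the declarative API at the Python layer,
    a minimal hedged sketch against the AutoMLHyperopt constructor added
    below (table names and param grids are hypothetical, and the call only
    works inside a plpython session; switching methods changes just the
    automl_method/automl_params pair):

        # Hypothetical tables/grids; the grids are strings that literal_eval
        # turns into dicts, as in the hyperopt code path below.
        compile_grid = ("{'loss': ['categorical_crossentropy'], "
                        "'optimizer_params_list': ["
                        "{'optimizer': ['Adam'], 'lr': [0.001, 0.1, 'log']}], "
                        "'metrics': ['accuracy']}")
        fit_grid = "{'batch_size': [32, 64], 'epochs': [1]}"
        AutoMLHyperopt('madlib', 'iris_train_packed', 'automl_output',
                       'model_arch_library', 'automl_mst_table',
                       model_id_list=[1, 2],
                       compile_params_grid=compile_grid,
                       fit_params_grid=fit_grid,
                       automl_method='hyperopt',
                       automl_params='num_configs=20, num_iterations=5, algorithm=tpe')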
    
    We implement Hyperopt for Massively Parallel Processing (MPP) databases
    to perform model selection on top of our existing Model hOpper (MOP)
    infrastructure for model training/evaluation. Our API currently offers
    declarative support for Random Search and the Tree-structured Parzen
    Estimator (TPE), a Bayesian-optimization-like approach.
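
    In condensed form, the batched suggest/observe pattern this is built on
    (a sketch assuming the hyperopt 0.2.x API, with a toy search space and
    a placeholder loss; the real loop is in find_hyperopt_config below):

        import numpy as np
        from hyperopt import hp, tpe, Trials, STATUS_OK, STATUS_RUNNING
        from hyperopt.base import Domain

        # Toy search space; the commit builds this from the user's grids.
        space = {'lr': hp.loguniform('lr', np.log(1e-4), np.log(1e-1))}
        domain, trials = Domain(None, space), Trials()
        rng = np.random.RandomState(0)

        batch = []
        for tid in range(8):                    # e.g. one config per segment
            docs = tpe.suggest([tid], domain, trials, rng.randint(0, 2 ** 31 - 1))
            docs[0]['status'] = STATUS_RUNNING  # mark the trial as in flight
            trials.insert_trial_docs(docs)
            trials.refresh()
            batch.append(docs[0])

        # ... the whole batch is then trained in parallel by fit multiple ...
        for doc in batch:                       # report losses so TPE adapts
            doc['status'] = STATUS_OK
            doc['result'] = {'loss': 0.5, 'status': STATUS_OK}  # placeholder
        trials.refresh()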
    
    Additionally,
    - The user must install the hyperopt python module on the master host
    in order to use the hyperopt automl_method.
    - Print 'best-so-far' console info for each AutoML method (i.e.
    Hyperband and Hyperopt).
    - Disable automl method argument prefixing (while still retaining case
    insensitivity).
    - If no automl_method is passed in, the default is `hyperband`.
    - Default values for automl_params (a worked Hyperband schedule follows
    this list):
      Hyperband: `R=6, eta=3, skip_last=0`
      Hyperopt:  `num_configs=20, num_iterations=5, algorithm=tpe`
    - For hyperopt, the metrics_elapsed_time in the output info table is
    accumulated across trials.
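
    A worked example of the default Hyperband schedule (the n_i/r_i
    recurrences match calculate_schedule() in the diff; the per-bracket
    starting n uses the standard Hyperband formula, an assumption here):

        import math

        R, eta, skip_last = 6, 3, 0
        s_max = int(math.floor(math.log(R, eta)))      # 1
        for s in range(s_max, skip_last - 1, -1):      # brackets s = 1, 0
            n = int(math.ceil(int((s_max + 1) / (s + 1)) * eta ** s))
            r = R * math.pow(eta, -s)
            for i in range(s + 1):                     # rounds within bracket
                n_i = int(n * math.pow(eta, -i))       # surviving configs
                r_i = int(round(r * math.pow(eta, i))) # iterations per config
                print(s, i, n_i, r_i)
        # bracket s=1: (3 configs, 2 iters) -> (1 config, 6 iters)
        # bracket s=0: (2 configs, 6 iters)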
    
    Co-authored-by: Nikhil Kak <nk...@vmware.com>
    Co-authored-by: Ekta Khanna <ek...@vmware.com>
---
 .../deep_learning/input_data_preprocessor.py_in    |   3 +-
 .../deep_learning/madlib_keras_automl.py_in        | 839 +++++++++++++++++----
 .../deep_learning/madlib_keras_automl.sql_in       |  29 +-
 .../madlib_keras_fit_multiple_model.py_in          |  24 +-
 .../deep_learning/madlib_keras_helper.py_in        |  57 +-
 .../madlib_keras_model_selection.py_in             |  63 +-
 .../deep_learning/madlib_keras_validator.py_in     |   2 +-
 .../deep_learning/test/madlib_keras_automl.sql_in  | 459 +++++++----
 .../test/unit_tests/test_madlib_keras_automl.py_in |  74 ++
 .../test_madlib_keras_model_selection_table.py_in  |   9 +-
 .../postgres/modules/utilities/utilities.py_in     |   7 +
 11 files changed, 1181 insertions(+), 385 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
index 605d439..1d395a6 100644
--- a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
+++ b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
@@ -610,7 +610,7 @@ class InputDataPreprocessorDL(object):
                 {self.buffer_size} AS buffer_size,
                 {self.normalizing_const}::{FLOAT32_SQL_TYPE} AS {normalizing_const_colname},
                 {self.num_classes} AS {num_classes_colname},
-                {self.distribution_rules} AS distribution_rules,
+                {self.distribution_rules} AS {distribution_rules},
                 {self.gpu_config} AS {internal_gpu_config}
             """.format(self=self, class_level_str=class_level_str,
                        dependent_varname_colname=DEPENDENT_VARNAME_COLNAME,
@@ -620,6 +620,7 @@ class InputDataPreprocessorDL(object):
                        normalizing_const_colname=NORMALIZING_CONST_COLNAME,
                        num_classes_colname=NUM_CLASSES_COLNAME,
                        internal_gpu_config=INTERNAL_GPU_CONFIG,
+                       distribution_rules=DISTRIBUTION_RULES,
                        FLOAT32_SQL_TYPE=FLOAT32_SQL_TYPE)
         plpy.execute(query)
 
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
index 0f71fdf..d6eeba3 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
@@ -17,28 +17,35 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from ast import literal_eval
 from datetime import datetime
-import plpy
+from hyperopt import hp, rand, tpe, atpe, Trials, STATUS_OK, STATUS_RUNNING
+from hyperopt.base import Domain
 import math
-from time import time
+import numpy as np
+import plpy
+import time
 
 from madlib_keras_validator import MstLoaderInputValidator
-from utilities.utilities import unique_string, add_postfix, extract_keyvalue_params, \
-    _assert, _assert_equal, rename_table
+# from utilities.admin import cleanup_madlib_temp_tables
+from utilities.utilities import get_current_timestamp, get_seg_number, get_segments_per_host, \
+    unique_string, add_postfix, extract_keyvalue_params, _assert, _assert_equal, rename_table
 from utilities.control import MinWarning, SetGUC
 from madlib_keras_fit_multiple_model import FitMultipleModel
+from madlib_keras_helper import generate_row_string
+from madlib_keras_helper import DISTRIBUTION_RULES
 from madlib_keras_model_selection import MstSearch, ModelSelectionSchema
 from keras_model_arch_table import ModelArchSchema
-from utilities.validate_args import table_exists, drop_tables
+from utilities.validate_args import table_exists, drop_tables, input_tbl_valid
 from utilities.validate_args import quote_ident
 
-
-class AutoMLSchema:
+class AutoMLConstants:
     BRACKET = 's'
     ROUND = 'i'
     CONFIGURATIONS = 'n_i'
     RESOURCES = 'r_i'
     HYPERBAND = 'hyperband'
+    HYPEROPT = 'hyperopt'
     R = 'R'
     ETA = 'eta'
     SKIP_LAST = 'skip_last'
@@ -47,7 +54,12 @@ class AutoMLSchema:
     TEMP_MST_SUMMARY_TABLE = add_postfix(TEMP_MST_TABLE, '_summary')
     TEMP_OUTPUT_TABLE = unique_string('temp_output_table')
     METRICS_ITERS = 'metrics_iters' # custom column
-
+    NUM_CONFIGS = 'num_configs'
+    NUM_ITERS = 'num_iterations'
+    ALGORITHM = 'algorithm'
+    TIME_FORMAT = '%Y-%m-%d %H:%M:%S'
+    INT_MAX = 2 ** 31 - 1
+    TARGET_SCHEMA = 'public'
 
 @MinWarning("warning")
 class HyperbandSchedule():
@@ -66,6 +78,7 @@ class HyperbandSchedule():
         self.R = R # maximum iterations/epochs allocated to a configuration
         self.eta = eta # defines downsampling rate
         self.skip_last = skip_last
+        self.module_name = 'hyperband_schedule'
         self.validate_inputs()
 
         # number of unique executions of Successive Halving (minus one)
@@ -87,12 +100,12 @@ class HyperbandSchedule():
         """
         Validates user input values
         """
-        _assert(self.eta > 1, "DL: eta must be greater than 1")
-        _assert(self.R >= self.eta, "DL: R should not be less than eta")
+        _assert(self.eta > 1, "{0}: eta must be greater than 1".format(self.module_name))
+        _assert(self.R >= self.eta, "{0}: R should not be less than eta".format(self.module_name))
 
     def validate_s_max(self):
-        _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "DL: skip_last must be " +
-                "non-negative and less than {0}".format(self.s_max))
+        _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "{0}: skip_last must be " +
+                "non-negative and less than {1}".format(self.module_name,self.s_max))
 
     def calculate_schedule(self):
         """
@@ -108,10 +121,10 @@ class HyperbandSchedule():
                 n_i = n*math.pow(self.eta, -i)
                 r_i = r*math.pow(self.eta, i)
 
-                self.schedule_vals.append({AutoMLSchema.BRACKET: s,
-                                           AutoMLSchema.ROUND: i,
-                                           AutoMLSchema.CONFIGURATIONS: int(n_i),
-                                           AutoMLSchema.RESOURCES: int(round(r_i))})
+                self.schedule_vals.append({AutoMLConstants.BRACKET: s,
+                                           AutoMLConstants.ROUND: i,
+                                           AutoMLConstants.CONFIGURATIONS: int(n_i),
+                                           AutoMLConstants.RESOURCES: int(round(r_i))})
 
     def create_schedule_table(self):
         """Initializes the output schedule table"""
@@ -124,20 +137,19 @@ class HyperbandSchedule():
                             unique ({s}, {i})
                         );
                        """.format(self=self,
-                                  s=AutoMLSchema.BRACKET,
-                                  i=AutoMLSchema.ROUND,
-                                  n_i=AutoMLSchema.CONFIGURATIONS,
-                                  r_i=AutoMLSchema.RESOURCES)
-        with MinWarning('warning'):
-            plpy.execute(create_query)
+                                  s=AutoMLConstants.BRACKET,
+                                  i=AutoMLConstants.ROUND,
+                                  n_i=AutoMLConstants.CONFIGURATIONS,
+                                  r_i=AutoMLConstants.RESOURCES)
+        plpy.execute(create_query)
 
     def insert_into_schedule_table(self):
         """Insert everything in self.schedule_vals into the output schedule table."""
         for sd in self.schedule_vals:
-            sd_s = sd[AutoMLSchema.BRACKET]
-            sd_i = sd[AutoMLSchema.ROUND]
-            sd_n_i = sd[AutoMLSchema.CONFIGURATIONS]
-            sd_r_i = sd[AutoMLSchema.RESOURCES]
+            sd_s = sd[AutoMLConstants.BRACKET]
+            sd_i = sd[AutoMLConstants.ROUND]
+            sd_n_i = sd[AutoMLConstants.CONFIGURATIONS]
+            sd_r_i = sd[AutoMLConstants.RESOURCES]
             insert_query = """
                             INSERT INTO
                                 {self.schedule_table}(
@@ -152,27 +164,28 @@ class HyperbandSchedule():
                                 {sd_n_i},
                                 {sd_r_i}
                             )
-                           """.format(s_col=AutoMLSchema.BRACKET,
-                                      i_col=AutoMLSchema.ROUND,
-                                      n_i_col=AutoMLSchema.CONFIGURATIONS,
-                                      r_i_col=AutoMLSchema.RESOURCES,
+                           """.format(s_col=AutoMLConstants.BRACKET,
+                                      i_col=AutoMLConstants.ROUND,
+                                      n_i_col=AutoMLConstants.CONFIGURATIONS,
+                                      r_i_col=AutoMLConstants.RESOURCES,
                                       **locals())
             plpy.execute(insert_query)
 
-@MinWarning("warning")
-class KerasAutoML():
-    """The core AutoML function for running AutoML algorithms such as Hyperband.
-    This function executes the hyperband rounds 'diagonally' to evaluate multiple configurations together
-    and leverage the compute power of MPP databases such as Greenplum.
+# @MinWarning("warning")
+class KerasAutoML(object):
+    """
+    The core AutoML class for running AutoML algorithms such as Hyperband and Hyperopt.
     """
     def __init__(self, schema_madlib, source_table, model_output_table, model_arch_table, model_selection_table,
                  model_id_list, compile_params_grid, fit_params_grid, automl_method='hyperband',
-                 automl_params='R=6, eta=3, skip_last=0', random_state=None, object_table=None,
+                 automl_params=None, random_state=None, object_table=None,
                  use_gpus=False, validation_table=None, metrics_compute_frequency=None,
                  name=None, description=None, **kwargs):
         self.schema_madlib = schema_madlib
         self.source_table = source_table
         self.model_output_table = model_output_table
+        self.module_name = 'madlib_keras_automl'
+        input_tbl_valid(self.source_table, self.module_name)
         if self.model_output_table:
             self.model_info_table = add_postfix(self.model_output_table, '_info')
             self.model_summary_table = add_postfix(self.model_output_table, '_summary')
@@ -199,10 +212,9 @@ class KerasAutoML():
             module_name='madlib_keras_automl'
         )
 
-        self.automl_method = automl_method if automl_method else 'hyperband'
-        self.automl_params = automl_params if automl_params else 'R=6, eta=3, skip_last=0'
+        self.automl_method = automl_method
+        self.automl_params = automl_params
         self.random_state = random_state
-        self.validate_and_define_inputs()
 
         self.object_table = object_table
         self.use_gpus = use_gpus if use_gpus else False
@@ -212,13 +224,7 @@ class KerasAutoML():
         self.description = description
 
         if self.validation_table:
-            AutoMLSchema.LOSS_METRIC = 'validation_loss_final'
-
-        self.create_model_output_table()
-        self.create_model_output_info_table()
-
-        if AutoMLSchema.HYPERBAND.startswith(self.automl_method.lower()):
-            self.find_hyperband_config()
+            AutoMLConstants.LOSS_METRIC = 'validation_loss_final'
 
     def create_model_output_table(self):
         output_table_create_query = """
@@ -228,8 +234,8 @@ class KerasAutoML():
                                      {ModelArchSchema.MODEL_ARCH} JSON)
                                     """.format(self=self, ModelSelectionSchema=ModelSelectionSchema,
                                                ModelArchSchema=ModelArchSchema)
-        with MinWarning('warning'):
-            plpy.execute(output_table_create_query)
+        # with MinWarning('warning'):
+        plpy.execute(output_table_create_query)
 
     def create_model_output_info_table(self):
         info_table_create_query = """
@@ -252,38 +258,87 @@ class KerasAutoML():
                                    validation_metrics DOUBLE PRECISION[],
                                    validation_loss DOUBLE PRECISION[],
                                    {AutoMLSchema.METRICS_ITERS} INTEGER[])
-                                       """.format(self=self, ModelSelectionSchema=ModelSelectionSchema,
-                                                  ModelArchSchema=ModelArchSchema, AutoMLSchema=AutoMLSchema)
-        with MinWarning('warning'):
-            plpy.execute(info_table_create_query)
+                                   """.format(self=self,
+                                              ModelSelectionSchema=ModelSelectionSchema,
+                                              ModelArchSchema=ModelArchSchema,
+                                              AutoMLSchema=AutoMLConstants)
+        plpy.execute(info_table_create_query)
 
-    def validate_and_define_inputs(self):
+    def update_model_selection_table(self):
+        """
+        Drops and re-creates the mst table to only include the best performing model configuration.
+        """
+        drop_tables([self.model_selection_table])
 
-        if AutoMLSchema.HYPERBAND.startswith(self.automl_method.lower()):
-            automl_params_dict = extract_keyvalue_params(self.automl_params,
-                                                         default_values={'R': 6, 'eta': 3, 'skip_last': 0},
-                                                         lower_case_names=False)
-            # casting dict values to int
-            for i in automl_params_dict:
-                automl_params_dict[i] = int(automl_params_dict[i])
-            _assert(len(automl_params_dict) >= 1 or len(automl_params_dict) <= 3,
-                    "DL: Only R, eta, and skip_last may be specified")
-            for i in automl_params_dict:
-                if i == AutoMLSchema.R:
-                    self.R = automl_params_dict[AutoMLSchema.R]
-                elif i == AutoMLSchema.ETA:
-                    self.eta = automl_params_dict[AutoMLSchema.ETA]
-                elif i == AutoMLSchema.SKIP_LAST:
-                    self.skip_last = automl_params_dict[AutoMLSchema.SKIP_LAST]
-                else:
-                    plpy.error("DL: {0} is an invalid param".format(i))
-            _assert(self.eta > 1, "DL: eta must be greater than 1")
-            _assert(self.R >= self.eta, "DL: R should not be less than eta")
-            self.s_max = int(math.floor(math.log(self.R, self.eta)))
-            _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "DL: skip_last must be " +
-                    "non-negative and less than {0}".format(self.s_max))
-        else:
-            plpy.error("DL: Only hyperband is currently supported as the automl method")
+        # only retaining best performing config
+        plpy.execute("CREATE TABLE {self.model_selection_table} AS SELECT {ModelSelectionSchema.MST_KEY}, " \
+                     "{ModelSelectionSchema.MODEL_ID}, {ModelSelectionSchema.COMPILE_PARAMS}, " \
+                     "{ModelSelectionSchema.FIT_PARAMS} FROM {self.model_info_table} " \
+                     "ORDER BY {AutoMLSchema.LOSS_METRIC} LIMIT 1".format(self=self,
+                                                                          AutoMLSchema=AutoMLConstants,
+                                                                          ModelSelectionSchema=ModelSelectionSchema))
+
+    def generate_model_output_summary_table(self, model_training):
+        """
+        Creates and populates static values related to the AutoML workload.
+        :param model_training: Fit Multiple function call object.
+        """
+        #TODO this code is duplicated in create_model_summary_table
+        name = 'NULL' if self.name is None else '$MAD${0}$MAD$'.format(self.name)
+        descr = 'NULL' if self.description is None else '$MAD${0}$MAD$'.format(self.description)
+        object_table = 'NULL' if self.object_table is None else '$MAD${0}$MAD$'.format(self.object_table)
+        random_state = 'NULL' if self.random_state is None else '$MAD${0}$MAD$'.format(self.random_state)
+        validation_table = 'NULL' if self.validation_table is None else '$MAD${0}$MAD$'.format(self.validation_table)
+
+        create_query = plpy.prepare("""
+                CREATE TABLE {self.model_summary_table} AS
+                SELECT
+                    $MAD${self.source_table}$MAD$::TEXT AS source_table,
+                    {validation_table}::TEXT AS validation_table,
+                    $MAD${self.model_output_table}$MAD$::TEXT AS model,
+                    $MAD${self.model_info_table}$MAD$::TEXT AS model_info,
+                    (SELECT dependent_varname FROM {model_training.model_summary_table})
+                    AS dependent_varname,
+                    (SELECT independent_varname FROM {model_training.model_summary_table})
+                    AS independent_varname,
+                    $MAD${self.model_arch_table}$MAD$::TEXT AS model_arch_table,
+                    $MAD${self.model_selection_table}$MAD$::TEXT AS model_selection_table,
+                    $MAD${self.automl_method}$MAD$::TEXT AS automl_method,
+                    $MAD${self.automl_params}$MAD$::TEXT AS automl_params,
+                    {random_state}::TEXT AS random_state,
+                    {object_table}::TEXT AS object_table,
+                    {self.use_gpus} AS use_gpus,
+                    (SELECT metrics_compute_frequency FROM {model_training.model_summary_table})::INTEGER
+                    AS metrics_compute_frequency,
+                    {name}::TEXT AS name,
+                    {descr}::TEXT AS description,
+                    '{self.start_training_time}'::TIMESTAMP AS start_training_time,
+                    '{self.end_training_time}'::TIMESTAMP AS end_training_time,
+                    (SELECT madlib_version FROM {model_training.model_summary_table}) AS madlib_version,
+                    (SELECT num_classes FROM {model_training.model_summary_table})::INTEGER AS num_classes,
+                    (SELECT class_values FROM {model_training.model_summary_table}) AS class_values,
+                    (SELECT dependent_vartype FROM {model_training.model_summary_table})
+                    AS dependent_vartype,
+                    (SELECT normalizing_const FROM {model_training.model_summary_table})
+                    AS normalizing_const
+            """.format(self=self,
+                       validation_table=validation_table,
+                       random_state=random_state,
+                       object_table=object_table,
+                       name=name,
+                       descr=descr,
+                       model_training=model_training))
+
+        # with MinWarning('warning'):
+        plpy.execute(create_query)
+
+    def is_automl_method(self, method_name):
+        """
+        Utility function to check automl method name.
+        :param method_name: automl method name to check against.
+        :return: boolean
+        """
+        return self.automl_method.lower() == method_name.lower()
 
     def _is_valid_metrics_compute_frequency(self, num_iterations):
         """
@@ -296,9 +351,97 @@ class KerasAutoML():
                (self.metrics_compute_frequency >= 1 and \
                 self.metrics_compute_frequency <= num_iterations)
 
+    def print_best_mst_so_far(self):
+        """
+        Prints mst keys with best train/val losses at a given point.
+        """
+        best_so_far = '\n'
+        best_so_far += self.print_best_helper('training')
+        if self.validation_table:
+            best_so_far += self.print_best_helper('validation')
+        plpy.info(best_so_far)
+
+    def print_best_helper(self, keyword):
+        """
+        Helper function to print the mst keys with the best train/val losses at a given point.
+        :param keyword: column prefix ('training' or 'validation')
+        :return: formatted 'best loss so far' string
+        """
+        metrics_word, loss_word = keyword + '_metrics_final', keyword + '_loss_final'
+
+        res_str = 'Best {keyword} loss so far:\n'.format(keyword=keyword)
+        best_value = plpy.execute("SELECT {ModelSelectionSchema.MST_KEY}, {metrics_word}, " \
+                                  "{loss_word} FROM {self.model_info_table} ORDER BY " \
+                                  "{loss_word} LIMIT 1".format(self=self, ModelSelectionSchema=ModelSelectionSchema,
+                                                               metrics_word=metrics_word, loss_word=loss_word))[0]
+        mst_key_value, metric_value, loss_value = best_value[ModelSelectionSchema.MST_KEY], \
+                                                  best_value[metrics_word], best_value[loss_word]
+        res_str += ModelSelectionSchema.MST_KEY + '=' + str(mst_key_value) + ': metric=' + str(metric_value) + \
+                   ', loss=' + str(loss_value) + '\n'
+        return res_str
+
+    def remove_temp_tables(self, model_training):
+        """
+        Remove all intermediate tables created for AutoML runs/updates.
+        :param model_training: Fit Multiple function call object.
+        """
+        drop_tables([model_training.original_model_output_table, model_training.model_info_table,
+                     model_training.model_summary_table, AutoMLConstants.TEMP_MST_TABLE,
+                     AutoMLConstants.TEMP_MST_SUMMARY_TABLE])
+
+# @MinWarning("warning")
+class AutoMLHyperband(KerasAutoML):
+    """
+    This class implements Hyperband, an infinite-arm bandit based algorithm that speeds up random search
+    through adaptive resource allocation, successive halving (SHA), and early stopping.
+
+    This class showcases a novel hyperband implementation by executing the hyperband rounds 'diagonally'
+    to evaluate multiple configurations together and leverage the compute power of MPP databases such as Greenplum.
+
+    This automl method inherits common functionality from the KerasAutoML base class.
+    """
+    def __init__(self, schema_madlib, source_table, model_output_table, model_arch_table, model_selection_table,
+                 model_id_list, compile_params_grid, fit_params_grid, automl_method,
+                 automl_params, random_state=None, object_table=None,
+                 use_gpus=False, validation_table=None, metrics_compute_frequency=None,
+                 name=None, description=None, **kwargs):
+        automl_method = automl_method if automl_method else AutoMLConstants.HYPERBAND
+        automl_params = automl_params if automl_params else 'R=6, eta=3, skip_last=0'
+        KerasAutoML.__init__(self, schema_madlib, source_table, model_output_table, model_arch_table,
+                             model_selection_table, model_id_list, compile_params_grid, fit_params_grid,
+                             automl_method, automl_params, random_state, object_table, use_gpus,
+                             validation_table, metrics_compute_frequency, name, description, **kwargs)
+        self.validate_and_define_inputs()
+        self.create_model_output_table()
+        self.create_model_output_info_table()
+        self.find_hyperband_config()
+
+    def validate_and_define_inputs(self):
+        automl_params_dict = extract_keyvalue_params(self.automl_params,
+                                                     lower_case_names=False)
+        # casting dict values to int
+        for i in automl_params_dict:
+            automl_params_dict[i] = int(automl_params_dict[i])
+        _assert(len(automl_params_dict) >= 1 and len(automl_params_dict) <= 3,
+                "{0}: Only R, eta, and skip_last may be specified".format(self.module_name))
+        for i in automl_params_dict:
+            if i == AutoMLConstants.R:
+                self.R = automl_params_dict[AutoMLConstants.R]
+            elif i == AutoMLConstants.ETA:
+                self.eta = automl_params_dict[AutoMLConstants.ETA]
+            elif i == AutoMLConstants.SKIP_LAST:
+                self.skip_last = automl_params_dict[AutoMLConstants.SKIP_LAST]
+            else:
+                plpy.error("{0}: {1} is an invalid automl param".format(self.module_name, i))
+        _assert(self.eta > 1, "{0}: eta must be greater than 1".format(self.module_name))
+        _assert(self.R >= self.eta, "{0}: R should not be less than eta".format(self.module_name))
+        self.s_max = int(math.floor(math.log(self.R, self.eta)))
+        _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "{0}: skip_last must be " \
+                "non-negative and less than {1}".format(self.module_name, self.s_max))
+
     def find_hyperband_config(self):
         """
-        Runs the diagonal hyperband algorithm.
+        Executes the diagonal hyperband algorithm.
         """
         initial_vals = {}
 
@@ -308,6 +451,7 @@ class KerasAutoML():
             r = self.R * math.pow(self.eta, -s) # initial number of iterations to run configurations for
             initial_vals[s] = (n, int(round(r)))
         self.start_training_time = self.get_current_timestamp()
+        self.start_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
         random_search = MstSearch(self.schema_madlib,
                                   self.model_arch_table,
                                   self.model_selection_table,
@@ -322,7 +466,7 @@ class KerasAutoML():
 
         # for creating the summary table for usage in fit multiple
         plpy.execute("CREATE TABLE {AutoMLSchema.TEMP_MST_SUMMARY_TABLE} AS " \
-                     "SELECT * FROM {random_search.model_selection_summary_table}".format(AutoMLSchema=AutoMLSchema,
+                     "SELECT * FROM {random_search.model_selection_summary_table}".format(AutoMLSchema=AutoMLConstants,
                                                                                           random_search=random_search))
         ranges_dict = self.mst_key_ranges_dict(initial_vals)
         # to store the bracket and round numbers
@@ -347,10 +491,11 @@ class KerasAutoML():
                 num_iterations))
 
             self.reconstruct_temp_mst_table(i, ranges_dict, configs_prune_lookup) # has keys to evaluate
-            active_keys = plpy.execute("SELECT mst_key FROM {AutoMLSchema.TEMP_MST_TABLE}".format(AutoMLSchema=
-                                                                                                  AutoMLSchema))
+            active_keys = plpy.execute("SELECT {ModelSelectionSchema.MST_KEY} " \
+                                       "FROM {AutoMLSchema.TEMP_MST_TABLE}".format(AutoMLSchema=AutoMLConstants,
+                                                                                   ModelSelectionSchema=ModelSelectionSchema))
             for k in active_keys:
-                i_dict[k['mst_key']] += 1
+                i_dict[k[ModelSelectionSchema.MST_KEY]] += 1
             self.warm_start = int(i != 0)
             mcf = self.metrics_compute_frequency if self._is_valid_metrics_compute_frequency(num_iterations) else None
             with SetGUC("plan_cache_mode", "force_generic_plan"):
@@ -359,15 +504,15 @@ class KerasAutoML():
                                               self.validation_table, mcf, self.warm_start, self.name, self.description)
             self.update_model_output_table(model_training)
             self.update_model_output_info_table(i, model_training, initial_vals)
-        self.end_training_time = self.get_current_timestamp()
+
+            self.print_best_mst_so_far()
+
+        self.end_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
         self.add_additional_info_cols(s_dict, i_dict)
         self.update_model_selection_table()
         self.generate_model_output_summary_table(model_training)
         self.remove_temp_tables(model_training)
-
-    def get_current_timestamp(self):
-        """for start and end times for the chosen AutoML algorithm. Showcased in the output summary table"""
-        return datetime.fromtimestamp(time()).strftime('%Y-%m-%d %H:%M:%S')
+        # cleanup_madlib_temp_tables(self.schema_madlib, AutoMLSchema.TARGET_SCHEMA)
 
     def mst_key_ranges_dict(self, initial_vals):
         """
@@ -394,13 +539,15 @@ class KerasAutoML():
             _assert_equal(len(configs_prune_lookup), 1, "invalid args")
             lower_bound, upper_bound = ranges_dict[self.s_max]
             plpy.execute("CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} AS SELECT * FROM {self.model_selection_table} "
-                         "WHERE mst_key >= {lower_bound} AND mst_key <= {upper_bound}".format(self=self,
-                                                                                              AutoMLSchema=AutoMLSchema,
-                                                                                              lower_bound=lower_bound,
-                                                                                              upper_bound=upper_bound,))
+                         "WHERE {ModelSelectionSchema.MST_KEY} >= {lower_bound} " \
+                         "AND {ModelSelectionSchema.MST_KEY} <= {upper_bound}".format(self=self,
+                                                                                      AutoMLSchema=AutoMLConstants,
+                                                                                      lower_bound=lower_bound,
+                                                                                      upper_bound=upper_bound,
+                                                                                      ModelSelectionSchema=ModelSelectionSchema))
             return
         # dropping and repopulating temp_mst_table
-        drop_tables([AutoMLSchema.TEMP_MST_TABLE])
+        drop_tables([AutoMLConstants.TEMP_MST_TABLE])
 
         # {mst_key} changed from SERIAL to INTEGER for safe insertions and preservation of mst_key values
         create_query = """
@@ -411,29 +558,35 @@ class KerasAutoML():
                             {fit_params} VARCHAR,
                             unique ({model_id}, {compile_params}, {fit_params})
                         );
-                       """.format(AutoMLSchema=AutoMLSchema,
+                       """.format(AutoMLSchema=AutoMLConstants,
                                   mst_key=ModelSelectionSchema.MST_KEY,
                                   model_id=ModelSelectionSchema.MODEL_ID,
                                   compile_params=ModelSelectionSchema.COMPILE_PARAMS,
                                   fit_params=ModelSelectionSchema.FIT_PARAMS)
-        with MinWarning('warning'):
-            plpy.execute(create_query)
+        # with MinWarning('warning'):
+        plpy.execute(create_query)
 
         query = ""
         new_configs = True
         for s_val in configs_prune_lookup:
             lower_bound, upper_bound = ranges_dict[s_val]
             if new_configs:
-                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT mst_key, model_id, compile_params, fit_params " \
-                         "FROM {self.model_selection_table} WHERE mst_key >= {lower_bound} " \
-                         "AND mst_key <= {upper_bound};".format(self=self, AutoMLSchema=AutoMLSchema,
-                                                                lower_bound=lower_bound, upper_bound=upper_bound)
+                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT {ModelSelectionSchema.MST_KEY}, " \
+                         "{ModelSelectionSchema.MODEL_ID}, {ModelSelectionSchema.COMPILE_PARAMS}, " \
+                         "{ModelSelectionSchema.FIT_PARAMS} FROM {self.model_selection_table} WHERE " \
+                         "{ModelSelectionSchema.MST_KEY} >= {lower_bound} AND {ModelSelectionSchema.MST_KEY} <= " \
+                         "{upper_bound};".format(self=self, AutoMLSchema=AutoMLConstants,
+                                                 ModelSelectionSchema=ModelSelectionSchema,
+                                                 lower_bound=lower_bound, upper_bound=upper_bound)
                 new_configs = False
             else:
-                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT mst_key, model_id, compile_params, fit_params " \
-                         "FROM {self.model_info_table} WHERE mst_key >= {lower_bound} " \
-                         "AND mst_key <= {upper_bound} ORDER BY {AutoMLSchema.LOSS_METRIC} " \
-                         "LIMIT {configs_prune_lookup_val};".format(self=self, AutoMLSchema=AutoMLSchema,
+                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT {ModelSelectionSchema.MST_KEY}, " \
+                         "{ModelSelectionSchema.MODEL_ID}, {ModelSelectionSchema.COMPILE_PARAMS}, " \
+                         "{ModelSelectionSchema.FIT_PARAMS} " \
+                         "FROM {self.model_info_table} WHERE {ModelSelectionSchema.MST_KEY} >= {lower_bound} " \
+                         "AND {ModelSelectionSchema.MST_KEY} <= {upper_bound} ORDER BY {AutoMLSchema.LOSS_METRIC} " \
+                         "LIMIT {configs_prune_lookup_val};".format(self=self, AutoMLSchema=AutoMLConstants,
+                                                                    ModelSelectionSchema=ModelSelectionSchema,
                                                                     lower_bound=lower_bound, upper_bound=upper_bound,
                                                                     configs_prune_lookup_val=configs_prune_lookup[s_val])
         plpy.execute(query)
@@ -459,8 +612,9 @@ class KerasAutoML():
         # inserts any newly trained configs
         plpy.execute("INSERT INTO {self.model_output_table} SELECT * FROM {model_training.original_model_output_table} " \
                      "WHERE {model_training.original_model_output_table}.mst_key NOT IN " \
-                     "(SELECT mst_key FROM {self.model_output_table})".format(self=self,
-                                                                              model_training=model_training))
+                     "(SELECT {ModelSelectionSchema.MST_KEY} FROM {self.model_output_table})".format(self=self,
+                                                                              model_training=model_training,
+                                                                              ModelSelectionSchema=ModelSelectionSchema))
 
     def update_model_output_info_table(self, i, model_training, initial_vals):
         """
@@ -473,7 +627,7 @@ class KerasAutoML():
         # normalizing factor for metrics_iters due to warm start
         epochs_factor = sum([n[1] for n in initial_vals.values()][::-1][:i]) # i & initial_vals args needed
         iters = plpy.execute("SELECT {AutoMLSchema.METRICS_ITERS} " \
-                             "FROM {model_training.model_summary_table}".format(AutoMLSchema=AutoMLSchema,
+                             "FROM {model_training.model_summary_table}".format(AutoMLSchema=AutoMLConstants,
                                                                                 model_training=model_training))
         metrics_iters_val = [epochs_factor+mi for mi in iters[0]['metrics_iters']] # global iteration counter
 
@@ -492,15 +646,16 @@ class KerasAutoML():
                      "training_loss=a.training_loss || t.training_loss, ".format(self=self) + validation_update_q +
                      "{AutoMLSchema.METRICS_ITERS}=a.metrics_iters || ARRAY{metrics_iters_val}::INTEGER[] " \
                      "FROM {model_training.model_info_table} t " \
-                     "WHERE a.mst_key=t.mst_key".format(model_training=model_training, AutoMLSchema=AutoMLSchema,
+                     "WHERE a.mst_key=t.mst_key".format(model_training=model_training, AutoMLSchema=AutoMLConstants,
                                                         metrics_iters_val=metrics_iters_val))
 
         # inserts info about metrics and validation for newly trained model configs
         plpy.execute("INSERT INTO {self.model_info_table} SELECT t.*, ARRAY{metrics_iters_val}::INTEGER[] AS metrics_iters " \
                      "FROM {model_training.model_info_table} t WHERE t.mst_key NOT IN " \
-                     "(SELECT mst_key FROM {self.model_info_table})".format(self=self,
+                     "(SELECT {ModelSelectionSchema.MST_KEY} FROM {self.model_info_table})".format(self=self,
                                                                             model_training=model_training,
-                                                                            metrics_iters_val=metrics_iters_val))
+                                                                            metrics_iters_val=metrics_iters_val,
+                                                                            ModelSelectionSchema=ModelSelectionSchema))
 
     def add_additional_info_cols(self, s_dict, i_dict):
         """Adds s and i columns to the info table"""
@@ -512,63 +667,427 @@ class KerasAutoML():
                 "b (key integer, s_val integer, i_val integer) WHERE t.mst_key=b.key".format(self=self, l=l)
         plpy.execute(query)
 
-    def update_model_selection_table(self):
+# @MinWarning("warning")
+class AutoMLHyperopt(KerasAutoML):
+    """
+    This class implements Hyperopt, another automl method that explores awkward search spaces using
+    Random Search, Tree-structured Parzen Estimator (TPE), or Adaptive TPE.
+
+    This class executes hyperopt on top of our multiple model training infrastructure powered by
+    Model hOpper Parallelism (MOP), a hybrid of data and task parallelism.
+
+    This automl method inherits common functionality from the KerasAutoML base class.
+    """
+    def __init__(self, schema_madlib, source_table, model_output_table, model_arch_table, model_selection_table,
+                 model_id_list, compile_params_grid, fit_params_grid, automl_method,
+                 automl_params, random_state=None, object_table=None,
+                 use_gpus=False, validation_table=None, metrics_compute_frequency=None,
+                 name=None, description=None, **kwargs):
+        automl_method = automl_method if automl_method else AutoMLConstants.HYPEROPT
+        automl_params = automl_params if automl_params else 'num_configs=20, num_iterations=5, algorithm=tpe'
+        KerasAutoML.__init__(self, schema_madlib, source_table, model_output_table, model_arch_table,
+                             model_selection_table, model_id_list, compile_params_grid, fit_params_grid,
+                             automl_method, automl_params, random_state, object_table, use_gpus,
+                             validation_table, metrics_compute_frequency, name, description, **kwargs)
+        self.compile_params_grid = self.compile_params_grid.replace('\n', '').replace(' ', '')
+        self.fit_params_grid = self.fit_params_grid.replace('\n', '').replace(' ', '')
+        try:
+            self.compile_params_grid = literal_eval(self.compile_params_grid)
+
+        except:
+            plpy.error("Invalid syntax in 'compile_params_dict'")
+        try:
+            self.fit_params_grid = literal_eval(self.fit_params_grid)
+        except:
+            plpy.error("Invalid syntax in 'fit_params_dict'")
+        self.validate_and_define_inputs()
+        self.num_segments = self.get_num_segments()
+
+        self.create_model_output_table()
+        self.create_model_output_info_table()
+        self.find_hyperopt_config()
+
+    def get_num_segments(self):
         """
-        Drops and re-create the mst table to only include the best performing model configuration.
+        Queries the distribution rules from the source summary table to get
+        the total number of segments.
+        :return: total number of segments
         """
-        drop_tables([self.model_selection_table])
+        source_summary_table = add_postfix(self.source_table, '_summary')
+        dist_rules = plpy.execute("SELECT {0} from {1}".format(DISTRIBUTION_RULES, source_summary_table))[0][DISTRIBUTION_RULES]
+        #TODO create constant for all_segments
+        if dist_rules == "all_segments":
+            return get_seg_number()
 
-        # only retaining best performing config
-        plpy.execute("CREATE TABLE {self.model_selection_table} AS SELECT mst_key, model_id, compile_params, " \
-                     "fit_params FROM {self.model_info_table} " \
-                     "ORDER BY {AutoMLSchema.LOSS_METRIC} LIMIT 1".format(self=self, AutoMLSchema=AutoMLSchema))
+        return len(dist_rules)
 
-    def generate_model_output_summary_table(self, model_training):
+    def validate_and_define_inputs(self):
+        automl_params_dict = extract_keyvalue_params(self.automl_params,
+                                                     lower_case_names=True)
+        # casting relevant values to int
+        for i in automl_params_dict:
+            try:
+                automl_params_dict[i] = int(automl_params_dict[i])
+            except ValueError:
+                pass
+        _assert(len(automl_params_dict) >= 1 and len(automl_params_dict) <= 3,
+                "{0}: Only num_configs, num_iterations, and algorithm may be specified".format(self.module_name))
+        for i in automl_params_dict:
+            if i == AutoMLConstants.NUM_CONFIGS:
+                self.num_configs = automl_params_dict[AutoMLConstants.NUM_CONFIGS]
+            elif i == AutoMLConstants.NUM_ITERS:
+                self.num_iters = automl_params_dict[AutoMLConstants.NUM_ITERS]
+            elif i == AutoMLConstants.ALGORITHM:
+                if automl_params_dict[AutoMLConstants.ALGORITHM].lower() == 'rand':
+                    self.algorithm = rand
+                elif automl_params_dict[AutoMLConstants.ALGORITHM].lower() == 'tpe':
+                    self.algorithm = tpe
+                # elif automl_params_dict[AutoMLSchema.ALGORITHM].lower() == 'atpe':
+                #     self.algorithm = atpe
+                # uncomment the above lines after atpe works # TODO
+                else:
+                    plpy.error("{0}: valid 'algorithm' values in automl_params for hyperopt: 'rand', 'tpe'".format(self.module_name)) # , or 'atpe'
+            else:
+                plpy.error("{0}: {1} is an invalid automl param".format(self.module_name, i))
+        _assert(self.num_configs > 0 and self.num_iters > 0, "{0}: num_configs and num_iterations in 'automl_params' "
+                                                            "must be > 0".format(self.module_name))
+        _assert(self._is_valid_metrics_compute_frequency(self.num_iters), "{0}: 'metrics_compute_frequency' "
+                                                                          "out of iteration range".format(self.module_name))
+
+    def find_hyperopt_config(self):
         """
-        Creates and populates static values related to the AutoML workload.
-        :param model_training: Fit Multiple function call object.
+        Executes hyperopt on top of MOP.
         """
-        create_query = plpy.prepare("""
-                CREATE TABLE {self.model_summary_table} AS
-                SELECT
-                    $MAD${self.source_table}$MAD$::TEXT AS source_table,
-                    $MAD${self.validation_table}$MAD$::TEXT AS validation_table,
-                    $MAD${self.model_output_table}$MAD$::TEXT AS model,
-                    $MAD${self.model_info_table}$MAD$::TEXT AS model_info,
-                    (SELECT dependent_varname FROM {model_training.model_summary_table})
-                    AS dependent_varname,
-                    (SELECT independent_varname FROM {model_training.model_summary_table})
-                    AS independent_varname,
-                    $MAD${self.model_arch_table}$MAD$::TEXT AS model_arch_table,
-                    $MAD${self.model_selection_table}$MAD$::TEXT AS model_selection_table,
-                    $MAD${self.automl_method}$MAD$::TEXT AS automl_method,
-                    $MAD${self.automl_params}$MAD$::TEXT AS automl_params,
-                    $MAD${self.random_state}$MAD$::TEXT AS random_state,
-                    $MAD${self.object_table}$MAD$::TEXT AS object_table,
-                    {self.use_gpus} AS use_gpus,
-                    (SELECT metrics_compute_frequency FROM {model_training.model_summary_table})::INTEGER
-                    AS metrics_compute_frequency,
-                    $MAD${self.name}$MAD$::TEXT AS name,
-                    $MAD${self.description}$MAD$::TEXT AS description,
-                    '{self.start_training_time}'::TIMESTAMP AS start_training_time,
-                    '{self.end_training_time}'::TIMESTAMP AS end_training_time,
-                    (SELECT madlib_version FROM {model_training.model_summary_table}) AS madlib_version,
-                    (SELECT num_classes FROM {model_training.model_summary_table})::INTEGER AS num_classes,
-                    (SELECT class_values FROM {model_training.model_summary_table}) AS class_values,
-                    (SELECT dependent_vartype FROM {model_training.model_summary_table})
-                    AS dependent_vartype,
-                    (SELECT normalizing_const FROM {model_training.model_summary_table})
-                    AS normalizing_const
-            """.format(self=self, model_training=model_training))
+        make_mst_summary = True
+        trials = Trials()
+        domain = Domain(None, self.get_search_space())
+        rand_state = np.random.RandomState(self.random_state)
+        configs_lst = self.get_configs_list(self.num_configs, self.num_segments)
 
-        with MinWarning('warning'):
-            plpy.execute(create_query)
+        self.start_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
+        fit_multiple_runtime = 0
+        for low, high in configs_lst:
+            i, n = low, high - low + 1
 
-    def remove_temp_tables(self, model_training):
+            # Using HyperOpt TPE/ATPE to generate parameters
+            hyperopt_params = []
+            sampled_params = []
+            for j in range(i, i + n):
+                new_param = self.algorithm.suggest([j], domain, trials, rand_state.randint(0, AutoMLConstants.INT_MAX))
+                new_param[0]['status'] = STATUS_RUNNING
+
+                trials.insert_trial_docs(new_param)
+                trials.refresh()
+                hyperopt_params.append(new_param[0])
+                sampled_params.append(new_param[0]['misc']['vals'])
+
+            model_id_list, compile_params, fit_params = self.extract_param_vals(sampled_params)
+            msts_list = self.generate_msts(model_id_list, compile_params, fit_params)
+            # cleanup_madlib_temp_tables(self.schema_madlib, AutoMLSchema.TARGET_SCHEMA)
+            try:
+                self.remove_temp_tables(model_training)
+            except:
+                pass
+            self.populate_temp_mst_tables(i, msts_list)
+
+            plpy.info("***Evaluating {n} newly suggested model configurations***".format(n=n))
+            fit_multiple_start_time = time.time()
+            model_training = FitMultipleModel(self.schema_madlib, self.source_table, AutoMLConstants.TEMP_OUTPUT_TABLE,
+                                              AutoMLConstants.TEMP_MST_TABLE, self.num_iters, self.use_gpus, self.validation_table,
+                                              self.metrics_compute_frequency, False, self.name, self.description, fit_multiple_runtime)
+            fit_multiple_runtime += time.time() - fit_multiple_start_time
+            if make_mst_summary:
+                self.generate_mst_summary_table(self.model_selection_summary_table)
+                make_mst_summary = False
+
+            # HyperOpt TPE update
+            for k, hyperopt_param in enumerate(hyperopt_params, i):
+                loss_val = plpy.execute("SELECT {AutoMLSchema.LOSS_METRIC} FROM {model_training.model_info_table} " \
+                             "WHERE {ModelSelectionSchema.MST_KEY}={k}".format(AutoMLSchema=AutoMLConstants,
+                                                                               ModelSelectionSchema=ModelSelectionSchema,
+                                                                               **locals()))[0][AutoMLConstants.LOSS_METRIC]
+
+                # avoid removing the two lines below (part of Hyperopt updates)
+                hyperopt_param['status'] = STATUS_OK
+                hyperopt_param['result'] = {'loss': loss_val, 'status': STATUS_OK}
+            trials.refresh()
+
+            # stacks info of all model configs together
+            self.update_model_output_and_info_tables(model_training)
+
+            self.print_best_mst_so_far()
+
+        self.end_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
+        self.update_model_selection_table()
+        self.generate_model_output_summary_table(model_training)
+        # cleanup_madlib_temp_tables(self.schema_madlib, AutoMLSchema.TARGET_SCHEMA)
+        self.remove_temp_tables(model_training)
+
+    def get_configs_list(self, num_configs, num_segments):
         """
-        Remove all intermediate tables created for AutoML runs/updates.
-        :param model_training: Fit Multiple function call object.
+        Gets the schedule for evaluating model configs.
+        :return: list of (start, end) mst_key ranges, one batch per round
         """
-        drop_tables([model_training.original_model_output_table, model_training.model_info_table,
-                     model_training.model_summary_table, AutoMLSchema.TEMP_MST_TABLE,
-                     AutoMLSchema.TEMP_MST_SUMMARY_TABLE])
+        num_buckets = int(round(float(num_configs) / num_segments))
+        configs_list = []
+        start_idx = 1
+        models_populated = 0
+        for _ in range(num_buckets - 1):
+            end_idx = start_idx + num_segments
+            models_populated += num_segments
+            configs_list.append((start_idx, end_idx - 1))
+            start_idx = end_idx
+
+        remaining_models = num_configs - models_populated
+        configs_list.append((start_idx, start_idx + remaining_models-1))
+
+        return configs_list
+
+    def get_search_space(self):
+        """
+        Converts user inputs to hyperopt search space.
+        :return: Hyperopt search space
+        """
+
+        # initial params (outside 'optimizer_params_list')
+        hyperopt_search_dict = {}
+        hyperopt_search_dict['model_id'] = self.get_hyperopt_exps('model_id', self.model_id_list)
+
+
+        for j in self.fit_params_grid:
+            hyperopt_search_dict[j] = self.get_hyperopt_exps(j, self.fit_params_grid[j])
+
+        for i in self.compile_params_grid:
+            if i != ModelSelectionSchema.OPTIMIZER_PARAMS_LIST:
+                hyperopt_search_dict[i] = self.get_hyperopt_exps(i, self.compile_params_grid[i])
+
+        hyperopt_search_space_lst = []
+
+        counter = 1 # for unique names to allow multiple distribution options for optimizer params
+        for optimizer_dict in self.compile_params_grid[ModelSelectionSchema.OPTIMIZER_PARAMS_LIST]:
+            for o_param in optimizer_dict:
+                name = o_param + '_' + str(counter)
+                hyperopt_search_dict[name] = self.get_hyperopt_exps(name, optimizer_dict[o_param])
+            # appending deep copy
+            hyperopt_search_space_lst.append({k:v for k, v in hyperopt_search_dict.items()})
+            for o_param in optimizer_dict:
+                name = o_param + '_' + str(counter)
+                del hyperopt_search_dict[name]
+            counter += 1
+
+        return hp.choice('space', hyperopt_search_space_lst)
+
+    def get_hyperopt_exps(self, cp, param_value_list):
+        """
+        Builds a hyperopt expression that samples a value for a param, either uniformly at random
+        from a list of discrete elements, or from a specified continuous distribution.
+        :param cp: compile param
+        :param param_value_list: list of values (or a distribution spec) for the param
+        :return: hyperopt search-space expression
+        """
+        # check if need to sample from a distribution
+        if type(param_value_list[-1]) == str and all([type(i) != str and not callable(i) for i in param_value_list[:-1]]) \
+                and len(param_value_list) > 1:
+            _assert_equal(len(param_value_list), 3,
+                          "{0}: '{1}' should have exactly 3 elements if picking from a distribution".format(self.module_name, cp))
+            _assert(param_value_list[1] > param_value_list[0],
+                    "{0}: '{1}' should be of the format [lower_bound, upper_bound, distribution_type]".format(self.module_name, cp))
+            if param_value_list[-1] == 'linear':
+                return hp.uniform(cp, param_value_list[0], param_value_list[1])
+            elif param_value_list[-1] == 'log':
+                return hp.loguniform(cp, np.log(param_value_list[0]), np.log(param_value_list[1]))
+            else:
+                plpy.error("{0}: Please choose a valid distribution type for '{1}': {2}".format(
+                    self.module_name,
+                    self.original_param_details(cp)[0],
+                    ['linear', 'log']))
+        else:
+            # random sampling
+            return hp.choice(cp, param_value_list)
+
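Concretely, hypothetical param specs map to hyperopt expressions under these rules as follows (a sketch, not exhaustive):

    # ['Adam', 'SGD']        -> hp.choice(...): discrete pick from the list
    # [0.6, 0.65, 'linear']  -> hp.uniform(0.6, 0.65)
    # [1e-4, 1e-1, 'log']    -> hp.loguniform(log(1e-4), log(1e-1))
    # [0.3, 0.5, 'typo']     -> plpy.error suggesting ['linear', 'log']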
+    def extract_param_vals(self, sampled_params):
+        """
+        Extract parameter values from hyperopt search space.
+        :param sampled_params: params suggested by hyperopt.
+        :return: lists of model ids, compile and fit params.
+        """
+        model_id_list, compile_params, fit_params = [], [], []
+        for params_dict in sampled_params:
+            compile_dict, fit_dict, optimizer_params_dict = {}, {}, {}
+            for p in params_dict:
+                if len(params_dict[p]) == 0 or p == 'space':
+                    continue
+                val = params_dict[p][0]
+                if p == 'model_id':
+                    model_id_list.append(self.model_id_list[val])
+                    continue
+                elif p in self.fit_params_grid:
+                    try:
+                        # check if params_dict[p] is an index
+                        fit_dict[p] = self.fit_params_grid[p][val]
+                    except TypeError:
+                        # sampled from a distribution, so not an index into the grid
+                        fit_dict[p] = val
+                elif p in self.compile_params_grid:
+                    try:
+                        # check if params_dict[p] is an index
+                        compile_dict[p] = self.compile_params_grid[p][val]
+                    except TypeError:
+                        compile_dict[p] = val
+                else:
+                    o_param, idx = self.original_param_details(p) # extracting unique attribute
+                    try:
+                        # check if params_dict[p] is an index (i.e. optimizer, for example)
+                        optimizer_params_dict[o_param] = self.compile_params_grid[
+                            ModelSelectionSchema.OPTIMIZER_PARAMS_LIST][idx][o_param][val]
+                    except TypeError:
+                        optimizer_params_dict[o_param] = val
+            compile_dict[ModelSelectionSchema.OPTIMIZER_PARAMS_LIST] = optimizer_params_dict
+
+            compile_params.append(compile_dict)
+            fit_params.append(fit_dict)
+
+        return model_id_list, compile_params, fit_params
+
+    def original_param_details(self, name):
+        """
+        Recovers the original param name and its optimizer-dict index from a
+        suffix-mangled name.
+        :param name: suffixed param name (example - lr_1, epsilon_12)
+        :return: original param name and 0-based optimizer-dict index.
+        """
+        parts = name.split('_')
+        return '_'.join(parts[:-1]), int(parts[-1]) - 1
+
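For example (doctest-style, values hypothetical):

    >>> self.original_param_details('lr_2')
    ('lr', 1)
    >>> self.original_param_details('beta_1_1')
    ('beta_1', 0)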
+
+    def generate_msts(self, model_id_list, compile_params, fit_params):
+        """
+        Generates msts to insert in the mst table.
+        :param model_id_list: list of model ids
+        :param compile_params: list of compile params
+        :param fit_params: list of fit params
+        :return: List of msts to insert in the mst table.
+        """
+        assert len(model_id_list) == len(compile_params) == len(fit_params)
+        msts = []
+
+        for i in range(len(compile_params)):
+            combination = {}
+            combination[ModelSelectionSchema.MODEL_ID] = model_id_list[i]
+            combination[ModelSelectionSchema.COMPILE_PARAMS] = generate_row_string(compile_params[i])
+            combination[ModelSelectionSchema.FIT_PARAMS] = generate_row_string(fit_params[i])
+            msts.append(combination)
+
+        return msts
+
+    def populate_temp_mst_tables(self, i, msts_list):
+        """
+        Creates and populates temp mst and summary tables with newly suggested model configs for evaluation.
+        :param i: starting mst_key value for this batch of configs
+        :param msts_list: list of generated msts.
+        """
+        # extra sanity check
+        if table_exists(AutoMLConstants.TEMP_MST_TABLE):
+            drop_tables([AutoMLConstants.TEMP_MST_TABLE])
+
+        create_query = """
+                        CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} (
+                            {mst_key} INTEGER,
+                            {model_id} INTEGER,
+                            {compile_params} VARCHAR,
+                            {fit_params} VARCHAR,
+                            unique ({model_id}, {compile_params}, {fit_params})
+                        );
+                       """.format(AutoMLSchema=AutoMLConstants,
+                                  mst_key=ModelSelectionSchema.MST_KEY,
+                                  model_id=ModelSelectionSchema.MODEL_ID,
+                                  compile_params=ModelSelectionSchema.COMPILE_PARAMS,
+                                  fit_params=ModelSelectionSchema.FIT_PARAMS)
+        # with MinWarning('warning'):
+        plpy.execute(create_query)
+        mst_key_val = i
+        for mst in msts_list:
+            model_id = mst[ModelSelectionSchema.MODEL_ID]
+            compile_params = mst[ModelSelectionSchema.COMPILE_PARAMS]
+            fit_params = mst[ModelSelectionSchema.FIT_PARAMS]
+            insert_query = """
+                            INSERT INTO
+                                {AutoMLSchema.TEMP_MST_TABLE}(
+                                    {mst_key_col},
+                                    {model_id_col},
+                                    {compile_params_col},
+                                    {fit_params_col}
+                                )
+                            VALUES (
+                                {mst_key_val},
+                                {model_id},
+                                $${compile_params}$$,
+                                $${fit_params}$$
+                            )
+                           """.format(mst_key_col=ModelSelectionSchema.MST_KEY,
+                                      model_id_col=ModelSelectionSchema.MODEL_ID,
+                                      compile_params_col=ModelSelectionSchema.COMPILE_PARAMS,
+                                      fit_params_col=ModelSelectionSchema.FIT_PARAMS,
+                                      AutoMLSchema=AutoMLConstants,
+                                      **locals())
+            mst_key_val += 1
+            plpy.execute(insert_query)
+
+        self.generate_mst_summary_table(AutoMLConstants.TEMP_MST_SUMMARY_TABLE)
+
+    def generate_mst_summary_table(self, tbl_name):
+        """
+        Generates the mst summary table with the given name.
+        :param tbl_name: name of summary table
+        """
+        _assert(tbl_name.endswith('_summary'), 'invalid summary table name')
+
+        # extra sanity check
+        if table_exists(tbl_name):
+            drop_tables([tbl_name])
+
+        create_query = """
+                        CREATE TABLE {tbl_name} (
+                            {model_arch_table} VARCHAR,
+                            {object_table} VARCHAR
+                        );
+                       """.format(tbl_name=tbl_name,
+                                  model_arch_table=ModelSelectionSchema.MODEL_ARCH_TABLE,
+                                  object_table=ModelSelectionSchema.OBJECT_TABLE)
+        # with MinWarning('warning'):
+        plpy.execute(create_query)
+
+        if self.object_table is None:
+            object_table = 'NULL::VARCHAR'
+        else:
+            object_table = '$${0}$$'.format(self.object_table)
+        insert_summary_query = """
+                        INSERT INTO
+                            {tbl_name}(
+                                {model_arch_table_name},
+                                {object_table_name}
+                        )
+                        VALUES (
+                            $${self.model_arch_table}$$,
+                            {object_table}
+                        )
+                       """.format(model_arch_table_name=ModelSelectionSchema.MODEL_ARCH_TABLE,
+                                  object_table_name=ModelSelectionSchema.OBJECT_TABLE,
+                                  **locals())
+        plpy.execute(insert_summary_query)
+
+    def update_model_output_and_info_tables(self, model_training):
+        """
+        Updates model output and info tables by stacking rows after each evaluation round.
+        :param model_training: Fit Multiple class object
+        """
+        metrics_iters = plpy.execute("SELECT {AutoMLSchema.METRICS_ITERS} " \
+                                     "FROM {model_training.original_model_output_table}_summary".format(self=self,
+                                                                                                        model_training=model_training,
+                                                                                                        AutoMLSchema=AutoMLConstants))[0][AutoMLConstants.METRICS_ITERS]
+        if metrics_iters:
+            metrics_iters = "ARRAY{0}".format(metrics_iters)
+        # stacking new rows from training
+        plpy.execute("INSERT INTO {self.model_output_table} SELECT * FROM " \
+                     "{model_training.original_model_output_table}".format(self=self, model_training=model_training))
+        plpy.execute("INSERT INTO {self.model_info_table} SELECT *, {metrics_iters} FROM " \
+                     "{model_training.model_info_table}".format(self=self,
+                                                                     model_training=model_training,
+                                                                     metrics_iters=metrics_iters))
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
index 06889d2..98617d7 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
@@ -169,20 +169,25 @@ madlib_keras_automl(
 
   <dt>automl_method (optional)</dt>
   <dd>VARCHAR, default 'hyperband'. Name of the automl algorithm to run.
-  Currently only support hyperband. Note that you can also use short prefixes
-  for the 'hyperband' keyword, e.g.,'hyper' or 'hyp' instead
-  of writing out 'hyperband' in full.
+  Can be either 'hyperband' or 'hyperopt'. Prefixes are not supported, but the value is case-insensitive.
   </dd>
 
   <dt>automl_params (optional)</dt>
-  <dd>VARCHAR, default 'R=6, eta=3, skip_last=0'. Parameters for the chosen automl
-  method in a comma-separated string of key-value pairs. Hyperband params are:
+  <dd>VARCHAR, default 'R=6, eta=3, skip_last=0' (for Hyperband). Parameters for the chosen automl method in a
+  comma-separated string of key-value pairs, e.g., 'num_configs=20, num_iterations=5, algorithm=tpe' for Hyperopt.
+  Hyperband params are:
   R - the maximum amount of resources/iterations allocated to a single configuration
-  in a round of hyperband, eta - factor controlling the proportion of configurations discarded in each
-  round of successive halving, skip_last - number of last diagonal brackets to skip running
+  in a round of hyperband,
+  eta - factor controlling the proportion of configurations discarded in each
+  round of successive halving,
+  skip_last - number of last diagonal brackets to skip running
   in the algorithm.
  We encourage setting a low R value (i.e., 2 to 10), or a high R value and a high skip_last value, to evaluate
  a variety of configurations with a decent number of iterations. See the description below for details.
+  Hyperopt params are:
+  num_configs - total number of model configurations to evaluate,
+  num_iterations - fixed number of iterations for evaluating each model configuration,
+  algorithm - name of algorithm to explore search space in hyperopt ('rand', 'tpe', 'atpe').
   </dd>
 
   <dt>random_state (optional)</dt>
@@ -627,7 +632,6 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.hyperband_schedule(
 $$ LANGUAGE plpythonu VOLATILE
               m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 
-
 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.madlib_keras_automl(
     source_table                   VARCHAR,
     model_output_table             VARCHAR,
@@ -637,7 +641,7 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.madlib_keras_automl(
     compile_params_grid            VARCHAR,
     fit_params_grid                VARCHAR,
     automl_method                  VARCHAR DEFAULT 'hyperband',
-    automl_params                  VARCHAR DEFAULT 'R=6, eta=3, skip_last=0',
+    automl_params                  VARCHAR DEFAULT NULL,
     random_state                   INTEGER DEFAULT NULL,
     object_table                   VARCHAR DEFAULT NULL,
     use_gpus                       BOOLEAN DEFAULT FALSE,
@@ -648,6 +652,11 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.madlib_keras_automl(
 ) RETURNS VOID AS $$
     PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl')
     with AOControl(False):
-        schedule_loader = madlib_keras_automl.KerasAutoML(**globals())
+        if automl_method is None or automl_method.lower() == 'hyperband':
+            schedule_loader = madlib_keras_automl.AutoMLHyperband(**globals())
+        elif automl_method.lower() == 'hyperopt':
+            schedule_loader = madlib_keras_automl.AutoMLHyperopt(**globals())
+        else:
+            plpy.error("madlib_keras_automl: The chosen automl method must be 'hyperband' or 'hyperopt'")
 $$ LANGUAGE plpythonu VOLATILE
     m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in
index c821474..1e49261 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in
@@ -81,7 +81,26 @@ class FitMultipleModel():
                  model_selection_table, num_iterations,
                  use_gpus=False, validation_table=None,
                  metrics_compute_frequency=None, warm_start=False, name="",
-                 description="", use_caching=False, **kwargs):
+                 description="", use_caching=False, metrics_elapsed_time_offset=0, **kwargs):
+        """
+        Fits multiple model configurations in parallel on a distributed dataset.
+        :param schema_madlib: schema name
+        :param source_table: input table containing training dataset
+        :param model_output_table: output table
+        :param model_selection_table: input table containing model configs
+        :param num_iterations: number of iterations to train
+        :param use_gpus: determines whether GPUs are to be used for training
+        :param validation_table: input table containing the validation dataset
+        :param metrics_compute_frequency: Frequency to compute per-iteration metrics
+                                          for the training dataset and validation dataset
+        :param warm_start: indicates whether to initialize weights with the coefficients
+                           from the last call of the fit function
+        :param name: optional name to record in the model summary table
+        :param description: optional description to record in the model summary table
+        :param use_caching: whether to cache the preprocessed source data across iterations
+        :param metrics_elapsed_time_offset: time elapsed for the previous call to fit_multiple
+                                            (internal param used by automl to accumulate
+                                             metrics_elapsed_time)
+        """
         # set the random seed for visit order/scheduling
         random.seed(1)
         if is_platform_pg():
@@ -117,6 +136,7 @@ class FitMultipleModel():
         self.use_gpus = use_gpus
         self.segments_per_host = get_segments_per_host()
         self.cached_source_table = unique_string('cached_source_table')
+        self.metrics_elapsed_time_offset = metrics_elapsed_time_offset
         if self.use_gpus:
             self.accessible_gpus_for_seg = get_accessible_gpus_for_seg(
                 self.schema_madlib, self.segments_per_host, self.module_name)
@@ -283,7 +303,7 @@ class FitMultipleModel():
                 self.model_output_table,
                 mst[self.mst_key_col])
             mst_metric_eval_time[mst[self.mst_key_col]] \
-                .append(time.time() - self.metrics_elapsed_start_time)
+                .append(self.metrics_elapsed_time_offset + (time.time() - self.metrics_elapsed_start_time))
             mst_loss[mst[self.mst_key_col]].append(loss)
             mst_metric[mst[self.mst_key_col]].append(metric)
             self.info_str += "\n\tmst_key={0}: metric={1}, loss={2}".format(mst[self.mst_key_col], metric, loss)
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
index ca54a4d..96c2817 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
@@ -26,6 +26,7 @@ from utilities.validate_args import table_exists
 from madlib_keras_gpu_info import GPUInfoFunctions
 import plpy
 from math import isnan
+# from madlib_keras_model_selection import ModelSelectionSchema
 
 ############### Constants used in other deep learning files #########
 # Name of columns in model summary table.
@@ -53,7 +54,7 @@ SMALLINT_SQL_TYPE = 'SMALLINT'
 DEFAULT_NORMALIZING_CONST = 1.0
 GP_SEGMENT_ID_COLNAME = "gp_segment_id"
 INTERNAL_GPU_CONFIG = '__internal_gpu_config__'
-
+DISTRIBUTION_RULES = "distribution_rules"
 #####################################################################
 
 # Prepend a dimension to np arrays using expand_dims.
@@ -353,3 +354,57 @@ def get_metrics_sql_string(metrics_list, is_metrics_specified=True):
         metrics_final = metrics_all = 'NULL'
     return metrics_final, metrics_all
 
+
+def generate_row_string(configs_dict):
+    """
+    Generate row strings for MST table.
+    :param configs_dict: Dictionary of params configs (preferably either only compile params
+    or only fit params).
+    :return: string to insert as a row value in MST table.
+    """
+    result_row_string = ""
+    opl = 'optimizer_params_list'
+
+    if opl in configs_dict:
+        optimizer_params_dict = configs_dict[opl]
+        if 'optimizer' in optimizer_params_dict:
+            if optimizer_params_dict['optimizer'].lower() == 'sgd':
+                optimizer_value = "SGD"
+            elif optimizer_params_dict['optimizer'].lower() == 'rmsprop':
+                optimizer_value = "RMSprop"
+            else:
+                optimizer_value = optimizer_params_dict['optimizer'].capitalize()
+            opt_string = "optimizer" + "=" + "'" + str(optimizer_value) \
+                         + "()" + "'"
+        else:
+            opt_string = "optimizer='RMSprop()'" # default optimizer
+        opt_param_string = ""
+        for opt_param in optimizer_params_dict:
+            if opt_param == 'optimizer':
+                continue
+            opt_param_string += opt_param + '=' + str(optimizer_params_dict[opt_param]) + ','
+        if opt_param_string == "":
+            result_row_string += opt_string
+        else:
+            opt_param_string = opt_param_string[:-1] # to exclude the last comma
+            part = opt_string.split('(')
+            result_row_string += part[0] + '(' + opt_param_string + part[1]
+
+    for c in configs_dict:
+        if c == opl:
+            continue
+        elif c == 'metrics':
+            if callable(configs_dict[c]):
+                result_row_string += "," + str(c) + "=" + "[" + str(configs_dict[c]) + "]"
+            else:
+                result_row_string += "," + str(c) + "=" + "['" + str(configs_dict[c]) + "']"
+        else:
+            if type(configs_dict[c]) == str or type(configs_dict[c]) == np.string_:
+                result_row_string += "," + str(c) + "=" + "'" + str(configs_dict[c]) + "'"
+            else:
+                # ints, floats, none type, booleans
+                result_row_string += "," + str(c) + "=" + str(configs_dict[c])
+
+    if result_row_string[0] == ',':
+        return result_row_string[1:]
+    return result_row_string
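For instance, a hypothetical config dict renders into the single-row string format the MST tables expect (key order in the output may vary with dict iteration order):

    >>> generate_row_string({
    ...     'optimizer_params_list': {'optimizer': 'adam', 'lr': 0.01},
    ...     'loss': 'categorical_crossentropy',
    ...     'metrics': 'accuracy'})
    "optimizer='Adam(lr=0.01)',loss='categorical_crossentropy',metrics=['accuracy']"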
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in
index 99c7150..3ea37dc 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in
@@ -27,12 +27,13 @@ import plpy
 from copy import deepcopy
 
 from madlib_keras_custom_function import CustomFunctionSchema
+from madlib_keras_helper import generate_row_string
 from madlib_keras_validator import MstLoaderInputValidator
 from madlib_keras_wrapper import convert_string_of_args_to_dict
 from madlib_keras_wrapper import parse_and_validate_fit_params
 from madlib_keras_wrapper import parse_and_validate_compile_params
 from utilities.control import MinWarning
-from utilities.utilities import add_postfix, extract_keyvalue_params, _assert, _assert_equal
+from utilities.utilities import add_postfix, _assert, _assert_equal, extract_keyvalue_params
 from utilities.utilities import quote_ident, get_schema
 from utilities.validate_args import table_exists, drop_tables
 
@@ -462,8 +463,8 @@ class MstSearch():
                     fit_configs[k] = config[k]
                 else:
                     plpy.error("DL: {0} is an unidentified key".format(k))
-            combination[ModelSelectionSchema.COMPILE_PARAMS] = self.generate_row_string(compile_configs)
-            combination[ModelSelectionSchema.FIT_PARAMS] = self.generate_row_string(fit_configs)
+            combination[ModelSelectionSchema.COMPILE_PARAMS] = generate_row_string(compile_configs)
+            combination[ModelSelectionSchema.FIT_PARAMS] = generate_row_string(fit_configs)
             self.msts.append(combination)
 
     def find_random_combinations(self):
@@ -479,9 +480,9 @@ class MstSearch():
                 seed_changes += 1
             combination[ModelSelectionSchema.MODEL_ID] = np.random.choice(self.model_id_list)
             compile_dict, seed_changes = self.generate_param_config(self.compile_params_dict, seed_changes)
-            combination[ModelSelectionSchema.COMPILE_PARAMS] = self.generate_row_string(compile_dict)
+            combination[ModelSelectionSchema.COMPILE_PARAMS] = generate_row_string(compile_dict)
             fit_dict, seed_changes = self.generate_param_config(self.fit_params_dict, seed_changes)
-            combination[ModelSelectionSchema.FIT_PARAMS] = self.generate_row_string(fit_dict)
+            combination[ModelSelectionSchema.FIT_PARAMS] = generate_row_string(fit_dict)
             self.msts.append(combination)
 
     def generate_param_config(self, params_dict, seed_changes):
@@ -542,58 +543,6 @@ class MstSearch():
             # random sampling
             return np.random.choice(param_value_list)
 
-    def generate_row_string(self, configs_dict):
-        """
-        Generate row strings for MST table.
-        :param configs_dict: Dictionary of params config.
-        :return: string to insert as a row in MST table.
-        """
-        result_row_string = ""
-
-        if ModelSelectionSchema.OPTIMIZER_PARAMS_LIST in configs_dict:
-            optimizer_params_dict = configs_dict[ModelSelectionSchema.OPTIMIZER_PARAMS_LIST]
-            if 'optimizer' in optimizer_params_dict:
-                if optimizer_params_dict['optimizer'].lower() == 'sgd':
-                    optimizer_value = "SGD"
-                elif optimizer_params_dict['optimizer'].lower() == 'rmsprop':
-                    optimizer_value = "RMSprop"
-                else:
-                    optimizer_value = optimizer_params_dict['optimizer'].capitalize()
-                opt_string = "optimizer" + "=" + "'" + str(optimizer_value) \
-                             + "()" + "'"
-            else:
-                opt_string = "optimizer='RMSprop()'" # default optimizer
-            opt_param_string = ""
-            for opt_param in optimizer_params_dict:
-                if opt_param == 'optimizer':
-                    continue
-                opt_param_string += opt_param + '=' + str(optimizer_params_dict[opt_param]) + ','
-            if opt_param_string == "":
-                result_row_string += opt_string
-            else:
-                opt_param_string = opt_param_string[:-1] # to exclude the last comma
-                part = opt_string.split('(')
-                result_row_string += part[0] + '(' + opt_param_string + part[1]
-
-        for c in configs_dict:
-            if c == ModelSelectionSchema.OPTIMIZER_PARAMS_LIST:
-                continue
-            elif c == 'metrics':
-                if callable(configs_dict[c]):
-                    result_row_string += "," + str(c) + "=" + "[" + str(configs_dict[c]) + "]"
-                else:
-                    result_row_string += "," + str(c) + "=" + "['" + str(configs_dict[c]) + "']"
-            else:
-                if type(configs_dict[c]) == str or type(configs_dict[c]) == np.string_:
-                    result_row_string += "," + str(c) + "=" + "'" + str(configs_dict[c]) + "'"
-                else:
-                    # ints, floats, none type, booleans
-                    result_row_string += "," + str(c) + "=" + str(configs_dict[c])
-
-        if result_row_string[0] == ',':
-            return result_row_string[1:]
-        return result_row_string
-
     def create_mst_table(self):
         """Initialize the output mst table, if it doesn't exist (for incremental loading).
         """
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in
index 9382407..8b2157d 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in
@@ -248,7 +248,7 @@ class InputValidator:
         gpu_config = plpy.execute(
             "SELECT {0} FROM {1}".format(INTERNAL_GPU_CONFIG, summary_table)
             )[0][INTERNAL_GPU_CONFIG]
-        if gpu_config == 'all_segments':
+        if gpu_config == DistributionRulesOptions.ALL_SEGMENTS:
             _assert(0 not in accessible_gpus_for_seg,
                 "{0} error: Host(s) are missing gpus.".format(module_name))
         else:
diff --git a/src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in b/src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
index 0516687..da9fb8a 100644
--- a/src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
+++ b/src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
@@ -29,21 +29,82 @@ m4_include(`SQLCommon.m4')
 m4_changequote(`<!', `!>')
 m4_ifdef(<!__POSTGRESQL__!>, <!!>, <!
 
---------------------------- MADLIB KERAS AUTOML HYPERBAND TEST CASES ---------------------------
-
--- test table dimensions / happy path
+--------------------------- HYPEROPT TEST CASES ---------------------------
+-- test table dimensions / happy path (algorithm = rand)
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
+automl_mst_table_summary;
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$);
+                           ARRAY[1], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [{'optimizer': ['Adam', 'SGD'],
+    'lr': [0.01, 0.011, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [50], 'epochs': [1]}$$,
+    'hyperopt', 'num_configs=5, num_iterations=6, algorithm=rand', NULL, NULL, FALSE, NULL, 1, 'test1', 'test1 descr');
 
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
 SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output;
 SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output_info;
+-- Validate model output summary table
+SELECT assert(
+    source_table = 'iris_data_packed' AND
+    validation_table IS NULL AND
+    model = 'automl_output' AND
+    model_info = 'automl_output_info' AND
+    dependent_varname = 'class_text' AND
+    independent_varname = 'attributes' AND
+    model_arch_table = 'iris_model_arch' AND
+    model_selection_table = 'automl_mst_table' AND
+    automl_method = 'hyperopt' AND
+    automl_params = 'num_configs=5, num_iterations=6, algorithm=rand' AND
+    random_state IS NULL AND
+    object_table IS NULL AND
+    use_gpus = FALSE AND
+    metrics_compute_frequency = 1 AND
+    name = 'test1' AND
+    description = 'test1 descr' AND
+    start_training_time < now() AND
+    end_training_time < now() AND
+    madlib_version IS NOT NULL AND
+    num_classes = 3 AND
+    class_values = '{Iris-setosa,Iris-versicolor,Iris-virginica}' AND
+    dependent_vartype = 'character varying' AND
+    normalizing_const = 1, 'Output summary table validation failed. Actual:' || __to_char(summary)
+) FROM (SELECT * FROM automl_output_summary) summary;
+
+-- Validate output info table for metrics_iters NOT NULL
+SELECT assert(
+    metrics_iters = ARRAY[1,2,3,4,5,6], 'Invalid metrics_iters value in output info table. Actual:' || __to_char(info)
+) FROM (SELECT * FROM automl_output_info) info;
+
+-- Validate mst summary table
+SELECT assert(
+    model_arch_table = 'iris_model_arch' AND
+    object_table IS NULL , 'mst summary table validation failed. Actual:' || __to_char(summary)
+) FROM (SELECT * FROM automl_mst_table_summary) summary;
+
+-- Validate that the best model selected actually learns (training loss goes down and accuracy improves)
+-- TODO: keep an eye on this test for flakiness
+SELECT assert(
+    training_loss[6]-training_loss[1] < 10e-4 AND
+    training_metrics[6]-training_metrics[1] > 0,
+    'The loss and accuracy should have improved with more iterations.'
+)
+FROM automl_output_info
+WHERE mst_key = (SELECT mst_key from automl_mst_table);
+
+-- algorithm = tpe
+DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
+automl_mst_table_summary;
+SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+                           ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+    'hyperopt', 'num_configs=4, num_iterations=1, algorithm=tpe');
+
+SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
+SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
+SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
+SELECT assert(COUNT(*)=4, 'The length of table does not match with the inputs') FROM automl_output;
+SELECT assert(COUNT(*)=4, 'The length of table does not match with the inputs') FROM automl_output_info;
 
 -- test invalid source table
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
@@ -51,241 +112,345 @@ DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, a
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('invalid_source_table', 'automl_output', 'iris_model_arch', 'automl_mst_table',
         ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
         'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+        'hyperopt', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid source table');
 
 -- test preexisting output table
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+CREATE TABLE automl_output(a int);
 
 DROP TABLE IF EXISTS automl_mst_table, automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
         ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
         'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+        'hyperopt', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for preexisting output table');
 
 -- test preexisting selection table
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
-
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary;
+CREATE TABLE automl_mst_table(a int);
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
         ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
         'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+        'hyperopt', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for preexisting selection table');
 
--- test test invalid model id
+-- test invalid model id
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
         ARRAY[2,-1], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
         'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+        'hyperopt', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid model id');
 
--- test invalid automl method
+-- test invalid distribution
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperbrand', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
-$TRAP$)=1, 'Should error out for invalid automl method');
+        ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperopt', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid distribution type');
 
+-- test invalid automl method
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hb', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyper', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid automl method');
 
--- test invalid automl params {R, eta, skip_last}
+-- test invalid automl params for hyperopt: {num_configs, num_iterations, algorithm}
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
         ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
         'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-        'hyperband', 'R=2, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
-$TRAP$)=1, 'Should error out for invalid automl params');
-
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
-SELECT assert(trap_error($TRAP$
-    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=0, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+        'hyperopt', 'num_configs=-2, num_iterations=5, algorithm=rand', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid automl params');
 
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=1, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyperopt', 'num_configs=2, num_iterations=0, algorithm=tpe', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid automl params');
 
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=3, skip_last=3', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyperopt', 'num_configs=5, num_iterations=2, algorithm=random', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid automl params');
 
 -- test invalid object table
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
         ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
         'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-        'hyperband', 'R=9, eta=3, skip_last=0', NULL, 'invalid_object_table', FALSE, NULL, NULL, NULL, NULL);
+        'hyperopt', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, 'invalid_object_table', FALSE, NULL, NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid object table');
 
 -- test invalid validation table
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
 SELECT assert(trap_error($TRAP$
     SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
         ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
         'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, 'invalid_validation_table', NULL, NULL, NULL);
+        'hyperopt', 'num_configs=5, num_iterations=2, algorithm=tpe', NULL, NULL, FALSE, 'invalid_validation_table', NULL, NULL, NULL);
 $TRAP$)=1, 'Should error out for invalid validation table');
 
--- test automl_method val
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+-- test config reproducibility
+DROP TABLE IF EXISTS automl_output1, automl_output1_info, automl_output1_summary, automl_mst_table1,
+    automl_mst_table1_summary;
+SELECT madlib_keras_automl('iris_data_packed', 'automl_output1', 'iris_model_arch', 'automl_mst_table1',
+                           ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyper', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+                           'hyperopt', 'num_configs=4, num_iterations=2, algorithm=tpe', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
 
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=15, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=15, 'The length of table does not match with the inputs') FROM automl_output_info;
+DROP TABLE IF EXISTS automl_output2, automl_output2_info, automl_output2_summary, automl_mst_table2,
+    automl_mst_table2_summary;
+SELECT madlib_keras_automl('iris_data_packed', 'automl_output2', 'iris_model_arch', 'automl_mst_table2',
+                           ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+                           'hyperopt', 'num_configs=4, num_iterations=2, algorithm=tpe', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
+
+DROP TABLE IF EXISTS automl_output3, automl_output3_info, automl_output3_summary, automl_mst_table3,
+    automl_mst_table3_summary;
+SELECT madlib_keras_automl('iris_data_packed', 'automl_output3', 'iris_model_arch', 'automl_mst_table3',
+                           ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'linear']}, {'optimizer': ['Adam', 'SGD'],
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+                           'hyperopt', 'num_configs=4, num_iterations=2, algorithm=tpe', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
+
+SELECT assert(model_id=(SELECT model_id FROM automl_output2_info WHERE mst_key=3) AND
+              compile_params=(SELECT compile_params FROM automl_output2_info WHERE mst_key=3) AND
+              fit_params=(SELECT fit_params FROM automl_output2_info WHERE mst_key=3), 'invalid config uniformity')
+FROM (SELECT model_id, compile_params, fit_params FROM automl_output1_info WHERE mst_key=3) output1;
+SELECT assert(model_id=(SELECT model_id FROM automl_output2_info WHERE mst_key=3) AND
+              compile_params=(SELECT compile_params FROM automl_output2_info WHERE mst_key=3) AND
+              fit_params=(SELECT fit_params FROM automl_output2_info WHERE mst_key=3), 'invalid config uniformity')
+FROM (SELECT model_id, compile_params, fit_params FROM automl_output3_info WHERE mst_key=3) output3;
+
+-- Test for metrics_elapsed_time for 2 configs per trial (total 3 trials)
+-- Setup for distributing data only on 2 segments
+DROP TABLE IF EXISTS segments_to_use;
+CREATE TABLE segments_to_use(
+    dbid INTEGER,
+    hostname TEXT
+);
+INSERT INTO segments_to_use SELECT dbid, hostname
+    FROM gp_segment_configuration
+    WHERE content >= 0 AND preferred_role = 'p' LIMIT 2;
+
+DROP TABLE IF EXISTS iris_data_2seg_packed, iris_data_2seg_packed_summary;
+SELECT training_preprocessor_dl('iris_data',              -- Source table
+                                'iris_data_2seg_packed',  -- Output table
+                                'class_text',             -- Dependent variable
+                                'attributes',             -- Independent variable
+                                NULL, 255, NULL,
+                                'segments_to_use'         -- Distribution rules
+                                );
 
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
+automl_mst_table_summary;
+SELECT madlib_keras_automl('iris_data_2seg_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+                           ARRAY[1], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [{'optimizer': ['Adam', 'SGD'],
+    'lr': [0.01, 0.011, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [50], 'epochs': [1]}$$,
+    'hyperopt', 'num_configs=5, num_iterations=2, algorithm=rand', NULL, NULL, FALSE, NULL, 1);
+
+SELECT assert(
+	t1.metrics_elapsed_time[2] < t2.metrics_elapsed_time[1] AND
+	t2.metrics_elapsed_time[2] < t3.metrics_elapsed_time[1] ,
+	'metrics_elapsed_time should be cumulative for each trial.'
+) FROM (SELECT * FROM automl_output_info WHERE mst_key=1) t1,
+ (SELECT * FROM automl_output_info WHERE mst_key=3) t2,
+ (SELECT * FROM automl_output_info WHERE mst_key=5) t3;
+
+--------------------------- HYPERBAND TEST CASES ---------------------------
+
+-- test table dimensions / happy path with default automl_params
+DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyp', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$);
 
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=15, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=15, 'The length of table does not match with the inputs') FROM automl_output_info;
+SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output;
+SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output_info;
 
--- test automl_params vals {R, eta, skip_last}
+-- Validate model output summary table
+SELECT assert(
+    source_table = 'iris_data_packed' AND
+    validation_table IS NULL AND
+    model = 'automl_output' AND
+    model_info = 'automl_output_info' AND
+    dependent_varname = 'class_text' AND
+    independent_varname = 'attributes' AND
+    model_arch_table = 'iris_model_arch' AND
+    model_selection_table = 'automl_mst_table' AND
+    automl_method = 'hyperband' AND
+    automl_params = 'R=6, eta=3, skip_last=0' AND
+    random_state IS NULL AND
+    object_table IS NULL AND
+    use_gpus = FALSE AND
+    metrics_compute_frequency = 6 AND
+    name IS NULL AND
+    description IS NULL AND
+    start_training_time < now() AND
+    end_training_time < now() AND
+    madlib_version IS NOT NULL AND
+    num_classes = 3 AND
+    class_values = '{Iris-setosa,Iris-versicolor,Iris-virginica}' AND
+    dependent_vartype = 'character varying' AND
+    normalizing_const = 1, 'Output summary table validation failed. Actual:' || __to_char(summary)
+) FROM (SELECT * FROM automl_output_summary) summary;
+
+-- Validate output info table for s and i NOT NULL
+SELECT assert(
+    metrics_iters IS NOT NULL AND
+    s = ANY(ARRAY[0,1]) AND
+    i = ANY(ARRAY[0,1]) , 'Invalid metrics_iters, s and i value in output info table. Actual:' || __to_char(info)
+) FROM (SELECT * FROM automl_output_info) info;
+
+-- Validate mst summary table
+SELECT assert(
+    model_arch_table = 'iris_model_arch' AND
+    object_table IS NULL , 'mst summary table validation failed. Actual:' || __to_char(summary)
+) FROM (SELECT * FROM automl_mst_table_summary) summary;
+
+-- test invalid source table
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=10, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('invalid_source_table', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+        ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid source table');
 
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=15, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=15, 'The length of table does not match with the inputs') FROM automl_output_info;
+-- test preexisting output table
+CREATE TABLE automl_output(a int);
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+        ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for preexisting output table');
 
+-- test preexisting selection table
+CREATE TABLE automl_mst_table(a int);
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+        ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for preexisting selection table');
+
+-- test invalid model id
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=5, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+        ARRAY[2,-1], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid model id');
 
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output_info;
+-- test invalid automl params {R, eta, skip_last}
+DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
+    automl_mst_table_summary;
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+        ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperband', 'R=2, eta=3, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid automl params: R');
 
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=10, eta=4, skip_last=1', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
-
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=4, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=4, 'The length of table does not match with the inputs') FROM automl_output_info;
+    'hyperband', 'R=9, eta=1, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid automl params: eta');
 
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=5, eta=5, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyperband', 'R=9, eta=3, skip_last=3', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid automl params: skip_last');
 
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
-SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=7, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=7, 'The length of table does not match with the inputs') FROM automl_output_info;
+-- test invalid object table
+DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
+    automl_mst_table_summary;
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+        ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperband', 'R=9, eta=3, skip_last=0', NULL, 'invalid_object_table', FALSE, NULL, NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid object table');
 
+-- test invalid validation table
+DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
+    automl_mst_table_summary;
+SELECT assert(trap_error($TRAP$
+    SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
+        ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
+        'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
+        'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
+        'hyperband', 'R=9, eta=3, skip_last=0', NULL, NULL, FALSE, 'invalid_validation_table', NULL, NULL, NULL);
+$TRAP$)=1, 'Should error out for invalid validation table');
+
+-- test automl_params vals {R, eta, skip_last}
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=3, skip_last=2', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyperband', 'R=5, eta=5, skip_last=1', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=9, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=9, 'The length of table does not match with the inputs') FROM automl_output_info;
+SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output;
+SELECT assert(COUNT(*)=5, 'The length of table does not match with the inputs') FROM automl_output_info;
 
 DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
     automl_mst_table_summary;
@@ -293,24 +458,13 @@ SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=11, eta=2, skip_last=3', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyperband', 'R=5, eta=5, skip_last=0', NULL, NULL, FALSE, NULL, NULL, NULL, NULL);
 
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_mst_table_summary;
 SELECT assert(COUNT(*)=1, 'The length of table does not match with the inputs') FROM automl_output_summary;
-SELECT assert(COUNT(*)=8, 'The length of table does not match with the inputs') FROM automl_output;
-SELECT assert(COUNT(*)=8, 'The length of table does not match with the inputs') FROM automl_output_info;
-
--- test name and description
-DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table,
-    automl_mst_table_summary;
-SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
-    ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
-    'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=11, eta=2, skip_last=3', NULL, NULL, FALSE, NULL, NULL, 'test1', 'test1 descr');
-SELECT assert(name='test1' AND description='test1 descr',
-    'invalid name/description') FROM (SELECT * FROM automl_output_summary) summary;
+SELECT assert(COUNT(*)=7, 'The length of table does not match with the inputs') FROM automl_output;
+SELECT assert(COUNT(*)=7, 'The length of table does not match with the inputs') FROM automl_output_info;
 
 -- test config reproducibility
 DROP TABLE IF EXISTS automl_output1, automl_output1_info, automl_output1_summary, automl_mst_table1,
@@ -319,7 +473,7 @@ SELECT madlib_keras_automl('iris_data_packed', 'automl_output1', 'iris_model_arc
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=3, skip_last=1', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyperband', 'R=3, eta=2, skip_last=1', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
 
 DROP TABLE IF EXISTS automl_output2, automl_output2_info, automl_output2_summary, automl_mst_table2,
     automl_mst_table2_summary;
@@ -327,7 +481,7 @@ SELECT madlib_keras_automl('iris_data_packed', 'automl_output2', 'iris_model_arc
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=3, skip_last=1', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
+    'hyperband', 'R=3, eta=2, skip_last=1', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
 
 DROP TABLE IF EXISTS automl_output3, automl_output3_info, automl_output3_summary, automl_mst_table3,
     automl_mst_table3_summary;
@@ -335,16 +489,21 @@ SELECT madlib_keras_automl('iris_data_packed', 'automl_output3', 'iris_model_arc
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
     'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$,
-    'hyperband', 'R=9, eta=3, skip_last=1', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
-
-SELECT assert(model_id=(SELECT model_id FROM automl_output2_info WHERE mst_key=7) AND
-              compile_params=(SELECT compile_params FROM automl_output2_info WHERE mst_key=7) AND
-              fit_params=(SELECT fit_params FROM automl_output2_info WHERE mst_key=7), 'invalid config uniformity')
-FROM (SELECT model_id, compile_params, fit_params FROM automl_output1_info WHERE mst_key=7) output1;
-SELECT assert(model_id=(SELECT model_id FROM automl_output2_info WHERE mst_key=7) AND
-              compile_params=(SELECT compile_params FROM automl_output2_info WHERE mst_key=7) AND
-              fit_params=(SELECT fit_params FROM automl_output2_info WHERE mst_key=7), 'invalid config uniformity')
-FROM (SELECT model_id, compile_params, fit_params FROM automl_output3_info WHERE mst_key=7) output3;
+    'hyperband', 'R=3, eta=2, skip_last=1', 42, NULL, FALSE, NULL, NULL, NULL, NULL);
+
+SELECT assert(model_id=(SELECT model_id FROM automl_output2_info WHERE mst_key=2) AND
+              compile_params=(SELECT compile_params FROM automl_output2_info WHERE mst_key=2) AND
+              fit_params=(SELECT fit_params FROM automl_output2_info WHERE mst_key=2), 'invalid config uniformity')
+FROM (SELECT model_id, compile_params, fit_params FROM automl_output1_info WHERE mst_key=2) output1;
+SELECT assert(model_id=(SELECT model_id FROM automl_output2_info WHERE mst_key=2) AND
+              compile_params=(SELECT compile_params FROM automl_output2_info WHERE mst_key=2) AND
+              fit_params=(SELECT fit_params FROM automl_output2_info WHERE mst_key=2), 'invalid config uniformity')
+FROM (SELECT model_id, compile_params, fit_params FROM automl_output3_info WHERE mst_key=2) output3;
+
+DROP TABLE IF EXISTS automl_output1, automl_output1_info, automl_output1_summary, automl_mst_table1,
+    automl_mst_table1_summary, automl_output2, automl_output2_info, automl_output2_summary, automl_mst_table2,
+    automl_mst_table2_summary, automl_output3, automl_output3_info, automl_output3_summary, automl_mst_table3,
+    automl_mst_table3_summary;
 
 --------------------------- HYPERBAND SCHEDULE TEST CASES ---------------------------
 -- Testing happy path with default values
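
As a sanity check on the row-count asserts in the test file above: the
expected counts follow directly from the Hyperband schedule formulas in
find_hyperband_config() (shown later in this patch). A minimal standalone
sketch in plain Python, mirroring those formulas (not part of the patch):

    import math

    def hyperband_num_configs(R, eta, skip_last=0):
        # Bracket s starts with ceil(int((s_max+1)/(s+1)) * eta^s) configs;
        # skip_last drops that many of the smallest brackets from the run.
        s_max = int(math.floor(math.log(R, eta)))
        counts = [int(math.ceil(int((s_max + 1) / (s + 1)) * math.pow(eta, s)))
                  for s in range(s_max + 1)]
        return sum(counts[skip_last:])

    hyperband_num_configs(6, 3, 0)  # 5, matching 'R=6, eta=3, skip_last=0'
    hyperband_num_configs(5, 5, 1)  # 5
    hyperband_num_configs(5, 5, 0)  # 7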
diff --git a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
index 15b3851..edb12c4 100644
--- a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
+++ b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
@@ -206,5 +206,79 @@ class HyperbandScheduleTestCase(unittest.TestCase):
     def tearDown(self):
         self.module_patcher.stop()
 
+
+
+class AutoMLHyperoptTestCase(unittest.TestCase):
+    def setUp(self):
+        # The side effects of this class (writing to the output table) are not
+        # tested here. They are tested in dev-check.
+        self.plpy_mock = Mock(spec='error')
+        patches = {
+            'plpy': plpy
+        }
+
+        self.plpy_mock_execute = MagicMock()
+        plpy.execute = self.plpy_mock_execute
+
+        self.module_patcher = patch.dict('sys.modules', patches)
+        self.module_patcher.start()
+        import deep_learning.madlib_keras_automl
+        self.module = deep_learning.madlib_keras_automl
+
+        from deep_learning.madlib_keras_automl import AutoMLHyperopt
+        self.seg_num_mock = Mock()
+
+        class FakeAutoMLHyperopt(AutoMLHyperopt):
+            def __init__(self, *args):
+                pass
+            self.module.get_seg_number = self.seg_num_mock
+
+        self.subject = FakeAutoMLHyperopt
+
+    def test_get_configs_list_models_less_than_segments(self):
+        automl_hyperopt = self.subject()
+        configs = automl_hyperopt.get_configs_list(1,3)
+        self.assertEquals([(1,1)], configs)
+
+    def test_get_configs_list_models_equal_segments(self):
+        automl_hyperopt = self.subject()
+        configs = automl_hyperopt.get_configs_list(3,3)
+        self.assertEquals([(1,3)], configs)
+
+    def test_get_configs_list_last_bucket_models_less_than_half_segments(self):
+        automl_hyperopt = self.subject()
+        # Last bucket num models < 1/2 num workers
+        configs = automl_hyperopt.get_configs_list(81,20)
+        self.assertEquals([(1, 20), (21, 40), (41, 60), (61, 81)], configs)
+
+    def test_get_configs_list_last_bucket_models_greater_than_half_segments(self):
+        automl_hyperopt = self.subject()
+        # Last bucket num models > 1/2 num workers
+        configs = automl_hyperopt.get_configs_list(20,3)
+        self.assertEquals([(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18),(19, 20)], configs)
+
+    def test_get_configs_list_last_bucket_models_equal_half_segments(self):
+        automl_hyperopt = self.subject()
+        # Last bucket num models = 1/2 num workers
+        configs = automl_hyperopt.get_configs_list(90,20)
+        self.assertEquals([(1, 20), (21, 40), (41, 60), (61, 80),(81,90)], configs)
+
+    def test_get_num_segments_all_segments(self):
+        automl_hyperopt = self.subject()
+        automl_hyperopt.source_table = 'dummy_table'
+        self.plpy_mock_execute.return_value = [{'distribution_rules': 'all_segments'}]
+        self.seg_num_mock.return_value = 3
+        self.assertEquals(3, automl_hyperopt.get_num_segments())
+
+    def test_get_num_segments_array_value(self):
+        automl_hyperopt = self.subject()
+        automl_hyperopt.source_table = 'dummy_table'
+        # return list of segment ids as distribution_rules
+        self.plpy_mock_execute.return_value = [{'distribution_rules': [3,1]}]
+        self.assertEquals(2, automl_hyperopt.get_num_segments())
+
+    def tearDown(self):
+        self.module_patcher.stop()
+
 if __name__ == '__main__':
     unittest.main()
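
The bucket layouts expected by the get_configs_list tests above amount to
splitting num_configs model configs into rounds of at most num_segments
models, and folding a final remainder smaller than half the workers into
the previous round. A rough standalone reconstruction inferred from these
test expectations (the actual implementation lives in
madlib_keras_automl_hyperopt.py_in):

    def get_configs_list(num_configs, num_segments):
        # 1-indexed, inclusive (start, end) ranges of configs per round
        buckets = []
        start = 1
        while start <= num_configs:
            end = min(start + num_segments - 1, num_configs)
            buckets.append((start, end))
            start = end + 1
        if len(buckets) > 1:
            last_size = buckets[-1][1] - buckets[-1][0] + 1
            if last_size < num_segments / 2.0:
                # underloaded last round: merge it into the previous one
                buckets = buckets[:-2] + [(buckets[-2][0], num_configs)]
        return buckets

    get_configs_list(81, 20)  # [(1, 20), (21, 40), (41, 60), (61, 81)]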
diff --git a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_model_selection_table.py_in b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_model_selection_table.py_in
index 1a2f61f..7de9868 100644
--- a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_model_selection_table.py_in
+++ b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_model_selection_table.py_in
@@ -38,7 +38,8 @@ class GenerateModelSelectionConfigsTestCase(unittest.TestCase):
         # tested here. They are tested in dev-check.
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -354,7 +355,8 @@ class LoadModelSelectionTableTestCase(unittest.TestCase):
         # tested here. They are tested in dev-check.
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -478,7 +480,8 @@ class MstLoaderInputValidatorTestCase(unittest.TestCase):
         # tested here. They are tested in dev-check.
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
diff --git a/src/ports/postgres/modules/utilities/utilities.py_in b/src/ports/postgres/modules/utilities/utilities.py_in
index b228d68..eb507d9 100644
--- a/src/ports/postgres/modules/utilities/utilities.py_in
+++ b/src/ports/postgres/modules/utilities/utilities.py_in
@@ -1,9 +1,11 @@
 
 import collections
+from datetime import datetime
 import re
 import time
 import random
 from distutils.util import strtobool
+# import numpy as np
 
 from validate_args import _get_table_schema_names
 from validate_args import cols_in_tbl_valid
@@ -1317,3 +1319,8 @@ def get_schema(tbl_str):
 
     else:
         return None
+# -------------------------------------------------------------------------------
+
+def get_current_timestamp(format):
+    """Gets current time stamp in the specified format string"""
+    return datetime.fromtimestamp(time.time()).strftime(format)
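
For reference, the AutoML classes call this helper with their TIME_FORMAT
constant, for example:

    # AutoMLConstants.TIME_FORMAT = '%Y-%m-%d %H:%M:%S'
    start_training_time = get_current_timestamp('%Y-%m-%d %H:%M:%S')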


[madlib] 04/08: DL: [AutoML] Add new class for Distribution rules

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 2d6e599bf9ab393c0e8c6b6a81b781a1a4e1088c
Author: Ekta Khanna <ek...@vmware.com>
AuthorDate: Fri Oct 9 17:32:26 2020 -0700

    DL: [AutoML] Add new class for Distribution rules
    
    JIRA: MADLIB-1453
    
    Co-authored-by: Nikhil Kak <nk...@vmware.com>
---
 .../deep_learning/input_data_preprocessor.py_in    | 17 ++++++++-------
 .../deep_learning/madlib_keras_automl.py_in        |  6 +++---
 .../deep_learning/madlib_keras_helper.py_in        |  3 +--
 .../deep_learning/madlib_keras_validator.py_in     |  1 +
 .../test/unit_tests/test_madlib_keras.py_in        | 24 ++++++++++++++--------
 .../test/unit_tests/test_madlib_keras_automl.py_in |  8 ++++----
 6 files changed, 35 insertions(+), 24 deletions(-)
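
One detail in the hunks below: the distribution_rules/gpu_config values are
stored wrapped in MADlib dollar quoting, presumably so the literal survives
later interpolation into dynamically built SQL:

    gpu_config = '$__madlib__${0}$__madlib__$'.format(DistributionRulesOptions.ALL_SEGMENTS)
    # yields the literal: $__madlib__$all_segments$__madlib__$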

diff --git a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
index 1d395a6..4b27642 100644
--- a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
+++ b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
@@ -51,6 +51,9 @@ from madlib_keras_helper import *
 import time
 
 NUM_CLASSES_COLNAME = "num_classes"
+class DistributionRulesOptions:
+    ALL_SEGMENTS = 'all_segments'
+    GPU_SEGMENTS = 'gpu_segments'
 
 class InputDataPreprocessorDL(object):
     def __init__(self, schema_madlib, source_table, output_table,
@@ -64,12 +67,12 @@ class InputDataPreprocessorDL(object):
         self.buffer_size = buffer_size
         self.normalizing_const = normalizing_const
         self.num_classes = num_classes
-        self.distribution_rules = distribution_rules if distribution_rules else 'all_segments'
+        self.distribution_rules = distribution_rules.lower() if distribution_rules else DistributionRulesOptions.ALL_SEGMENTS
         self.module_name = module_name
         self.output_summary_table = None
         self.dependent_vartype = None
         self.independent_vartype = None
-        self.gpu_config = '$__madlib__$all_segments$__madlib__$'
+        self.gpu_config = '$__madlib__${0}$__madlib__$'.format(DistributionRulesOptions.ALL_SEGMENTS)
         if self.output_table:
             self.output_summary_table = add_postfix(self.output_table, "_summary")
 
@@ -269,7 +272,7 @@ class InputDataPreprocessorDL(object):
 
         if is_platform_pg():
             # used later for writing summary table
-            self.distribution_rules = '$__madlib__$all_segments$__madlib__$'
+            self.distribution_rules = '$__madlib__${0}$__madlib__$'.format(DistributionRulesOptions.ALL_SEGMENTS)
 
             #
             # For postgres, we just need 3 simple queries:
@@ -320,14 +323,14 @@ class InputDataPreprocessorDL(object):
         #   it's to be spread evenly across all segments, we still
         #   need to do some extra work to ensure that happens.
 
-        if self.distribution_rules == 'all_segments':
+        if self.distribution_rules == DistributionRulesOptions.ALL_SEGMENTS:
             all_segments = True
-            self.distribution_rules = '$__madlib__$all_segments$__madlib__$'
+            self.distribution_rules = '$__madlib__${0}$__madlib__$'.format(DistributionRulesOptions.ALL_SEGMENTS)
             num_segments = get_seg_number()
         else:
             all_segments = False
 
-        if self.distribution_rules == 'gpu_segments':
+        if self.distribution_rules == DistributionRulesOptions.GPU_SEGMENTS:
             #TODO can we reuse the function `get_accessible_gpus_for_seg` from
             # madlib_keras_helper
             gpu_info_table = unique_string(desp='gpu_info')
@@ -620,7 +623,7 @@ class InputDataPreprocessorDL(object):
                        normalizing_const_colname=NORMALIZING_CONST_COLNAME,
                        num_classes_colname=NUM_CLASSES_COLNAME,
                        internal_gpu_config=INTERNAL_GPU_CONFIG,
-                       distribution_rules=DISTRIBUTION_RULES,
+                       distribution_rules=DISTRIBUTION_RULES_COLNAME,
                        FLOAT32_SQL_TYPE=FLOAT32_SQL_TYPE)
         plpy.execute(query)
 
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
index 0df6772..dc8c837 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
@@ -33,7 +33,7 @@ from utilities.utilities import get_current_timestamp, get_seg_number, get_segme
 from utilities.control import SetGUC
 from madlib_keras_fit_multiple_model import FitMultipleModel
 from madlib_keras_helper import generate_row_string
-from madlib_keras_helper import DISTRIBUTION_RULES
+from madlib_keras_helper import DISTRIBUTION_RULES_COLNAME
 from madlib_keras_model_selection import MstSearch, ModelSelectionSchema
 from keras_model_arch_table import ModelArchSchema
 from utilities.validate_args import table_exists, drop_tables, input_tbl_valid
@@ -706,7 +706,7 @@ class AutoMLHyperopt(KerasAutoML):
         :return:
         """
         source_summary_table = add_postfix(self.source_table, '_summary')
-        dist_rules = plpy.execute("SELECT {0} from {1}".format(DISTRIBUTION_RULES, source_summary_table))[0][DISTRIBUTION_RULES]
+        dist_rules = plpy.execute("SELECT {0} from {1}".format(DISTRIBUTION_RULES_COLNAME, source_summary_table))[0][DISTRIBUTION_RULES_COLNAME]
         #TODO create constant for all_segments
         if dist_rules == "all_segments":
             return get_seg_number()
@@ -734,9 +734,9 @@ class AutoMLHyperopt(KerasAutoML):
                     self.algorithm = rand
                 elif automl_params_dict[AutoMLConstants.ALGORITHM].lower() == 'tpe':
                     self.algorithm = tpe
+                # TODO: Add support for atpe; uncomment the lines below once atpe works
                 # elif automl_params_dict[AutoMLSchema.ALGORITHM].lower() == 'atpe':
                 #     self.algorithm = atpe
-                # uncomment the above lines after atpe works # TODO
                 else:
                     plpy.error("{0}: valid algorithm 'automl_params' for hyperopt: 'rand', 'tpe'".format(self.module_name)) # , or 'atpe'
             else:
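
For context on the 'rand' and 'tpe' names above: these are Hyperopt's
built-in suggestion algorithms. A self-contained toy example of standalone
Hyperopt usage (MADlib itself drives the search through Domain/Trials
directly rather than through fmin):

    from hyperopt import hp, fmin, tpe, rand, Trials

    # hypothetical one-parameter search space
    space = {'lr': hp.loguniform('lr', -5, 0)}
    trials = Trials()
    best = fmin(fn=lambda params: params['lr'],  # toy objective: minimize lr
                space=space,
                algo=tpe.suggest,                # or rand.suggest
                max_evals=10,
                trials=trials)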
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
index 96c2817..be9a1f9 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
@@ -26,7 +26,6 @@ from utilities.validate_args import table_exists
 from madlib_keras_gpu_info import GPUInfoFunctions
 import plpy
 from math import isnan
-# from madlib_keras_model_selection import ModelSelectionSchema
 
 ############### Constants used in other deep learning files #########
 # Name of columns in model summary table.
@@ -54,7 +53,7 @@ SMALLINT_SQL_TYPE = 'SMALLINT'
 DEFAULT_NORMALIZING_CONST = 1.0
 GP_SEGMENT_ID_COLNAME = "gp_segment_id"
 INTERNAL_GPU_CONFIG = '__internal_gpu_config__'
-DISTRIBUTION_RULES = "distribution_rules"
+DISTRIBUTION_RULES_COLNAME = "distribution_rules"
 #####################################################################
 
 # Prepend a dimension to np arrays using expand_dims.
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in
index 8b2157d..41e4c72 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_validator.py_in
@@ -18,6 +18,7 @@
 # under the License.
 
 import plpy
+from input_data_preprocessor import DistributionRulesOptions
 from keras_model_arch_table import ModelArchSchema
 from model_arch_info import get_num_classes
 from madlib_keras_custom_function import CustomFunctionSchema
diff --git a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
index e69bab4..13bbfd1 100644
--- a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
+++ b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
@@ -51,7 +51,8 @@ class MadlibKerasFitTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -691,7 +692,8 @@ class InternalKerasPredictTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -795,7 +797,8 @@ class MadlibKerasPredictBYOMTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -877,7 +880,8 @@ class MadlibKerasWrapperTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -1210,7 +1214,8 @@ class MadlibKerasFitCommonValidatorTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -1262,7 +1267,8 @@ class InputValidatorTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -1382,7 +1388,8 @@ class MadlibSerializerTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -1585,7 +1592,8 @@ class MadlibKerasEvaluationTestCase(unittest.TestCase):
     def setUp(self):
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
diff --git a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
index edb12c4..946dde3 100644
--- a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
+++ b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
@@ -37,7 +37,8 @@ class HyperbandScheduleTestCase(unittest.TestCase):
         # tested here. They are tested in dev-check.
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()
@@ -206,15 +207,14 @@ class HyperbandScheduleTestCase(unittest.TestCase):
     def tearDown(self):
         self.module_patcher.stop()
 
-
-
 class AutoMLHyperoptTestCase(unittest.TestCase):
     def setUp(self):
         # The side effects of this class(writing to the output table) are not
         # tested here. They are tested in dev-check.
         self.plpy_mock = Mock(spec='error')
         patches = {
-            'plpy': plpy
+            'plpy': plpy,
+            'utilities.mean_std_dev_calculator': Mock()
         }
 
         self.plpy_mock_execute = MagicMock()


[madlib] 05/08: DL: [AutoML] Split automl methods to their own files

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit c849dd08a16ee03114dbe75ee65781dd7264c925
Author: Ekta Khanna <ek...@vmware.com>
AuthorDate: Thu Oct 15 17:42:11 2020 -0700

    DL: [AutoML] Split automl methods to their own files
    
    JIRA: MADLIB-1453
    
    This commit also sets plan_cache_mode when calling fit multiple model
    from the automl methods.
    
    Co-authored-by: Nikhil Kak <nk...@vmware.com>
---
 .../deep_learning/madlib_keras_automl.py_in        | 831 +--------------------
 .../deep_learning/madlib_keras_automl.sql_in       |  20 +-
 .../madlib_keras_automl_hyperband.py_in            | 419 +++++++++++
 .../madlib_keras_automl_hyperopt.py_in             | 458 ++++++++++++
 .../test/unit_tests/test_madlib_keras_automl.py_in |  16 +-
 5 files changed, 907 insertions(+), 837 deletions(-)
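
The plan_cache_mode change mentioned in the commit message appears in this
patch as a SetGUC context manager wrapped around the FitMultipleModel
calls; a sketch of what it scopes, assuming the GUC is available on the
target platform (plpy is the PL/Python module available inside the
database):

    plpy.execute("SET plan_cache_mode = 'force_generic_plan'")
    try:
        pass  # FitMultipleModel(...) runs here
    finally:
        plpy.execute("RESET plan_cache_mode")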

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
index dc8c837..c795ee1 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
@@ -17,24 +17,13 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from ast import literal_eval
-from datetime import datetime
-from hyperopt import hp, rand, tpe, atpe, Trials, STATUS_OK, STATUS_RUNNING
-from hyperopt.base import Domain
-import math
-import numpy as np
 import plpy
-import time
 
 from madlib_keras_validator import MstLoaderInputValidator
-# from utilities.admin import cleanup_madlib_temp_tables
 from utilities.utilities import get_current_timestamp, get_seg_number, get_segments_per_host, \
-    unique_string, add_postfix, extract_keyvalue_params, _assert, _assert_equal, rename_table
-from utilities.control import SetGUC
-from madlib_keras_fit_multiple_model import FitMultipleModel
-from madlib_keras_helper import generate_row_string
-from madlib_keras_helper import DISTRIBUTION_RULES_COLNAME
-from madlib_keras_model_selection import MstSearch, ModelSelectionSchema
+    unique_string, add_postfix, extract_keyvalue_params, _assert, _assert_equal, rename_table, \
+    is_platform_pg
+from madlib_keras_model_selection import ModelSelectionSchema
 from keras_model_arch_table import ModelArchSchema
 from utilities.validate_args import table_exists, drop_tables, input_tbl_valid
 from utilities.validate_args import quote_ident
@@ -49,6 +38,7 @@ class AutoMLConstants:
     R = 'R'
     ETA = 'eta'
     SKIP_LAST = 'skip_last'
+    HYPERBAND_PARAMS = [R, ETA, SKIP_LAST]
     LOSS_METRIC = 'training_loss_final'
     TEMP_MST_TABLE = unique_string('temp_mst_table')
     TEMP_MST_SUMMARY_TABLE = add_postfix(TEMP_MST_TABLE, '_summary')
@@ -57,119 +47,11 @@ class AutoMLConstants:
     NUM_CONFIGS = 'num_configs'
     NUM_ITERS = 'num_iterations'
     ALGORITHM = 'algorithm'
+    HYPEROPT_PARAMS = [NUM_CONFIGS, NUM_ITERS, ALGORITHM]
     TIME_FORMAT = '%Y-%m-%d %H:%M:%S'
     INT_MAX = 2 ** 31 - 1
     TARGET_SCHEMA = 'public'
 
-class HyperbandSchedule():
-    """The utility class for loading a hyperband schedule table with algorithm inputs.
-
-    Attributes:
-        schedule_table (string): Name of output table containing hyperband schedule.
-        R (int): Maximum number of resources (iterations) that can be allocated
-  to a single configuration.
-        eta (int): Controls the proportion of configurations discarded in
-  each round of successive halving.
-        skip_last (int): The number of last rounds to skip.
-    """
-    def __init__(self, schedule_table, R, eta=3, skip_last=0):
-        self.schedule_table = schedule_table # table name to store hyperband schedule
-        self.R = R # maximum iterations/epochs allocated to a configuration
-        self.eta = eta # defines downsampling rate
-        self.skip_last = skip_last
-        self.module_name = 'hyperband_schedule'
-        self.validate_inputs()
-
-        # number of unique executions of Successive Halving (minus one)
-        self.s_max = int(math.floor(math.log(self.R, self.eta)))
-        self.validate_s_max()
-
-        self.schedule_vals = []
-
-        self.calculate_schedule()
-
-    def load(self):
-        """
-        The entry point for loading the hyperband schedule table.
-        """
-        self.create_schedule_table()
-        self.insert_into_schedule_table()
-
-    def validate_inputs(self):
-        """
-        Validates user input values
-        """
-        _assert(self.eta > 1, "{0}: eta must be greater than 1".format(self.module_name))
-        _assert(self.R >= self.eta, "{0}: R should not be less than eta".format(self.module_name))
-
-    def validate_s_max(self):
-        _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "{0}: skip_last must be " +
-                "non-negative and less than {1}".format(self.module_name,self.s_max))
-
-    def calculate_schedule(self):
-        """
-        Calculates the hyperband schedule (number of configs and allocated resources)
-        in each round of each bracket and skips the number of last rounds specified in 'skip_last'
-        """
-        for s in reversed(range(self.s_max+1)):
-            n = int(math.ceil(int((self.s_max+1)/(s+1))*math.pow(self.eta, s))) # initial number of configurations
-            r = self.R * math.pow(self.eta, -s)
-
-            for i in range((s+1) - int(self.skip_last)):
-                # Computing each of the
-                n_i = n*math.pow(self.eta, -i)
-                r_i = r*math.pow(self.eta, i)
-
-                self.schedule_vals.append({AutoMLConstants.BRACKET: s,
-                                           AutoMLConstants.ROUND: i,
-                                           AutoMLConstants.CONFIGURATIONS: int(n_i),
-                                           AutoMLConstants.RESOURCES: int(round(r_i))})
-
-    def create_schedule_table(self):
-        """Initializes the output schedule table"""
-        create_query = """
-                        CREATE TABLE {self.schedule_table} (
-                            {s} INTEGER,
-                            {i} INTEGER,
-                            {n_i} INTEGER,
-                            {r_i} INTEGER,
-                            unique ({s}, {i})
-                        );
-                       """.format(self=self,
-                                  s=AutoMLConstants.BRACKET,
-                                  i=AutoMLConstants.ROUND,
-                                  n_i=AutoMLConstants.CONFIGURATIONS,
-                                  r_i=AutoMLConstants.RESOURCES)
-        plpy.execute(create_query)
-
-    def insert_into_schedule_table(self):
-        """Insert everything in self.schedule_vals into the output schedule table."""
-        for sd in self.schedule_vals:
-            sd_s = sd[AutoMLConstants.BRACKET]
-            sd_i = sd[AutoMLConstants.ROUND]
-            sd_n_i = sd[AutoMLConstants.CONFIGURATIONS]
-            sd_r_i = sd[AutoMLConstants.RESOURCES]
-            insert_query = """
-                            INSERT INTO
-                                {self.schedule_table}(
-                                    {s_col},
-                                    {i_col},
-                                    {n_i_col},
-                                    {r_i_col}
-                                )
-                            VALUES (
-                                {sd_s},
-                                {sd_i},
-                                {sd_n_i},
-                                {sd_r_i}
-                            )
-                           """.format(s_col=AutoMLConstants.BRACKET,
-                                      i_col=AutoMLConstants.ROUND,
-                                      n_i_col=AutoMLConstants.CONFIGURATIONS,
-                                      r_i_col=AutoMLConstants.RESOURCES,
-                                      **locals())
-            plpy.execute(insert_query)
-
 class KerasAutoML(object):
     """
     The core AutoML class for running AutoML algorithms such as Hyperband and Hyperopt.
@@ -179,6 +61,9 @@ class KerasAutoML(object):
                  automl_params=None, random_state=None, object_table=None,
                  use_gpus=False, validation_table=None, metrics_compute_frequency=None,
                  name=None, description=None, **kwargs):
+        if is_platform_pg():
+            plpy.error(
+                "DL: AutoML is not supported on PostgreSQL.")
         self.schema_madlib = schema_madlib
         self.source_table = source_table
         self.model_output_table = model_output_table
@@ -381,704 +266,8 @@ class KerasAutoML(object):
         Remove all intermediate tables created for AutoML runs/updates.
         :param model_training: Fit Multiple function call object.
         """
+        if not model_training:
+            return
         drop_tables([model_training.original_model_output_table, model_training.model_info_table,
                      model_training.model_summary_table, AutoMLConstants.TEMP_MST_TABLE,
                      AutoMLConstants.TEMP_MST_SUMMARY_TABLE])
-
-class AutoMLHyperband(KerasAutoML):
-    """
-    This class implements Hyperband, an infinite-arm bandit based algorithm that speeds up random search
-    through adaptive resource allocation, successive halving (SHA), and early stopping.
-
-    This class showcases a novel hyperband implementation by executing the hyperband rounds 'diagonally'
-    to evaluate multiple configurations together and leverage the compute power of MPP databases such as Greenplum.
-
-    This automl method inherits qualities from the automl class.
-    """
-    def __init__(self, schema_madlib, source_table, model_output_table, model_arch_table, model_selection_table,
-                 model_id_list, compile_params_grid, fit_params_grid, automl_method,
-                 automl_params, random_state=None, object_table=None,
-                 use_gpus=False, validation_table=None, metrics_compute_frequency=None,
-                 name=None, description=None, **kwargs):
-        automl_method = automl_method if automl_method else AutoMLConstants.HYPERBAND
-        automl_params = automl_params if automl_params else 'R=6, eta=3, skip_last=0'
-        KerasAutoML.__init__(self, schema_madlib, source_table, model_output_table, model_arch_table,
-                             model_selection_table, model_id_list, compile_params_grid, fit_params_grid,
-                             automl_method, automl_params, random_state, object_table, use_gpus,
-                             validation_table, metrics_compute_frequency, name, description, **kwargs)
-        self.validate_and_define_inputs()
-        self.create_model_output_table()
-        self.create_model_output_info_table()
-        self.find_hyperband_config()
-
-    def validate_and_define_inputs(self):
-        automl_params_dict = extract_keyvalue_params(self.automl_params,
-                                                     lower_case_names=False)
-        # casting dict values to int
-        for i in automl_params_dict:
-            automl_params_dict[i] = int(automl_params_dict[i])
-        _assert(len(automl_params_dict) >= 1 and len(automl_params_dict) <= 3,
-                "{0}: Only R, eta, and skip_last may be specified".format(self.module_name))
-        for i in automl_params_dict:
-            if i == AutoMLConstants.R:
-                self.R = automl_params_dict[AutoMLConstants.R]
-            elif i == AutoMLConstants.ETA:
-                self.eta = automl_params_dict[AutoMLConstants.ETA]
-            elif i == AutoMLConstants.SKIP_LAST:
-                self.skip_last = automl_params_dict[AutoMLConstants.SKIP_LAST]
-            else:
-                plpy.error("{0}: {1} is an invalid automl param".format(self.module_name, i))
-        _assert(self.eta > 1, "{0}: eta must be greater than 1".format(self.module_name))
-        _assert(self.R >= self.eta, "{0}: R should not be less than eta".format(self.module_name))
-        self.s_max = int(math.floor(math.log(self.R, self.eta)))
-        _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "{0}: skip_last must be " \
-                "non-negative and less than {1}".format(self.module_name, self.s_max))
-
-    def find_hyperband_config(self):
-        """
-        Executes the diagonal hyperband algorithm.
-        """
-        initial_vals = {}
-
-        # get hyper parameter configs for each s
-        for s in reversed(range(self.s_max+1)):
-            n = int(math.ceil(int((self.s_max+1)/(s+1))*math.pow(self.eta, s))) # initial number of configurations
-            r = self.R * math.pow(self.eta, -s) # initial number of iterations to run configurations for
-            initial_vals[s] = (n, int(round(r)))
-        self.start_training_time = self.get_current_timestamp()
-        self.start_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
-        random_search = MstSearch(self.schema_madlib,
-                                  self.model_arch_table,
-                                  self.model_selection_table,
-                                  self.model_id_list,
-                                  self.compile_params_grid,
-                                  self.fit_params_grid,
-                                  'random',
-                                  sum([initial_vals[k][0] for k in initial_vals][self.skip_last:]),
-                                  self.random_state,
-                                  self.object_table)
-        random_search.load() # for populating mst tables
-
-        # for creating the summary table for usage in fit multiple
-        plpy.execute("CREATE TABLE {AutoMLSchema.TEMP_MST_SUMMARY_TABLE} AS " \
-                     "SELECT * FROM {random_search.model_selection_summary_table}".format(AutoMLSchema=AutoMLConstants,
-                                                                                          random_search=random_search))
-        ranges_dict = self.mst_key_ranges_dict(initial_vals)
-        # to store the bracket and round numbers
-        s_dict, i_dict = {}, {}
-        for key, val in ranges_dict.items():
-            for mst_key in range(val[0], val[1]+1):
-                s_dict[mst_key] = key
-                i_dict[mst_key] = -1
-
-        # outer loop on diagonal
-        for i in range((self.s_max+1) - int(self.skip_last)):
-            # inner loop on s desc
-            temp_lst = []
-            configs_prune_lookup = {}
-            for s in range(self.s_max, self.s_max-i-1, -1):
-                n = initial_vals[s][0]
-                n_i = n * math.pow(self.eta, -i+self.s_max-s)
-                configs_prune_lookup[s] = int(round(n_i))
-                temp_lst.append("{0} configs under bracket={1} & round={2}".format(int(n_i), s, s-self.s_max+i))
-            num_iterations = int(initial_vals[self.s_max-i][1])
-            plpy.info('*** Diagonally evaluating ' + ', '.join(temp_lst) + ' with {0} iterations ***'.format(
-                num_iterations))
-
-            self.reconstruct_temp_mst_table(i, ranges_dict, configs_prune_lookup) # has keys to evaluate
-            active_keys = plpy.execute("SELECT {ModelSelectionSchema.MST_KEY} " \
-                                       "FROM {AutoMLSchema.TEMP_MST_TABLE}".format(AutoMLSchema=AutoMLConstants,
-                                                                                   ModelSelectionSchema=ModelSelectionSchema))
-            for k in active_keys:
-                i_dict[k[ModelSelectionSchema.MST_KEY]] += 1
-            self.warm_start = int(i != 0)
-            mcf = self.metrics_compute_frequency if self._is_valid_metrics_compute_frequency(num_iterations) else None
-            with SetGUC("plan_cache_mode", "force_generic_plan"):
-                model_training = FitMultipleModel(self.schema_madlib, self.source_table, AutoMLSchema.TEMP_OUTPUT_TABLE,
-                                              AutoMLSchema.TEMP_MST_TABLE, num_iterations, self.use_gpus,
-                                              self.validation_table, mcf, self.warm_start, self.name, self.description)
-            self.update_model_output_table(model_training)
-            self.update_model_output_info_table(i, model_training, initial_vals)
-
-            self.print_best_mst_so_far()
-
-        self.end_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
-        self.add_additional_info_cols(s_dict, i_dict)
-        self.update_model_selection_table()
-        self.generate_model_output_summary_table(model_training)
-        self.remove_temp_tables(model_training)
-        # cleanup_madlib_temp_tables(self.schema_madlib, AutoMLSchema.TARGET_SCHEMA)
-
-    def mst_key_ranges_dict(self, initial_vals):
-        """
-        Extracts the ranges of model configs (using mst_keys) belonging to / sampled as part of
-        executing a particular SHA bracket.
-        """
-        d = {}
-        for s_val in sorted(initial_vals.keys(), reverse=True): # going from s_max to 0
-            if s_val == self.s_max:
-                d[s_val] = (1, initial_vals[s_val][0])
-            else:
-                d[s_val] = (d[s_val+1][1]+1, d[s_val+1][1]+initial_vals[s_val][0])
-        return d
-
-    def reconstruct_temp_mst_table(self, i, ranges_dict, configs_prune_lookup):
-        """
-        Drops and Reconstructs a temp mst table for evaluation along particular diagonals of hyperband.
-        :param i: outer diagonal loop iteration.
-        :param ranges_dict: model config ranges to group by bracket number.
-        :param configs_prune_lookup: Lookup dictionary for configs to evaluate for a diagonal.
-        :return:
-        """
-        if i == 0:
-            _assert_equal(len(configs_prune_lookup), 1, "invalid args")
-            lower_bound, upper_bound = ranges_dict[self.s_max]
-            plpy.execute("CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} AS SELECT * FROM {self.model_selection_table} "
-                         "WHERE {ModelSelectionSchema.MST_KEY} >= {lower_bound} " \
-                         "AND {ModelSelectionSchema.MST_KEY} <= {upper_bound}".format(self=self,
-                                                                                      AutoMLSchema=AutoMLConstants,
-                                                                                      lower_bound=lower_bound,
-                                                                                      upper_bound=upper_bound,
-                                                                                      ModelSelectionSchema=ModelSelectionSchema))
-            return
-        # dropping and repopulating temp_mst_table
-        drop_tables([AutoMLConstants.TEMP_MST_TABLE])
-
-        # {mst_key} changed from SERIAL to INTEGER for safe insertions and preservation of mst_key values
-        create_query = """
-                        CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} (
-                            {mst_key} INTEGER,
-                            {model_id} INTEGER,
-                            {compile_params} VARCHAR,
-                            {fit_params} VARCHAR,
-                            unique ({model_id}, {compile_params}, {fit_params})
-                        );
-                       """.format(AutoMLSchema=AutoMLConstants,
-                                  mst_key=ModelSelectionSchema.MST_KEY,
-                                  model_id=ModelSelectionSchema.MODEL_ID,
-                                  compile_params=ModelSelectionSchema.COMPILE_PARAMS,
-                                  fit_params=ModelSelectionSchema.FIT_PARAMS)
-        plpy.execute(create_query)
-
-        query = ""
-        new_configs = True
-        for s_val in configs_prune_lookup:
-            lower_bound, upper_bound = ranges_dict[s_val]
-            if new_configs:
-                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT {ModelSelectionSchema.MST_KEY}, " \
-                         "{ModelSelectionSchema.MODEL_ID}, {ModelSelectionSchema.COMPILE_PARAMS}, " \
-                         "{ModelSelectionSchema.FIT_PARAMS} FROM {self.model_selection_table} WHERE " \
-                         "{ModelSelectionSchema.MST_KEY} >= {lower_bound} AND {ModelSelectionSchema.MST_KEY} <= " \
-                         "{upper_bound};".format(self=self, AutoMLSchema=AutoMLConstants,
-                                                 ModelSelectionSchema=ModelSelectionSchema,
-                                                 lower_bound=lower_bound, upper_bound=upper_bound)
-                new_configs = False
-            else:
-                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT {ModelSelectionSchema.MST_KEY}, " \
-                         "{ModelSelectionSchema.MODEL_ID}, {ModelSelectionSchema.COMPILE_PARAMS}, " \
-                         "{ModelSelectionSchema.FIT_PARAMS} " \
-                         "FROM {self.model_info_table} WHERE {ModelSelectionSchema.MST_KEY} >= {lower_bound} " \
-                         "AND {ModelSelectionSchema.MST_KEY} <= {upper_bound} ORDER BY {AutoMLSchema.LOSS_METRIC} " \
-                         "LIMIT {configs_prune_lookup_val};".format(self=self, AutoMLSchema=AutoMLConstants,
-                                                                    ModelSelectionSchema=ModelSelectionSchema,
-                                                                    lower_bound=lower_bound, upper_bound=upper_bound,
-                                                                    configs_prune_lookup_val=configs_prune_lookup[s_val])
-        plpy.execute(query)
-
-    def update_model_output_table(self, model_training):
-        """
-        Propagates the results gathered from a hyperband diagonal run into the overall model output table.
-        :param model_training: Fit Multiple function call object.
-        """
-        # updates model weights for any previously trained configs
-        plpy.execute("UPDATE {self.model_output_table} a SET model_weights=" \
-                     "t.model_weights FROM {model_training.original_model_output_table} t " \
-                     "WHERE a.mst_key=t.mst_key".format(self=self, model_training=model_training))
-
-        # truncates and re-creates the table to avoid memory blow-ups
-        with SetGUC("dev_opt_unsafe_truncate_in_subtransaction", "on"):
-            temp_model_table = unique_string('updated_model')
-            plpy.execute("CREATE TABLE {temp_model_table} AS SELECT * FROM {self.model_output_table};" \
-                         "TRUNCATE {self.model_output_table}; " \
-                         "DROP TABLE {self.model_output_table};".format(temp_model_table=temp_model_table, self=self))
-            rename_table(self.schema_madlib, temp_model_table, self.model_output_table)
-
-        # inserts any newly trained configs
-        plpy.execute("INSERT INTO {self.model_output_table} SELECT * FROM {model_training.original_model_output_table} " \
-                     "WHERE {model_training.original_model_output_table}.mst_key NOT IN " \
-                     "(SELECT {ModelSelectionSchema.MST_KEY} FROM {self.model_output_table})".format(self=self,
-                                                                              model_training=model_training,
-                                                                              ModelSelectionSchema=ModelSelectionSchema))
-
-    def update_model_output_info_table(self, i, model_training, initial_vals):
-        """
-        Propagates the results gathered from a hyperband diagonal run into the overall model output info table.
-        :param i: outer diagonal loop iteration.
-        :param model_training: Fit Multiple function call object.
-        :param initial_vals: Dictionary of initial configurations and resources as part of the initial hyperband
-        schedule.
-        """
-        # normalizing factor for metrics_iters due to warm start
-        epochs_factor = sum([n[1] for n in initial_vals.values()][::-1][:i]) # i & initial_vals args needed
-        iters = plpy.execute("SELECT {AutoMLSchema.METRICS_ITERS} " \
-                             "FROM {model_training.model_summary_table}".format(AutoMLSchema=AutoMLConstants,
-                                                                                model_training=model_training))
-        metrics_iters_val = [epochs_factor+mi for mi in iters[0]['metrics_iters']] # global iteration counter
-
-        validation_update_q = "validation_metrics_final=t.validation_metrics_final, " \
-                                     "validation_loss_final=t.validation_loss_final, " \
-                                     "validation_metrics=a.validation_metrics || t.validation_metrics, " \
-                                     "validation_loss=a.validation_loss || t.validation_loss, " \
-            if self.validation_table else ""
-
-        # updates train/val info for any previously trained configs
-        plpy.execute("UPDATE {self.model_info_table} a SET " \
-                     "metrics_elapsed_time=a.metrics_elapsed_time || t.metrics_elapsed_time, " \
-                     "training_metrics_final=t.training_metrics_final, " \
-                     "training_loss_final=t.training_loss_final, " \
-                     "training_metrics=a.training_metrics || t.training_metrics, " \
-                     "training_loss=a.training_loss || t.training_loss, ".format(self=self) + validation_update_q +
-                     "{AutoMLSchema.METRICS_ITERS}=a.metrics_iters || ARRAY{metrics_iters_val}::INTEGER[] " \
-                     "FROM {model_training.model_info_table} t " \
-                     "WHERE a.mst_key=t.mst_key".format(model_training=model_training, AutoMLSchema=AutoMLConstants,
-                                                        metrics_iters_val=metrics_iters_val))
-
-        # inserts info about metrics and validation for newly trained model configs
-        plpy.execute("INSERT INTO {self.model_info_table} SELECT t.*, ARRAY{metrics_iters_val}::INTEGER[] AS metrics_iters " \
-                     "FROM {model_training.model_info_table} t WHERE t.mst_key NOT IN " \
-                     "(SELECT {ModelSelectionSchema.MST_KEY} FROM {self.model_info_table})".format(self=self,
-                                                                            model_training=model_training,
-                                                                            metrics_iters_val=metrics_iters_val,
-                                                                            ModelSelectionSchema=ModelSelectionSchema))
-
-    def add_additional_info_cols(self, s_dict, i_dict):
-        """Adds s and i columns to the info table"""
-
-        plpy.execute("ALTER TABLE {self.model_info_table} ADD COLUMN s int, ADD COLUMN i int;".format(self=self))
-
-        l = [(k, s_dict[k], i_dict[k]) for k in s_dict]
-        query = "UPDATE {self.model_info_table} t SET s=b.s_val, i=b.i_val FROM unnest(ARRAY{l}) " \
-                "b (key integer, s_val integer, i_val integer) WHERE t.mst_key=b.key".format(self=self, l=l)
-        plpy.execute(query)
-
-class AutoMLHyperopt(KerasAutoML):
-    """
-    This class implements Hyperopt, another automl method that explores awkward search spaces using
-    Random Search, Tree-structured Parzen Estimator (TPE), or Adaptive TPE.
-
-    This class executes hyperopt on top of our multiple model training infrastructure powered by
-    Model hOpper Parallelism (MOP), a hybrid of data and task parallelism.
-
-    This automl method inherits common behavior from the KerasAutoML base class.
-    """
-    def __init__(self, schema_madlib, source_table, model_output_table, model_arch_table, model_selection_table,
-                 model_id_list, compile_params_grid, fit_params_grid, automl_method,
-                 automl_params, random_state=None, object_table=None,
-                 use_gpus=False, validation_table=None, metrics_compute_frequency=None,
-                 name=None, description=None, **kwargs):
-        automl_method = automl_method if automl_method else AutoMLConstants.HYPEROPT
-        automl_params = automl_params if automl_params else 'num_configs=20, num_iterations=5, algorithm=tpe'
-        KerasAutoML.__init__(self, schema_madlib, source_table, model_output_table, model_arch_table,
-                             model_selection_table, model_id_list, compile_params_grid, fit_params_grid,
-                             automl_method, automl_params, random_state, object_table, use_gpus,
-                             validation_table, metrics_compute_frequency, name, description, **kwargs)
-        self.compile_params_grid = self.compile_params_grid.replace('\n', '').replace(' ', '')
-        self.fit_params_grid = self.fit_params_grid.replace('\n', '').replace(' ', '')
-        try:
-            self.compile_params_grid = literal_eval(self.compile_params_grid)
-        except:
-            plpy.error("Invalid syntax in 'compile_params_grid'")
-        try:
-            self.fit_params_grid = literal_eval(self.fit_params_grid)
-        except:
-            plpy.error("Invalid syntax in 'fit_params_grid'")
-        self.validate_and_define_inputs()
-        self.num_segments = self.get_num_segments()
-
-        self.create_model_output_table()
-        self.create_model_output_info_table()
-        self.find_hyperopt_config()
-
-    def get_num_segments(self):
-        """
-        Queries the distribution rules from the source table's summary table
-        to determine the total number of segments to train on.
-        :return: the number of segments
-        """
-        source_summary_table = add_postfix(self.source_table, '_summary')
-        dist_rules = plpy.execute("SELECT {0} from {1}".format(DISTRIBUTION_RULES_COLNAME, source_summary_table))[0][DISTRIBUTION_RULES_COLNAME]
-        #TODO create constant for all_segments
-        if dist_rules == "all_segments":
-            return get_seg_number()
-
-        return len(dist_rules)
-
-    def validate_and_define_inputs(self):
-        automl_params_dict = extract_keyvalue_params(self.automl_params,
-                                                     lower_case_names=True)
-        # casting relevant values to int
-        for i in automl_params_dict:
-            try:
-                automl_params_dict[i] = int(automl_params_dict[i])
-            except ValueError:
-                pass
-        _assert(len(automl_params_dict) >= 1 and len(automl_params_dict) <= 3,
-                "{0}: Only num_configs, num_iterations, and algorithm may be specified".format(self.module_name))
-        for i in automl_params_dict:
-            if i == AutoMLConstants.NUM_CONFIGS:
-                self.num_configs = automl_params_dict[AutoMLConstants.NUM_CONFIGS]
-            elif i == AutoMLConstants.NUM_ITERS:
-                self.num_iters = automl_params_dict[AutoMLConstants.NUM_ITERS]
-            elif i == AutoMLConstants.ALGORITHM:
-                if automl_params_dict[AutoMLConstants.ALGORITHM].lower() == 'rand':
-                    self.algorithm = rand
-                elif automl_params_dict[AutoMLConstants.ALGORITHM].lower() == 'tpe':
-                    self.algorithm = tpe
-                # TODO: Add support for atpe uncomment the below lines after atpe works
-                # elif automl_params_dict[AutoMLSchema.ALGORITHM].lower() == 'atpe':
-                #     self.algorithm = atpe
-                else:
-                    plpy.error("{0}: valid algorithm 'automl_params' for hyperopt: 'rand', 'tpe'".format(self.module_name)) # , or 'atpe'
-            else:
-                plpy.error("{0}: {1} is an invalid automl param".format(self.module_name, i))
-        _assert(self.num_configs > 0 and self.num_iters > 0, "{0}: num_configs and num_iterations in 'automl_params' "
-                                                            "must be > 0".format(self.module_name))
-        _assert(self._is_valid_metrics_compute_frequency(self.num_iters), "{0}: 'metrics_compute_frequency' "
-                                                                          "out of iteration range".format(self.module_name))
-
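For intuition, the default 'automl_params' string parses as follows (these are the shipped defaults, not a tuning recommendation):

    # automl_params = 'num_configs=20, num_iterations=5, algorithm=tpe'
    # -> self.num_configs = 20, self.num_iters = 5, self.algorithm = tpe
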
-    def find_hyperopt_config(self):
-        """
-        Executes hyperopt on top of MOP.
-        """
-        make_mst_summary = True
-        trials = Trials()
-        domain = Domain(None, self.get_search_space())
-        rand_state = np.random.RandomState(self.random_state)
-        configs_lst = self.get_configs_list(self.num_configs, self.num_segments)
-
-        self.start_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
-        fit_multiple_runtime = 0
-        for low, high in configs_lst:
-            i, n = low, high - low + 1
-
-            # Use the chosen Hyperopt algorithm (rand or TPE) to suggest new parameter configurations
-            hyperopt_params = []
-            sampled_params = []
-            for j in range(i, i + n):
-                new_param = self.algorithm.suggest([j], domain, trials, rand_state.randint(0, AutoMLConstants.INT_MAX))
-                new_param[0]['status'] = STATUS_RUNNING
-
-                trials.insert_trial_docs(new_param)
-                trials.refresh()
-                hyperopt_params.append(new_param[0])
-                sampled_params.append(new_param[0]['misc']['vals'])
-
-            model_id_list, compile_params, fit_params = self.extract_param_vals(sampled_params)
-            msts_list = self.generate_msts(model_id_list, compile_params, fit_params)
-            # cleanup_madlib_temp_tables(self.schema_madlib, AutoMLSchema.TARGET_SCHEMA)
-            try:
-                # model_training is undefined on the first loop iteration;
-                # the resulting NameError is deliberately swallowed
-                self.remove_temp_tables(model_training)
-            except:
-                pass
-            self.populate_temp_mst_tables(i, msts_list)
-
-            plpy.info("***Evaluating {n} newly suggested model configurations***".format(n=n))
-            fit_multiple_start_time = time.time()
-            model_training = FitMultipleModel(self.schema_madlib, self.source_table, AutoMLConstants.TEMP_OUTPUT_TABLE,
-                                              AutoMLConstants.TEMP_MST_TABLE, self.num_iters, self.use_gpus, self.validation_table,
-                                              self.metrics_compute_frequency, False, self.name, self.description, fit_multiple_runtime)
-            fit_multiple_runtime += time.time() - fit_multiple_start_time
-            if make_mst_summary:
-                self.generate_mst_summary_table(self.model_selection_summary_table)
-                make_mst_summary = False
-
-            # HyperOpt TPE update
-            for k, hyperopt_param in enumerate(hyperopt_params, i):
-                loss_val = plpy.execute("SELECT {AutoMLSchema.LOSS_METRIC} FROM {model_training.model_info_table} " \
-                             "WHERE {ModelSelectionSchema.MST_KEY}={k}".format(AutoMLSchema=AutoMLConstants,
-                                                                               ModelSelectionSchema=ModelSelectionSchema,
-                                                                               **locals()))[0][AutoMLConstants.LOSS_METRIC]
-
-                # avoid removing the two lines below (part of Hyperopt updates)
-                hyperopt_param['status'] = STATUS_OK
-                hyperopt_param['result'] = {'loss': loss_val, 'status': STATUS_OK}
-            trials.refresh()
-
-            # stacks info of all model configs together
-            self.update_model_output_and_info_tables(model_training)
-
-            self.print_best_mst_so_far()
-
-        self.end_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
-        self.update_model_selection_table()
-        self.generate_model_output_summary_table(model_training)
-        # cleanup_madlib_temp_tables(self.schema_madlib, AutoMLSchema.TARGET_SCHEMA)
-        self.remove_temp_tables(model_training)
-
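The loop above follows hyperopt's low-level suggest/observe protocol. A minimal standalone sketch of the same call sequence outside the database, where the search space and the loss computation are hypothetical placeholders for what MADlib reads back from the model info table:

    import numpy as np
    from hyperopt import hp, tpe, Trials, STATUS_OK, STATUS_RUNNING
    from hyperopt.base import Domain

    space = {'lr': hp.loguniform('lr', np.log(1e-4), np.log(1e-1))}
    domain = Domain(None, space)
    trials = Trials()
    rand_state = np.random.RandomState(42)

    for j in range(10):
        # ask the algorithm for one new trial and register it as running
        docs = tpe.suggest([j], domain, trials, rand_state.randint(0, 2**31 - 1))
        docs[0]['status'] = STATUS_RUNNING
        trials.insert_trial_docs(docs)
        trials.refresh()

        # placeholder loss; the AutoML class instead reads it from fit-multiple output
        lr = docs[0]['misc']['vals']['lr'][0]
        loss = (np.log10(lr) + 2.0) ** 2

        # report the observed loss back so the next suggestion can adapt
        docs[0]['status'] = STATUS_OK
        docs[0]['result'] = {'loss': loss, 'status': STATUS_OK}
        trials.refresh()
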
-    def get_configs_list(self, num_configs, num_segments):
-        """
-        Builds the wave-by-wave schedule for evaluating model configs.
-        :return: list of inclusive (start, end) mst_key ranges, one per wave
-        """
-        num_buckets = int(round(float(num_configs) / num_segments))
-        configs_list = []
-        start_idx = 1
-        models_populated = 0
-        for _ in range(num_buckets - 1):
-            end_idx = start_idx + num_segments
-            models_populated += num_segments
-            configs_list.append((start_idx, end_idx - 1))
-            start_idx = end_idx
-
-        remaining_models = num_configs - models_populated
-        configs_list.append((start_idx, start_idx + remaining_models-1))
-
-        return configs_list
-
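For example, num_configs=20 on a cluster with 8 segments (under Python 2 rounding, where round(2.5) rounds away from zero) yields:

    # num_buckets = int(round(20 / 8.0)) = 3
    # configs_list = [(1, 8), (9, 16), (17, 20)]
    # i.e. two full waves of 8 configs each, then a final wave of the remaining 4
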
-    def get_search_space(self):
-        """
-        Converts user inputs to hyperopt search space.
-        :return: Hyperopt search space
-        """
-
-        # initial params (outside 'optimizer_params_list')
-        hyperopt_search_dict = {}
-        hyperopt_search_dict['model_id'] = self.get_hyperopt_exps('model_id', self.model_id_list)
-
-        for j in self.fit_params_grid:
-            hyperopt_search_dict[j] = self.get_hyperopt_exps(j, self.fit_params_grid[j])
-
-        for i in self.compile_params_grid:
-            if i != ModelSelectionSchema.OPTIMIZER_PARAMS_LIST:
-                hyperopt_search_dict[i] = self.get_hyperopt_exps(i, self.compile_params_grid[i])
-
-        hyperopt_search_space_lst = []
-
-        counter = 1 # for unique names to allow multiple distribution options for optimizer params
-        for optimizer_dict in self.compile_params_grid[ModelSelectionSchema.OPTIMIZER_PARAMS_LIST]:
-            for o_param in optimizer_dict:
-                name = o_param + '_' + str(counter)
-                hyperopt_search_dict[name] = self.get_hyperopt_exps(name, optimizer_dict[o_param])
-            # appending deep copy
-            hyperopt_search_space_lst.append({k:v for k, v in hyperopt_search_dict.items()})
-            for o_param in optimizer_dict:
-                name = o_param + '_' + str(counter)
-                del hyperopt_search_dict[name]
-            counter += 1
-
-        return hp.choice('space', hyperopt_search_space_lst)
-
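To make the resulting structure concrete, a sketch for a hypothetical two-optimizer grid (names and values are illustrative only):

    # compile_params_grid = {
    #     'loss': ['categorical_crossentropy'],
    #     'optimizer_params_list': [
    #         {'optimizer': ['Adam'], 'lr': [0.001, 0.1, 'log']},
    #         {'optimizer': ['SGD'],  'lr': [0.01, 0.2, 'linear']}]}
    #
    # get_search_space() then returns, roughly:
    # hp.choice('space', [
    #     {'model_id': ..., 'loss': ..., 'optimizer_1': ..., 'lr_1': hp.loguniform('lr_1', ...)},
    #     {'model_id': ..., 'loss': ..., 'optimizer_2': ..., 'lr_2': hp.uniform('lr_2', ...)}])
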
-    def get_hyperopt_exps(self, cp, param_value_list):
-        """
-        Builds the hyperopt sampling expression for a param: either a random choice
-        over a list of discrete elements, or a draw from a specified distribution.
-        :param cp: compile param name
-        :param param_value_list: list of values (or a distribution spec) for the param
-        :return: hyperopt sampling expression
-        """
-        # check if need to sample from a distribution
-        if type(param_value_list[-1]) == str and all([type(i) != str and not callable(i) for i in param_value_list[:-1]]) \
-                and len(param_value_list) > 1:
-            _assert_equal(len(param_value_list), 3,
-                          "{0}: '{1}' should have exactly 3 elements if picking from a distribution".format(self.module_name, cp))
-            _assert(param_value_list[1] > param_value_list[0],
-                    "{0}: '{1}' should be of the format [lower_bound, upper_bound, distribution_type]".format(self.module_name, cp))
-            if param_value_list[-1] == 'linear':
-                return hp.uniform(cp, param_value_list[0], param_value_list[1])
-            elif param_value_list[-1] == 'log':
-                return hp.loguniform(cp, np.log(param_value_list[0]), np.log(param_value_list[1]))
-            else:
-                plpy.error("{0}: Please choose a valid distribution type for '{1}': {2}".format(
-                    self.module_name,
-                    self.original_param_details(cp)[0],
-                    ['linear', 'log']))
-        else:
-            # random sampling
-            return hp.choice(cp, param_value_list)
-
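In short, the accepted list conventions map to hyperopt expressions as follows (parameter names are illustrative):

    # [0.001, 0.1, 'log']  -> hp.loguniform('lr_1', log(0.001), log(0.1))
    # [0.5, 0.9, 'linear'] -> hp.uniform('rho_1', 0.5, 0.9)
    # ['adam', 'sgd']      -> hp.choice('optimizer_1', ['adam', 'sgd'])
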
-    def extract_param_vals(self, sampled_params):
-        """
-        Extract parameter values from hyperopt search space.
-        :param sampled_params: params suggested by hyperopt.
-        :return: lists of model ids, compile and fit params.
-        """
-        model_id_list, compile_params, fit_params = [], [], []
-        for params_dict in sampled_params:
-            compile_dict, fit_dict, optimizer_params_dict = {}, {}, {}
-            for p in params_dict:
-                if len(params_dict[p]) == 0 or p == 'space':
-                    continue
-                val = params_dict[p][0]
-                if p == 'model_id':
-                    model_id_list.append(self.model_id_list[val])
-                    continue
-                elif p in self.fit_params_grid:
-                    try:
-                        # check if params_dict[p] is an index
-                        fit_dict[p] = self.fit_params_grid[p][val]
-                    except TypeError:
-                        # params_dict[p] holds the sampled value itself, not an index
-                        fit_dict[p] = val
-                elif p in self.compile_params_grid:
-                    try:
-                        # check if params_dict[p] is an index
-                        compile_dict[p] = self.compile_params_grid[p][val]
-                    except TypeError:
-                        compile_dict[p] = val
-                else:
-                    o_param, idx = self.original_param_details(p) # extracting unique attribute
-                    try:
-                        # check if params_dict[p] is an index (i.e. optimizer, for example)
-                        optimizer_params_dict[o_param] = self.compile_params_grid[
-                            ModelSelectionSchema.OPTIMIZER_PARAMS_LIST][idx][o_param][val]
-                    except TypeError:
-                        optimizer_params_dict[o_param] = val
-            compile_dict[ModelSelectionSchema.OPTIMIZER_PARAMS_LIST] = optimizer_params_dict
-
-            compile_params.append(compile_dict)
-            fit_params.append(fit_dict)
-
-        return model_id_list, compile_params, fit_params
-
-    def original_param_details(self, name):
-        """
-        Returns the original param name and book-keeping detail.
-        :param name: name of the param (example - lr_1, epsilon_12)
-        :return: original param name and book-keeping position.
-        """
-        parts = name.split('_')
-        return '_'.join(parts[:-1]), int(parts[-1]) - 1
-
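For the examples named in the docstring:

    # original_param_details('lr_1')       -> ('lr', 0)
    # original_param_details('epsilon_12') -> ('epsilon', 11)
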
-
-    def generate_msts(self, model_id_list, compile_params, fit_params):
-        """
-        Generates msts to insert in the mst table.
-        :param model_id_list: list of model ids
-        :param compile_params: list of compile params
-        :param fit_params: list of fit params
-        :return: List of msts to insert in the mst table.
-        """
-        assert len(model_id_list) == len(compile_params) == len(fit_params)
-        msts = []
-
-        for i in range(len(compile_params)):
-            combination = {}
-            combination[ModelSelectionSchema.MODEL_ID] = model_id_list[i]
-            combination[ModelSelectionSchema.COMPILE_PARAMS] = generate_row_string(compile_params[i])
-            combination[ModelSelectionSchema.FIT_PARAMS] = generate_row_string(fit_params[i])
-            msts.append(combination)
-
-        return msts
-
-    def populate_temp_mst_tables(self, i, msts_list):
-        """
-        Creates and populates temp mst and summary tables with newly suggested model configs for evaluation.
-        :param i: starting mst_key value for the newly suggested configs
-        :param msts_list: list of generated msts.
-        """
-        # extra sanity check
-        if table_exists(AutoMLConstants.TEMP_MST_TABLE):
-            drop_tables([AutoMLConstants.TEMP_MST_TABLE])
-
-        create_query = """
-                        CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} (
-                            {mst_key} INTEGER,
-                            {model_id} INTEGER,
-                            {compile_params} VARCHAR,
-                            {fit_params} VARCHAR,
-                            unique ({model_id}, {compile_params}, {fit_params})
-                        );
-                       """.format(AutoMLSchema=AutoMLConstants,
-                                  mst_key=ModelSelectionSchema.MST_KEY,
-                                  model_id=ModelSelectionSchema.MODEL_ID,
-                                  compile_params=ModelSelectionSchema.COMPILE_PARAMS,
-                                  fit_params=ModelSelectionSchema.FIT_PARAMS)
-        plpy.execute(create_query)
-        mst_key_val = i
-        for mst in msts_list:
-            model_id = mst[ModelSelectionSchema.MODEL_ID]
-            compile_params = mst[ModelSelectionSchema.COMPILE_PARAMS]
-            fit_params = mst[ModelSelectionSchema.FIT_PARAMS]
-            insert_query = """
-                            INSERT INTO
-                                {AutoMLSchema.TEMP_MST_TABLE}(
-                                    {mst_key_col},
-                                    {model_id_col},
-                                    {compile_params_col},
-                                    {fit_params_col}
-                                )
-                            VALUES (
-                                {mst_key_val},
-                                {model_id},
-                                $${compile_params}$$,
-                                $${fit_params}$$
-                            )
-                           """.format(mst_key_col=ModelSelectionSchema.MST_KEY,
-                                      model_id_col=ModelSelectionSchema.MODEL_ID,
-                                      compile_params_col=ModelSelectionSchema.COMPILE_PARAMS,
-                                      fit_params_col=ModelSelectionSchema.FIT_PARAMS,
-                                      AutoMLSchema=AutoMLConstants,
-                                      **locals())
-            mst_key_val += 1
-            plpy.execute(insert_query)
-
-        self.generate_mst_summary_table(AutoMLConstants.TEMP_MST_SUMMARY_TABLE)
-
-    def generate_mst_summary_table(self, tbl_name):
-        """
-        Generates the mst summary table with the given name.
-        :param tbl_name: name of summary table
-        """
-        _assert(tbl_name.endswith('_summary'), 'invalid summary table name')
-
-        # extra sanity check
-        if table_exists(tbl_name):
-            drop_tables([tbl_name])
-
-        create_query = """
-                        CREATE TABLE {tbl_name} (
-                            {model_arch_table} VARCHAR,
-                            {object_table} VARCHAR
-                        );
-                       """.format(tbl_name=tbl_name,
-                                  model_arch_table=ModelSelectionSchema.MODEL_ARCH_TABLE,
-                                  object_table=ModelSelectionSchema.OBJECT_TABLE)
-        plpy.execute(create_query)
-
-        if self.object_table is None:
-            object_table = 'NULL::VARCHAR'
-        else:
-            object_table = '$${0}$$'.format(self.object_table)
-        insert_summary_query = """
-                        INSERT INTO
-                            {tbl_name}(
-                                {model_arch_table_name},
-                                {object_table_name}
-                        )
-                        VALUES (
-                            $${self.model_arch_table}$$,
-                            {object_table}
-                        )
-                       """.format(model_arch_table_name=ModelSelectionSchema.MODEL_ARCH_TABLE,
-                                  object_table_name=ModelSelectionSchema.OBJECT_TABLE,
-                                  **locals())
-        plpy.execute(insert_summary_query)
-
-    def update_model_output_and_info_tables(self, model_training):
-        """
-        Updates model output and info tables by stacking rows after each evaluation round.
-        :param model_training: Fit Multiple class object
-        """
-        metrics_iters = plpy.execute("SELECT {AutoMLSchema.METRICS_ITERS} " \
-                                     "FROM {model_training.original_model_output_table}_summary".format(self=self,
-                                                                                                        model_training=model_training,
-                                                                                                        AutoMLSchema=AutoMLConstants))[0][AutoMLConstants.METRICS_ITERS]
-        if metrics_iters:
-            metrics_iters = "ARRAY{0}".format(metrics_iters)
-        # stacking new rows from training
-        plpy.execute("INSERT INTO {self.model_output_table} SELECT * FROM " \
-                     "{model_training.original_model_output_table}".format(self=self, model_training=model_training))
-        plpy.execute("INSERT INTO {self.model_info_table} SELECT *, {metrics_iters} FROM " \
-                     "{model_training.model_info_table}".format(self=self,
-                                                                     model_training=model_training,
-                                                                     metrics_iters=metrics_iters))
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
index a5c7507..113ec16 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
@@ -625,9 +625,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.hyperband_schedule(
       eta                   INTEGER DEFAULT 3,
       skip_last             INTEGER DEFAULT 0
 ) RETURNS VOID AS $$
-    PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl')
+    PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl_hyperband')
     with AOControl(False) and MinWarning('warning'):
-        schedule_loader = madlib_keras_automl.HyperbandSchedule(schedule_table, r, eta, skip_last)
+        schedule_loader = madlib_keras_automl_hyperband.HyperbandSchedule(schedule_table, r, eta, skip_last)
         schedule_loader.load()
 $$ LANGUAGE plpythonu VOLATILE
               m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
@@ -650,13 +650,15 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.madlib_keras_automl(
     name                           VARCHAR DEFAULT NULL,
     description                    VARCHAR DEFAULT NULL
 ) RETURNS VOID AS $$
-    PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl')
+if automl_method is None or automl_method.lower() == 'hyperband':
+    PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl_hyperband')
     with AOControl(False) and MinWarning('warning'):
-        if automl_method is None or automl_method.lower() == 'hyperband':
-            schedule_loader = madlib_keras_automl.AutoMLHyperband(**globals())
-        elif automl_method.lower() == 'hyperopt':
-            schedule_loader = madlib_keras_automl.AutoMLHyperopt(**globals())
-        else:
-            plpy.error("madlib_keras_automl: The chosen automl method must be 'hyperband' or 'hyperopt'")
+        schedule_loader = madlib_keras_automl_hyperband.AutoMLHyperband(**globals())
+elif automl_method.lower() == 'hyperopt':
+    PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl_hyperopt')
+    with AOControl(False) and MinWarning('warning'):
+        schedule_loader = madlib_keras_automl_hyperopt.AutoMLHyperopt(**globals())
+else:
+    plpy.error("madlib_keras_automl: The chosen automl method must be 'hyperband' or 'hyperopt'")
 $$ LANGUAGE plpythonu VOLATILE
     m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl_hyperband.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl_hyperband.py_in
new file mode 100644
index 0000000..2d10f8c
--- /dev/null
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl_hyperband.py_in
@@ -0,0 +1,419 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import math
+import plpy
+
+from madlib_keras_automl import KerasAutoML, AutoMLConstants
+from utilities.utilities import get_current_timestamp, get_seg_number, get_segments_per_host, \
+    unique_string, add_postfix, extract_keyvalue_params, _assert, _assert_equal, rename_table, \
+    is_platform_pg
+from utilities.control import SetGUC
+from madlib_keras_fit_multiple_model import FitMultipleModel
+from madlib_keras_model_selection import MstSearch, ModelSelectionSchema
+from utilities.validate_args import table_exists, drop_tables, input_tbl_valid
+
+class HyperbandSchedule():
+    """The utility class for loading a hyperband schedule table with algorithm inputs.
+
+    Attributes:
+        schedule_table (string): Name of output table containing hyperband schedule.
+        R (int): Maximum number of resources (iterations) that can be allocated
+  to a single configuration.
+        eta (int): Controls the proportion of configurations discarded in
+  each round of successive halving.
+        skip_last (int): The number of last rounds to skip.
+    """
+    def __init__(self, schedule_table, R, eta=3, skip_last=0):
+        if is_platform_pg():
+            plpy.error(
+                "DL: Hyperband schedule is not supported on PostgreSQL.")
+        self.schedule_table = schedule_table # table name to store hyperband schedule
+        self.R = R # maximum iterations/epochs allocated to a configuration
+        self.eta = eta # defines downsampling rate
+        self.skip_last = skip_last
+        self.module_name = 'hyperband_schedule'
+        self.validate_inputs()
+
+        # number of unique executions of Successive Halving (minus one)
+        self.s_max = int(math.floor(math.log(self.R, self.eta)))
+        self.validate_s_max()
+
+        self.schedule_vals = []
+
+        self.calculate_schedule()
+
+    def load(self):
+        """
+        The entry point for loading the hyperband schedule table.
+        """
+        self.create_schedule_table()
+        self.insert_into_schedule_table()
+
+    def validate_inputs(self):
+        """
+        Validates user input values
+        """
+        _assert(self.eta > 1, "{0}: eta must be greater than 1".format(self.module_name))
+        _assert(self.R >= self.eta, "{0}: R should not be less than eta".format(self.module_name))
+
+    def validate_s_max(self):
+        _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "{0}: skip_last must be " \
+                "non-negative and less than {1}".format(self.module_name, self.s_max))
+
+    def calculate_schedule(self):
+        """
+        Calculates the hyperband schedule (number of configs and allocated resources)
+        in each round of each bracket and skips the number of last rounds specified in 'skip_last'
+        """
+        for s in reversed(range(self.s_max+1)):
+            n = int(math.ceil(int((self.s_max+1)/(s+1))*math.pow(self.eta, s))) # initial number of configurations
+            r = self.R * math.pow(self.eta, -s)
+
+            for i in range((s+1) - int(self.skip_last)):
+                # compute the number of configs (n_i) and iterations per config (r_i) for round i of bracket s
+                n_i = n*math.pow(self.eta, -i)
+                r_i = r*math.pow(self.eta, i)
+
+                self.schedule_vals.append({AutoMLConstants.BRACKET: s,
+                                           AutoMLConstants.ROUND: i,
+                                           AutoMLConstants.CONFIGURATIONS: int(n_i),
+                                           AutoMLConstants.RESOURCES: int(round(r_i))})
+
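For intuition, the schedule computed for R=9, eta=3, skip_last=0 (so s_max = 2), shown as (bracket s, round i, n_i configs, r_i iterations):

    # s=2: (2, 0, 9, 1), (2, 1, 3, 3), (2, 2, 1, 9)
    # s=1: (1, 0, 3, 3), (1, 1, 1, 9)
    # s=0: (0, 0, 3, 9)
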
+    def create_schedule_table(self):
+        """Initializes the output schedule table"""
+        create_query = """
+                        CREATE TABLE {self.schedule_table} (
+                            {s} INTEGER,
+                            {i} INTEGER,
+                            {n_i} INTEGER,
+                            {r_i} INTEGER,
+                            unique ({s}, {i})
+                        );
+                       """.format(self=self,
+                                  s=AutoMLConstants.BRACKET,
+                                  i=AutoMLConstants.ROUND,
+                                  n_i=AutoMLConstants.CONFIGURATIONS,
+                                  r_i=AutoMLConstants.RESOURCES)
+        plpy.execute(create_query)
+
+    def insert_into_schedule_table(self):
+        """Insert everything in self.schedule_vals into the output schedule table."""
+        for sd in self.schedule_vals:
+            sd_s = sd[AutoMLConstants.BRACKET]
+            sd_i = sd[AutoMLConstants.ROUND]
+            sd_n_i = sd[AutoMLConstants.CONFIGURATIONS]
+            sd_r_i = sd[AutoMLConstants.RESOURCES]
+            insert_query = """
+                            INSERT INTO
+                                {self.schedule_table}(
+                                    {s_col},
+                                    {i_col},
+                                    {n_i_col},
+                                    {r_i_col}
+                                )
+                            VALUES (
+                                {sd_s},
+                                {sd_i},
+                                {sd_n_i},
+                                {sd_r_i}
+                            )
+                           """.format(s_col=AutoMLConstants.BRACKET,
+                                      i_col=AutoMLConstants.ROUND,
+                                      n_i_col=AutoMLConstants.CONFIGURATIONS,
+                                      r_i_col=AutoMLConstants.RESOURCES,
+                                      **locals())
+            plpy.execute(insert_query)
+
+class AutoMLHyperband(KerasAutoML):
+    """
+    This class implements Hyperband, an infinite-arm bandit based algorithm that speeds up random search
+    through adaptive resource allocation, successive halving (SHA), and early stopping.
+
+    This class showcases a novel hyperband implementation by executing the hyperband rounds 'diagonally'
+    to evaluate multiple configurations together and leverage the compute power of MPP databases such as Greenplum.
+
+    This automl method inherits common behavior from the KerasAutoML base class.
+    """
+    def __init__(self, schema_madlib, source_table, model_output_table, model_arch_table, model_selection_table,
+                 model_id_list, compile_params_grid, fit_params_grid, automl_method,
+                 automl_params, random_state=None, object_table=None,
+                 use_gpus=False, validation_table=None, metrics_compute_frequency=None,
+                 name=None, description=None, **kwargs):
+        automl_method = automl_method if automl_method else AutoMLConstants.HYPERBAND
+        automl_params = automl_params if automl_params else 'R=6, eta=3, skip_last=0'
+        KerasAutoML.__init__(self, schema_madlib, source_table, model_output_table, model_arch_table,
+                             model_selection_table, model_id_list, compile_params_grid, fit_params_grid,
+                             automl_method, automl_params, random_state, object_table, use_gpus,
+                             validation_table, metrics_compute_frequency, name, description, **kwargs)
+        self.validate_and_define_inputs()
+        self.create_model_output_table()
+        self.create_model_output_info_table()
+        self.find_hyperband_config()
+
+    def validate_and_define_inputs(self):
+        automl_params_dict = extract_keyvalue_params(self.automl_params,
+                                                     lower_case_names=False)
+        # casting dict values to int
+        for i in automl_params_dict:
+            _assert(i in AutoMLConstants.HYPERBAND_PARAMS,
+                    "{0}: Invalid param(s) passed in for hyperband. "\
+                    "Only R, eta, and skip_last may be specified".format(self.module_name))
+            automl_params_dict[i] = int(automl_params_dict[i])
+        _assert(len(automl_params_dict) >= 1 and len(automl_params_dict) <= 3,
+                "{0}: Only R, eta, and skip_last may be specified".format(self.module_name))
+        for i in automl_params_dict:
+            if i == AutoMLConstants.R:
+                self.R = automl_params_dict[AutoMLConstants.R]
+            elif i == AutoMLConstants.ETA:
+                self.eta = automl_params_dict[AutoMLConstants.ETA]
+            elif i == AutoMLConstants.SKIP_LAST:
+                self.skip_last = automl_params_dict[AutoMLConstants.SKIP_LAST]
+            else:
+                plpy.error("{0}: {1} is an invalid automl param".format(self.module_name, i))
+        _assert(self.eta > 1, "{0}: eta must be greater than 1".format(self.module_name))
+        _assert(self.R >= self.eta, "{0}: R should not be less than eta".format(self.module_name))
+        self.s_max = int(math.floor(math.log(self.R, self.eta)))
+        _assert(self.skip_last >= 0 and self.skip_last < self.s_max+1, "{0}: skip_last must be " \
+                "non-negative and less than {1}".format(self.module_name, self.s_max))
+
+    def find_hyperband_config(self):
+        """
+        Executes the diagonal hyperband algorithm.
+        """
+        initial_vals = {}
+
+        # get hyper parameter configs for each s
+        for s in reversed(range(self.s_max+1)):
+            n = int(math.ceil(int((self.s_max+1)/(s+1))*math.pow(self.eta, s))) # initial number of configurations
+            r = self.R * math.pow(self.eta, -s) # initial number of iterations to run configurations for
+            initial_vals[s] = (n, int(round(r)))
+        self.start_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
+        random_search = MstSearch(self.schema_madlib,
+                                  self.model_arch_table,
+                                  self.model_selection_table,
+                                  self.model_id_list,
+                                  self.compile_params_grid,
+                                  self.fit_params_grid,
+                                  'random',
+                                  sum([initial_vals[k][0] for k in initial_vals][self.skip_last:]),
+                                  self.random_state,
+                                  self.object_table)
+        random_search.load() # for populating mst tables
+
+        # for creating the summary table for usage in fit multiple
+        plpy.execute("CREATE TABLE {AutoMLSchema.TEMP_MST_SUMMARY_TABLE} AS " \
+                     "SELECT * FROM {random_search.model_selection_summary_table}".format(AutoMLSchema=AutoMLConstants,
+                                                                                          random_search=random_search))
+        ranges_dict = self.mst_key_ranges_dict(initial_vals)
+        # to store the bracket and round numbers
+        s_dict, i_dict = {}, {}
+        for key, val in ranges_dict.items():
+            for mst_key in range(val[0], val[1]+1):
+                s_dict[mst_key] = key
+                i_dict[mst_key] = -1
+
+        # outer loop on diagonal
+        for i in range((self.s_max+1) - int(self.skip_last)):
+            # inner loop on s desc
+            temp_lst = []
+            configs_prune_lookup = {}
+            for s in range(self.s_max, self.s_max-i-1, -1):
+                n = initial_vals[s][0]
+                n_i = n * math.pow(self.eta, -i+self.s_max-s)
+                configs_prune_lookup[s] = int(round(n_i))
+                temp_lst.append("{0} configs under bracket={1} & round={2}".format(int(n_i), s, s-self.s_max+i))
+            num_iterations = int(initial_vals[self.s_max-i][1])
+            plpy.info('*** Diagonally evaluating ' + ', '.join(temp_lst) + ' with {0} iterations ***'.format(
+                num_iterations))
+
+            self.reconstruct_temp_mst_table(i, ranges_dict, configs_prune_lookup) # has keys to evaluate
+            active_keys = plpy.execute("SELECT {ModelSelectionSchema.MST_KEY} " \
+                                       "FROM {AutoMLSchema.TEMP_MST_TABLE}".format(AutoMLSchema=AutoMLConstants,
+                                                                                   ModelSelectionSchema=ModelSelectionSchema))
+            for k in active_keys:
+                i_dict[k[ModelSelectionSchema.MST_KEY]] += 1
+            self.warm_start = int(i != 0)
+            mcf = self.metrics_compute_frequency if self._is_valid_metrics_compute_frequency(num_iterations) else None
+            with SetGUC("plan_cache_mode", "force_generic_plan"):
+                model_training = FitMultipleModel(self.schema_madlib, self.source_table, AutoMLConstants.TEMP_OUTPUT_TABLE,
+                                                AutoMLConstants.TEMP_MST_TABLE, num_iterations, self.use_gpus,
+                                                self.validation_table, mcf, self.warm_start, self.name, self.description)
+            self.update_model_output_table(model_training)
+            self.update_model_output_info_table(i, model_training, initial_vals)
+
+            self.print_best_mst_so_far()
+
+        self.end_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
+        self.add_additional_info_cols(s_dict, i_dict)
+        self.update_model_selection_table()
+        self.generate_model_output_summary_table(model_training)
+        self.remove_temp_tables(model_training)
+
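Continuing the R=9, eta=3 example, the outer loop above evaluates one diagonal per iteration, warm-starting the surviving configs from the previous diagonal:

    # i=0: bracket 2 round 0 (9 configs)                                          -> 1 iteration
    # i=1: bracket 2 round 1 (3 survivors) + bracket 1 round 0 (3 new)            -> 3 iterations
    # i=2: bracket 2 round 2 (1) + bracket 1 round 1 (1) + bracket 0 round 0 (3)  -> 9 iterations
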
+    def mst_key_ranges_dict(self, initial_vals):
+        """
+        Extracts the ranges of model configs (using mst_keys) that were sampled
+        for a particular SHA bracket.
+        """
+        d = {}
+        for s_val in sorted(initial_vals.keys(), reverse=True): # going from s_max to 0
+            if s_val == self.s_max:
+                d[s_val] = (1, initial_vals[s_val][0])
+            else:
+                d[s_val] = (d[s_val+1][1]+1, d[s_val+1][1]+initial_vals[s_val][0])
+        return d
+
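For the same example, with initial_vals = {2: (9, 1), 1: (3, 3), 0: (3, 9)}:

    # mst_key_ranges_dict(initial_vals) -> {2: (1, 9), 1: (10, 12), 0: (13, 15)}
    # i.e. bracket 2 owns mst_keys 1-9, bracket 1 owns 10-12, bracket 0 owns 13-15
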
+    def reconstruct_temp_mst_table(self, i, ranges_dict, configs_prune_lookup):
+        """
+        Drops and reconstructs a temp mst table for evaluation along particular diagonals of hyperband.
+        :param i: outer diagonal loop iteration.
+        :param ranges_dict: model config ranges to group by bracket number.
+        :param configs_prune_lookup: Lookup dictionary for configs to evaluate for a diagonal.
+        :return:
+        """
+        if i == 0:
+            _assert_equal(len(configs_prune_lookup), 1, "invalid args")
+            lower_bound, upper_bound = ranges_dict[self.s_max]
+            plpy.execute("CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} AS SELECT * FROM {self.model_selection_table} "
+                         "WHERE {ModelSelectionSchema.MST_KEY} >= {lower_bound} " \
+                         "AND {ModelSelectionSchema.MST_KEY} <= {upper_bound}".format(self=self,
+                                                                                      AutoMLSchema=AutoMLConstants,
+                                                                                      lower_bound=lower_bound,
+                                                                                      upper_bound=upper_bound,
+                                                                                      ModelSelectionSchema=ModelSelectionSchema))
+            return
+        # dropping and repopulating temp_mst_table
+        drop_tables([AutoMLConstants.TEMP_MST_TABLE])
+
+        # {mst_key} changed from SERIAL to INTEGER for safe insertions and preservation of mst_key values
+        create_query = """
+                        CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} (
+                            {mst_key} INTEGER,
+                            {model_id} INTEGER,
+                            {compile_params} VARCHAR,
+                            {fit_params} VARCHAR,
+                            unique ({model_id}, {compile_params}, {fit_params})
+                        );
+                       """.format(AutoMLSchema=AutoMLConstants,
+                                  mst_key=ModelSelectionSchema.MST_KEY,
+                                  model_id=ModelSelectionSchema.MODEL_ID,
+                                  compile_params=ModelSelectionSchema.COMPILE_PARAMS,
+                                  fit_params=ModelSelectionSchema.FIT_PARAMS)
+        plpy.execute(create_query)
+
+        query = ""
+        new_configs = True
+        for s_val in configs_prune_lookup:
+            lower_bound, upper_bound = ranges_dict[s_val]
+            if new_configs:
+                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT {ModelSelectionSchema.MST_KEY}, " \
+                         "{ModelSelectionSchema.MODEL_ID}, {ModelSelectionSchema.COMPILE_PARAMS}, " \
+                         "{ModelSelectionSchema.FIT_PARAMS} FROM {self.model_selection_table} WHERE " \
+                         "{ModelSelectionSchema.MST_KEY} >= {lower_bound} AND {ModelSelectionSchema.MST_KEY} <= " \
+                         "{upper_bound};".format(self=self, AutoMLSchema=AutoMLConstants,
+                                                 ModelSelectionSchema=ModelSelectionSchema,
+                                                 lower_bound=lower_bound, upper_bound=upper_bound)
+                new_configs = False
+            else:
+                query += "INSERT INTO {AutoMLSchema.TEMP_MST_TABLE} SELECT {ModelSelectionSchema.MST_KEY}, " \
+                         "{ModelSelectionSchema.MODEL_ID}, {ModelSelectionSchema.COMPILE_PARAMS}, " \
+                         "{ModelSelectionSchema.FIT_PARAMS} " \
+                         "FROM {self.model_info_table} WHERE {ModelSelectionSchema.MST_KEY} >= {lower_bound} " \
+                         "AND {ModelSelectionSchema.MST_KEY} <= {upper_bound} ORDER BY {AutoMLSchema.LOSS_METRIC} " \
+                         "LIMIT {configs_prune_lookup_val};".format(self=self, AutoMLSchema=AutoMLConstants,
+                                                                    ModelSelectionSchema=ModelSelectionSchema,
+                                                                    lower_bound=lower_bound, upper_bound=upper_bound,
+                                                                    configs_prune_lookup_val=configs_prune_lookup[s_val])
+        plpy.execute(query)
+
+    def update_model_output_table(self, model_training):
+        """
+        Propagates the results gathered from a hyperband diagonal run into the overall model output table.
+        :param model_training: Fit Multiple function call object.
+        """
+        # updates model weights for any previously trained configs
+        plpy.execute("UPDATE {self.model_output_table} a SET model_weights=" \
+                     "t.model_weights FROM {model_training.original_model_output_table} t " \
+                     "WHERE a.mst_key=t.mst_key".format(self=self, model_training=model_training))
+
+        # truncates and re-creates the table to avoid memory blow-ups
+        with SetGUC("dev_opt_unsafe_truncate_in_subtransaction", "on"):
+            temp_model_table = unique_string('updated_model')
+            plpy.execute("CREATE TABLE {temp_model_table} AS SELECT * FROM {self.model_output_table};" \
+                         "TRUNCATE {self.model_output_table}; " \
+                         "DROP TABLE {self.model_output_table};".format(temp_model_table=temp_model_table, self=self))
+            rename_table(self.schema_madlib, temp_model_table, self.model_output_table)
+
+        # inserts any newly trained configs
+        plpy.execute("INSERT INTO {self.model_output_table} SELECT * FROM {model_training.original_model_output_table} " \
+                     "WHERE {model_training.original_model_output_table}.mst_key NOT IN " \
+                     "(SELECT {ModelSelectionSchema.MST_KEY} FROM {self.model_output_table})".format(self=self,
+                                                                              model_training=model_training,
+                                                                              ModelSelectionSchema=ModelSelectionSchema))
+
+    def update_model_output_info_table(self, i, model_training, initial_vals):
+        """
+        Propagates the results gathered from a hyperband diagonal run into the overall model output info table.
+        :param i: outer diagonal loop iteration.
+        :param model_training: Fit Multiple function call object.
+        :param initial_vals: Dictionary of initial configurations and resources as part of the initial hyperband
+        schedule.
+        """
+        # normalizing factor for metrics_iters due to warm start
+        epochs_factor = sum([n[1] for n in initial_vals.values()][::-1][:i]) # i & initial_vals args needed
+        iters = plpy.execute("SELECT {AutoMLSchema.METRICS_ITERS} " \
+                             "FROM {model_training.model_summary_table}".format(AutoMLSchema=AutoMLConstants,
+                                                                                model_training=model_training))
+        metrics_iters_val = [epochs_factor+mi for mi in iters[0]['metrics_iters']] # global iteration counter
+
+        validation_update_q = "validation_metrics_final=t.validation_metrics_final, " \
+                                     "validation_loss_final=t.validation_loss_final, " \
+                                     "validation_metrics=a.validation_metrics || t.validation_metrics, " \
+                                     "validation_loss=a.validation_loss || t.validation_loss, " \
+            if self.validation_table else ""
+
+        # updates train/val info for any previously trained configs
+        plpy.execute("UPDATE {self.model_info_table} a SET " \
+                     "metrics_elapsed_time=a.metrics_elapsed_time || t.metrics_elapsed_time, " \
+                     "training_metrics_final=t.training_metrics_final, " \
+                     "training_loss_final=t.training_loss_final, " \
+                     "training_metrics=a.training_metrics || t.training_metrics, " \
+                     "training_loss=a.training_loss || t.training_loss, ".format(self=self) + validation_update_q +
+                     "{AutoMLSchema.METRICS_ITERS}=a.metrics_iters || ARRAY{metrics_iters_val}::INTEGER[] " \
+                     "FROM {model_training.model_info_table} t " \
+                     "WHERE a.mst_key=t.mst_key".format(model_training=model_training, AutoMLSchema=AutoMLConstants,
+                                                        metrics_iters_val=metrics_iters_val))
+
+        # inserts info about metrics and validation for newly trained model configs
+        plpy.execute("INSERT INTO {self.model_info_table} SELECT t.*, ARRAY{metrics_iters_val}::INTEGER[] AS metrics_iters " \
+                     "FROM {model_training.model_info_table} t WHERE t.mst_key NOT IN " \
+                     "(SELECT {ModelSelectionSchema.MST_KEY} FROM {self.model_info_table})".format(self=self,
+                                                                            model_training=model_training,
+                                                                            metrics_iters_val=metrics_iters_val,
+                                                                            ModelSelectionSchema=ModelSelectionSchema))
+
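A sketch of this normalization, again for the R=9, eta=3 schedule and assuming metrics_compute_frequency=1:

    # i=0: epochs_factor = 0, local metrics_iters [1]         -> stored as [1]
    # i=1: epochs_factor = 1, local metrics_iters [1, 2, 3]   -> stored as [2, 3, 4]
    # i=2: epochs_factor = 4, local metrics_iters [1, ..., 9] -> stored as [5, ..., 13]
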
+    def add_additional_info_cols(self, s_dict, i_dict):
+        """Adds s and i columns to the info table"""
+
+        plpy.execute("ALTER TABLE {self.model_info_table} ADD COLUMN s int, ADD COLUMN i int;".format(self=self))
+
+        l = [(k, s_dict[k], i_dict[k]) for k in s_dict]
+        query = "UPDATE {self.model_info_table} t SET s=b.s_val, i=b.i_val FROM unnest(ARRAY{l}) " \
+                "b (key integer, s_val integer, i_val integer) WHERE t.mst_key=b.key".format(self=self, l=l)
+        plpy.execute(query)
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl_hyperopt.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl_hyperopt.py_in
new file mode 100644
index 0000000..34d2e97
--- /dev/null
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl_hyperopt.py_in
@@ -0,0 +1,458 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from ast import literal_eval
+from hyperopt import hp, rand, tpe, atpe, Trials, STATUS_OK, STATUS_RUNNING
+from hyperopt.base import Domain
+import numpy as np
+import plpy
+import time
+
+from madlib_keras_automl import KerasAutoML, AutoMLConstants
+from input_data_preprocessor import DistributionRulesOptions
+from madlib_keras_fit_multiple_model import FitMultipleModel
+from madlib_keras_helper import generate_row_string
+from madlib_keras_helper import DISTRIBUTION_RULES_COLNAME
+from madlib_keras_model_selection import ModelSelectionSchema
+from utilities.control import SetGUC
+from utilities.utilities import get_current_timestamp, get_seg_number, get_segments_per_host, \
+    unique_string, add_postfix, extract_keyvalue_params, _assert, _assert_equal, rename_table
+from utilities.validate_args import table_exists, drop_tables, input_tbl_valid
+
+class AutoMLHyperopt(KerasAutoML):
+    """
+    This class implements Hyperopt, another automl method that explores awkward search spaces using
+    Random Search, Tree-structured Parzen Estimator (TPE), or Adaptive TPE.
+
+    This function executes hyperopt on top of our multiple model training infrastructure powered with
+    Model hOpper Parallelism (MOP), a hybrid of data and task parallelism.
+
+    This automl method inherits qualities from the automl class.
+    """
+    def __init__(self, schema_madlib, source_table, model_output_table, model_arch_table, model_selection_table,
+                 model_id_list, compile_params_grid, fit_params_grid, automl_method,
+                 automl_params, random_state=None, object_table=None,
+                 use_gpus=False, validation_table=None, metrics_compute_frequency=None,
+                 name=None, description=None, **kwargs):
+        automl_method = automl_method if automl_method else AutoMLConstants.HYPEROPT
+        automl_params = automl_params if automl_params else 'num_configs=20, num_iterations=5, algorithm=tpe'
+        KerasAutoML.__init__(self, schema_madlib, source_table, model_output_table, model_arch_table,
+                             model_selection_table, model_id_list, compile_params_grid, fit_params_grid,
+                             automl_method, automl_params, random_state, object_table, use_gpus,
+                             validation_table, metrics_compute_frequency, name, description, **kwargs)
+        self.compile_params_grid = self.compile_params_grid.replace('\n', '').replace(' ', '')
+        self.fit_params_grid = self.fit_params_grid.replace('\n', '').replace(' ', '')
+        try:
+            self.compile_params_grid = literal_eval(self.compile_params_grid)
+
+        except:
+            plpy.error("Invalid syntax in 'compile_params_dict'")
+        try:
+            self.fit_params_grid = literal_eval(self.fit_params_grid)
+        except:
+            plpy.error("Invalid syntax in 'fit_params_dict'")
+        self.validate_and_define_inputs()
+        self.num_segments = self.get_num_segments()
+
+        self.create_model_output_table()
+        self.create_model_output_info_table()
+        self.find_hyperopt_config()
+
+    def get_num_segments(self):
+        """
+        Queries the distribution rules from the source summary table.
+        :return: total number of segments to train on
+        """
+        source_summary_table = add_postfix(self.source_table, '_summary')
+        dist_rules = plpy.execute("SELECT {0} from {1}".format(DISTRIBUTION_RULES_COLNAME, source_summary_table))[0][DISTRIBUTION_RULES_COLNAME]
+        if dist_rules == DistributionRulesOptions.ALL_SEGMENTS:
+            return get_seg_number()
+
+        return len(dist_rules)
+
+    def validate_and_define_inputs(self):
+        automl_params_dict = extract_keyvalue_params(self.automl_params,
+                                                     lower_case_names=True)
+        # casting relevant values to int
+        for i in automl_params_dict:
+            _assert(i in AutoMLConstants.HYPEROPT_PARAMS,
+                    "{0}: Invalid param(s) passed in for hyperopt. "\
+                    "Only num_configs, num_iterations, and algorithm may be specified".format(self.module_name))
+            try:
+                automl_params_dict[i] = int(automl_params_dict[i])
+            except ValueError:
+                pass
+        _assert(len(automl_params_dict) >= 1 and len(automl_params_dict) <= 3,
+                "{0}: Only num_configs, num_iterations, and algorithm may be specified".format(self.module_name))
+        for i in automl_params_dict:
+            if i == AutoMLConstants.NUM_CONFIGS:
+                self.num_configs = automl_params_dict[AutoMLConstants.NUM_CONFIGS]
+            elif i == AutoMLConstants.NUM_ITERS:
+                self.num_iters = automl_params_dict[AutoMLConstants.NUM_ITERS]
+            elif i == AutoMLConstants.ALGORITHM:
+                if automl_params_dict[AutoMLConstants.ALGORITHM].lower() == 'rand':
+                    self.algorithm = rand
+                elif automl_params_dict[AutoMLConstants.ALGORITHM].lower() == 'tpe':
+                    self.algorithm = tpe
+                # TODO: Add support for atpe; uncomment the lines below once atpe works
+                # elif automl_params_dict[AutoMLSchema.ALGORITHM].lower() == 'atpe':
+                #     self.algorithm = atpe
+                else:
+                    plpy.error("{0}: valid 'algorithm' values in 'automl_params' for hyperopt: 'rand', 'tpe'".format(self.module_name)) # , or 'atpe'
+            else:
+                plpy.error("{0}: {1} is an invalid automl param".format(self.module_name, i))
+        _assert(self.num_configs > 0 and self.num_iters > 0, "{0}: num_configs and num_iterations in 'automl_params' "
+                                                            "must be > 0".format(self.module_name))
+        _assert(self._is_valid_metrics_compute_frequency(self.num_iters), "{0}: 'metrics_compute_frequency' "
+                                                                          "out of iteration range".format(self.module_name))
+
+    def find_hyperopt_config(self):
+        """
+        Executes hyperopt on top of MOP.
+        """
+        make_mst_summary = True
+        trials = Trials()
+        domain = Domain(None, self.get_search_space())
+        rand_state = np.random.RandomState(self.random_state)
+        configs_lst = self.get_configs_list(self.num_configs, self.num_segments)
+
+        self.start_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
+        metrics_elapsed_time_offset = 0
+        model_training = None
+        for low, high in configs_lst:
+            i, n = low, high - low + 1
+
+            # Using HyperOpt TPE/ATPE to generate parameters
+            hyperopt_params = []
+            sampled_params = []
+            for j in range(i, i + n):
+                new_param = self.algorithm.suggest([j], domain, trials, rand_state.randint(0, AutoMLConstants.INT_MAX))
+                new_param[0]['status'] = STATUS_RUNNING
+
+                trials.insert_trial_docs(new_param)
+                trials.refresh()
+                hyperopt_params.append(new_param[0])
+                sampled_params.append(new_param[0]['misc']['vals'])
+
+            model_id_list, compile_params, fit_params = self.extract_param_vals(sampled_params)
+            msts_list = self.generate_msts(model_id_list, compile_params, fit_params)
+            self.remove_temp_tables(model_training)
+            self.populate_temp_mst_tables(i, msts_list)
+
+            plpy.info("***Evaluating {n} newly suggested model configurations***".format(n=n))
+            start_time = time.time()
+            with SetGUC("plan_cache_mode", "force_generic_plan"):
+                model_training = FitMultipleModel(self.schema_madlib, self.source_table, AutoMLConstants.TEMP_OUTPUT_TABLE,
+                                                  AutoMLConstants.TEMP_MST_TABLE, self.num_iters, self.use_gpus, self.validation_table,
+                                                  self.metrics_compute_frequency, False, self.name, self.description,
+                                                  metrics_elapsed_time_offset=metrics_elapsed_time_offset)
+            metrics_elapsed_time_offset += time.time() - start_time
+            if make_mst_summary:
+                self.generate_mst_summary_table(self.model_selection_summary_table)
+                make_mst_summary = False
+
+            # HyperOpt TPE update
+            for k, hyperopt_param in enumerate(hyperopt_params, i):
+                loss_val = plpy.execute("SELECT {AutoMLSchema.LOSS_METRIC} FROM {model_training.model_info_table} " \
+                             "WHERE {ModelSelectionSchema.MST_KEY}={k}".format(AutoMLSchema=AutoMLConstants,
+                                                                               ModelSelectionSchema=ModelSelectionSchema,
+                                                                               **locals()))[0][AutoMLConstants.LOSS_METRIC]
+
+                # do not remove the two lines below (they report results back to Hyperopt)
+                hyperopt_param['status'] = STATUS_OK
+                hyperopt_param['result'] = {'loss': loss_val, 'status': STATUS_OK}
+            trials.refresh()
+
+            # stacks info of all model configs together
+            self.update_model_output_and_info_tables(model_training)
+
+            self.print_best_mst_so_far()
+
+        self.end_training_time = get_current_timestamp(AutoMLConstants.TIME_FORMAT)
+        self.update_model_selection_table()
+        self.generate_model_output_summary_table(model_training)
+        self.remove_temp_tables(model_training)
+
+    def get_configs_list(self, num_configs, num_segments):
+        """
+        Gets schedule to evaluate model configs
+        :return: Model configs evaluation schedule
+        """
+        num_buckets = int(round(float(num_configs) / num_segments))
+        configs_list = []
+        start_idx = 1
+        models_populated = 0
+        for _ in range(num_buckets - 1):
+            end_idx = start_idx + num_segments
+            models_populated += num_segments
+            configs_list.append((start_idx, end_idx - 1))
+            start_idx = end_idx
+
+        remaining_models = num_configs - models_populated
+        configs_list.append((start_idx, start_idx + remaining_models-1))
+
+        return configs_list
+
+    def get_search_space(self):
+        """
+        Converts user inputs to hyperopt search space.
+        :return: Hyperopt search space
+        """
+
+        # initial params (outside 'optimizer_params_list')
+        hyperopt_search_dict = {}
+        hyperopt_search_dict['model_id'] = self.get_hyperopt_exps('model_id', self.model_id_list)
+
+
+        for j in self.fit_params_grid:
+            hyperopt_search_dict[j] = self.get_hyperopt_exps(j, self.fit_params_grid[j])
+
+        for i in self.compile_params_grid:
+            if i != ModelSelectionSchema.OPTIMIZER_PARAMS_LIST:
+                hyperopt_search_dict[i] = self.get_hyperopt_exps(i, self.compile_params_grid[i])
+
+        hyperopt_search_space_lst = []
+
+        counter = 1 # for unique names to allow multiple distribution options for optimizer params
+        for optimizer_dict in self.compile_params_grid[ModelSelectionSchema.OPTIMIZER_PARAMS_LIST]:
+            for o_param in optimizer_dict:
+                name = o_param + '_' + str(counter)
+                hyperopt_search_dict[name] = self.get_hyperopt_exps(name, optimizer_dict[o_param])
+            # appending deep copy
+            hyperopt_search_space_lst.append({k:v for k, v in hyperopt_search_dict.items()})
+            for o_param in optimizer_dict:
+                name = o_param + '_' + str(counter)
+                del hyperopt_search_dict[name]
+            counter += 1
+
+        return hp.choice('space', hyperopt_search_space_lst)
+
+    def get_hyperopt_exps(self, cp, param_value_list):
+        """
+        Builds a hyperopt sampling expression for a given list of values: either a
+        random choice over discrete elements, or a draw from a specified distribution.
+        :param cp: compile param
+        :param param_value_list: list of values (or distribution spec) for a param
+        :return: hyperopt sampling expression
+        """
+        # check if need to sample from a distribution
+        if type(param_value_list[-1]) == str and all([type(i) != str and not callable(i) for i in param_value_list[:-1]]) \
+                and len(param_value_list) > 1:
+            _assert_equal(len(param_value_list), 3,
+                          "{0}: '{1}' should have exactly 3 elements if picking from a distribution".format(self.module_name, cp))
+            _assert(param_value_list[1] > param_value_list[0],
+                    "{0}: '{1}' should be of the format [lower_bound, upper_bound, distribution_type]".format(self.module_name, cp))
+            if param_value_list[-1] == 'linear':
+                return hp.uniform(cp, param_value_list[0], param_value_list[1])
+            elif param_value_list[-1] == 'log':
+                return hp.loguniform(cp, np.log(param_value_list[0]), np.log(param_value_list[1]))
+            else:
+                plpy.error("{0}: Please choose a valid distribution type for '{1}': {2}".format(
+                    self.module_name,
+                    self.original_param_details(cp)[0],
+                    ['linear', 'log']))
+        else:
+            # random sampling
+            return hp.choice(cp, param_value_list)
+
+    def extract_param_vals(self, sampled_params):
+        """
+        Extract parameter values from hyperopt search space.
+        :param sampled_params: params suggested by hyperopt.
+        :return: lists of model ids, compile and fit params.
+        """
+        model_id_list, compile_params, fit_params = [], [], []
+        for params_dict in sampled_params:
+            compile_dict, fit_dict, optimizer_params_dict = {}, {}, {}
+            for p in params_dict:
+                if len(params_dict[p]) == 0 or p == 'space':
+                    continue
+                val = params_dict[p][0]
+                if p == 'model_id':
+                    model_id_list.append(self.model_id_list[val])
+                    continue
+                elif p in self.fit_params_grid:
+                    try:
+                        # check if params_dict[p] is an index
+                        fit_dict[p] = self.fit_params_grid[p][val]
+                    except TypeError:
+                        fit_dict[p] = params_dict[p]
+                elif p in self.compile_params_grid:
+                    try:
+                        # check if params_dict[p] is an index
+                        compile_dict[p] = self.compile_params_grid[p][val]
+                    except TypeError:
+                        compile_dict[p] = val
+                else:
+                    o_param, idx = self.original_param_details(p) # extracting unique attribute
+                    try:
+                        # check if params_dict[p] is an index (i.e. optimizer, for example)
+                        optimizer_params_dict[o_param] = self.compile_params_grid[
+                            ModelSelectionSchema.OPTIMIZER_PARAMS_LIST][idx][o_param][val]
+                    except TypeError:
+                        optimizer_params_dict[o_param] = val
+            compile_dict[ModelSelectionSchema.OPTIMIZER_PARAMS_LIST] = optimizer_params_dict
+
+            compile_params.append(compile_dict)
+            fit_params.append(fit_dict)
+
+        return model_id_list, compile_params, fit_params
+
+    def original_param_details(self, name):
+        """
+        Returns the original param name and book-keeping detail.
+        :param name: name of the param (example - lr_1, epsilon_12)
+        :return: original param name and book-keeping position.
+        """
+        parts = name.split('_')
+        return '_'.join(parts[:-1]), int(parts[-1]) - 1
+
+
+    def generate_msts(self, model_id_list, compile_params, fit_params):
+        """
+        Generates msts to insert in the mst table.
+        :param model_id_list: list of model ids
+        :param compile_params: list of compile params
+        :param fit_params: list of fit params
+        :return: List of msts to insert in the mst table.
+        """
+        assert len(model_id_list) == len(compile_params) == len(fit_params)
+        msts = []
+
+        for i in range(len(compile_params)):
+            combination = {}
+            combination[ModelSelectionSchema.MODEL_ID] = model_id_list[i]
+            combination[ModelSelectionSchema.COMPILE_PARAMS] = generate_row_string(compile_params[i])
+            combination[ModelSelectionSchema.FIT_PARAMS] = generate_row_string(fit_params[i])
+            msts.append(combination)
+
+        return msts
+
+    def populate_temp_mst_tables(self, i, msts_list):
+        """
+        Creates and populates temp mst and summary tables with newly suggested model configs for evaluation.
+        :param i: starting mst key value for this batch of configs
+        :param msts_list: list of generated msts.
+        """
+        # extra sanity check
+        if table_exists(AutoMLConstants.TEMP_MST_TABLE):
+            drop_tables([AutoMLConstants.TEMP_MST_TABLE])
+
+        create_query = """
+                        CREATE TABLE {AutoMLSchema.TEMP_MST_TABLE} (
+                            {mst_key} INTEGER,
+                            {model_id} INTEGER,
+                            {compile_params} VARCHAR,
+                            {fit_params} VARCHAR,
+                            unique ({model_id}, {compile_params}, {fit_params})
+                        );
+                       """.format(AutoMLSchema=AutoMLConstants,
+                                  mst_key=ModelSelectionSchema.MST_KEY,
+                                  model_id=ModelSelectionSchema.MODEL_ID,
+                                  compile_params=ModelSelectionSchema.COMPILE_PARAMS,
+                                  fit_params=ModelSelectionSchema.FIT_PARAMS)
+        plpy.execute(create_query)
+        mst_key_val = i
+        for mst in msts_list:
+            model_id = mst[ModelSelectionSchema.MODEL_ID]
+            compile_params = mst[ModelSelectionSchema.COMPILE_PARAMS]
+            fit_params = mst[ModelSelectionSchema.FIT_PARAMS]
+            insert_query = """
+                            INSERT INTO
+                                {AutoMLSchema.TEMP_MST_TABLE}(
+                                    {mst_key_col},
+                                    {model_id_col},
+                                    {compile_params_col},
+                                    {fit_params_col}
+                                )
+                            VALUES (
+                                {mst_key_val},
+                                {model_id},
+                                $${compile_params}$$,
+                                $${fit_params}$$
+                            )
+                           """.format(mst_key_col=ModelSelectionSchema.MST_KEY,
+                                      model_id_col=ModelSelectionSchema.MODEL_ID,
+                                      compile_params_col=ModelSelectionSchema.COMPILE_PARAMS,
+                                      fit_params_col=ModelSelectionSchema.FIT_PARAMS,
+                                      AutoMLSchema=AutoMLConstants,
+                                      **locals())
+            mst_key_val += 1
+            plpy.execute(insert_query)
+
+        self.generate_mst_summary_table(AutoMLConstants.TEMP_MST_SUMMARY_TABLE)
+
+    def generate_mst_summary_table(self, tbl_name):
+        """
+        generates mst summary table with the given name
+        :param tbl_name: name of summary table
+        """
+        _assert(tbl_name.endswith('_summary'), 'invalid summary table name')
+
+        # extra sanity check
+        if table_exists(tbl_name):
+            drop_tables([tbl_name])
+
+        create_query = """
+                        CREATE TABLE {tbl_name} (
+                            {model_arch_table} VARCHAR,
+                            {object_table} VARCHAR
+                        );
+                       """.format(tbl_name=tbl_name,
+                                  model_arch_table=ModelSelectionSchema.MODEL_ARCH_TABLE,
+                                  object_table=ModelSelectionSchema.OBJECT_TABLE)
+        plpy.execute(create_query)
+
+        if self.object_table is None:
+            object_table = 'NULL::VARCHAR'
+        else:
+            object_table = '$${0}$$'.format(self.object_table)
+        insert_summary_query = """
+                        INSERT INTO
+                            {tbl_name}(
+                                {model_arch_table_name},
+                                {object_table_name}
+                        )
+                        VALUES (
+                            $${self.model_arch_table}$$,
+                            {object_table}
+                        )
+                       """.format(model_arch_table_name=ModelSelectionSchema.MODEL_ARCH_TABLE,
+                                  object_table_name=ModelSelectionSchema.OBJECT_TABLE,
+                                  **locals())
+        plpy.execute(insert_summary_query)
+
+    def update_model_output_and_info_tables(self, model_training):
+        """
+        Updates model output and info tables by stacking rows after each evaluation round.
+        :param model_training: Fit Multiple class object
+        """
+        metrics_iters = plpy.execute("SELECT {AutoMLSchema.METRICS_ITERS} " \
+                                     "FROM {model_training.original_model_output_table}_summary".format(self=self,
+                                                                                                        model_training=model_training,
+                                                                                                        AutoMLSchema=AutoMLConstants))[0][AutoMLConstants.METRICS_ITERS]
+        if metrics_iters:
+            metrics_iters = "ARRAY{0}".format(metrics_iters)
+        # stacking new rows from training
+        plpy.execute("INSERT INTO {self.model_output_table} SELECT * FROM " \
+                     "{model_training.original_model_output_table}".format(self=self, model_training=model_training))
+        plpy.execute("INSERT INTO {self.model_info_table} SELECT *, {metrics_iters} FROM " \
+                     "{model_training.model_info_table}".format(self=self,
+                                                                     model_training=model_training,
+                                                                     metrics_iters=metrics_iters))
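
For readers new to hyperopt's low-level API, the loop in find_hyperopt_config
above follows an ask/tell pattern: suggest a batch of trials, run them through
FitMultipleModel, then report losses back so TPE can condition its next
suggestions on them. A minimal self-contained sketch of that pattern (the 'lr'
parameter and toy objective are hypothetical stand-ins for the generated
search space and the training loss queried from the model info table):

    import numpy as np
    from hyperopt import hp, tpe, Trials, STATUS_OK
    from hyperopt.base import Domain, JOB_STATE_DONE

    space = {'lr': hp.loguniform('lr', np.log(1e-4), np.log(1e-1))}
    domain = Domain(None, space)  # fn=None: evaluation happens outside hyperopt
    trials = Trials()
    rng = np.random.RandomState(42)

    for tid in range(10):
        # ask: TPE suggests one new trial, conditioned on past results
        docs = tpe.suggest([tid], domain, trials, rng.randint(0, 2 ** 31 - 1))
        lr = docs[0]['misc']['vals']['lr'][0]
        # tell: record the observed loss (a toy objective here) and mark
        # the trial done before inserting it back into the trials object
        docs[0]['state'] = JOB_STATE_DONE
        docs[0]['result'] = {'loss': (lr - 0.01) ** 2, 'status': STATUS_OK}
        trials.insert_trial_docs(docs)
        trials.refresh()

    print(trials.best_trial['misc']['vals']['lr'])
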
diff --git a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
index 946dde3..9db4ea1 100644
--- a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
+++ b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras_automl.py_in
@@ -18,11 +18,11 @@
 # under the License.
 
 m4_changequote(`<!', `!>')
+m4_ifdef(<!__POSTGRESQL__!>, <!print 'skipping automl for postgres'!>, <!
 
 import sys
 from os import path
 import math
-# Add convex module to the pythonpath. # TODO: ?
 sys.path.append(path.dirname(path.dirname(path.dirname(path.dirname(path.abspath(__file__))))))
 sys.path.append(path.dirname(path.dirname(path.dirname(path.abspath(__file__)))))
 
@@ -46,8 +46,8 @@ class HyperbandScheduleTestCase(unittest.TestCase):
 
         self.module_patcher = patch.dict('sys.modules', patches)
         self.module_patcher.start()
-        import deep_learning.madlib_keras_automl
-        self.module = deep_learning.madlib_keras_automl
+        import deep_learning.madlib_keras_automl_hyperband
+        self.module = deep_learning.madlib_keras_automl_hyperband
         # self.module.MstLoaderInputValidator._validate_input_args = \
         #     MagicMock()
 
@@ -222,13 +222,13 @@ class AutoMLHyperoptTestCase(unittest.TestCase):
 
         self.module_patcher = patch.dict('sys.modules', patches)
         self.module_patcher.start()
-        import deep_learning.madlib_keras_automl
-        self.module = deep_learning.madlib_keras_automl
+        import deep_learning.madlib_keras_automl_hyperopt
+        self.module = deep_learning.madlib_keras_automl_hyperopt
 
-        from deep_learning.madlib_keras_automl import AutoMLHyperopt
+        # from deep_learning.madlib_keras_automl_hyperopt import AutoMLHyperopt
         self.seg_num_mock = Mock()
 
-        class FakeAutoMLHyperopt(AutoMLHyperopt):
+        class FakeAutoMLHyperopt(self.module.AutoMLHyperopt):
             def __init__(self, *args):
                 pass
             self.module.get_seg_number = self.seg_num_mock
@@ -282,3 +282,5 @@ class AutoMLHyperoptTestCase(unittest.TestCase):
 
 if __name__ == '__main__':
     unittest.main()
+
+!>)


[madlib] 06/08: Split `with` for multiple expressions into nested calls

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 30db0e6e28a25d29c4673a12bbe434df9b5b7ad0
Author: Ekta Khanna <ek...@vmware.com>
AuthorDate: Mon Oct 19 13:33:25 2020 -0700

    Split `with` for multiple expressions into nested calls
    
    When combining multiple context managers in a `with` statement using
    `and`, only the last expression's context manager actually gets
    entered. In order to ensure all of them are entered, we can either use
    a `,` between the expressions or nest individual `with` statements.
    Since `,` is not supported in Python versions < 2.7, updating code to
    use nested `with` statements.
    
    Co-authored-by: Nikhil Kak <nk...@vmware.com>
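
    To see why, here is a minimal sketch (hypothetical context managers) of
    the pitfall: `A() and B()` is evaluated to a single object before `with`
    runs, so only that object's __enter__/__exit__ are ever called.

        from contextlib import contextmanager

        @contextmanager
        def cm(name):
            print('enter ' + name)
            yield
            print('exit ' + name)

        with cm('A') and cm('B'):  # truthy cm('A') is discarded; prints
            pass                   # only "enter B" / "exit B"

        with cm('A'):      # the fix: nested with statements (or, on
            with cm('B'):  # Python >= 2.7, "with cm('A'), cm('B'):")
                pass       # prints enter A / enter B / exit B / exit A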
---
 .../deep_learning/madlib_keras_automl.sql_in       | 17 ++++++++------
 .../deep_learning/madlib_keras_gpu_info.sql_in     |  6 ++---
 src/ports/postgres/modules/lda/lda.sql_in          | 27 +++++++++++++---------
 .../modules/utilities/text_utilities.sql_in        |  5 ++--
 4 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
index 113ec16..dc5cc6e 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
@@ -626,9 +626,10 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.hyperband_schedule(
       skip_last             INTEGER DEFAULT 0
 ) RETURNS VOID AS $$
     PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl_hyperband')
-    with AOControl(False) and MinWarning('warning'):
-        schedule_loader = madlib_keras_automl_hyperband.HyperbandSchedule(schedule_table, r, eta, skip_last)
-        schedule_loader.load()
+    with AOControl(False):
+        with MinWarning('warning'):
+            schedule_loader = madlib_keras_automl_hyperband.HyperbandSchedule(schedule_table, r, eta, skip_last)
+            schedule_loader.load()
 $$ LANGUAGE plpythonu VOLATILE
               m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 
@@ -652,12 +653,14 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.madlib_keras_automl(
 ) RETURNS VOID AS $$
 if automl_method is None or automl_method.lower() == 'hyperband':
     PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl_hyperband')
-    with AOControl(False) and MinWarning('warning'):
-        schedule_loader = madlib_keras_automl_hyperband.AutoMLHyperband(**globals())
+    with AOControl(False):
+        with MinWarning('warning'):
+            schedule_loader = madlib_keras_automl_hyperband.AutoMLHyperband(**globals())
 elif automl_method.lower() == 'hyperopt':
     PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl_hyperopt')
-    with AOControl(False) and MinWarning('warning'):
-        schedule_loader = madlib_keras_automl_hyperopt.AutoMLHyperopt(**globals())
+    with AOControl(False):
+        with MinWarning('warning'):
+            schedule_loader = madlib_keras_automl_hyperopt.AutoMLHyperopt(**globals())
 else:
     plpy.error("madlib_keras_automl: The chosen automl method must be 'hyperband' or 'hyperopt'")
 $$ LANGUAGE plpythonu VOLATILE
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_gpu_info.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_gpu_info.sql_in
index d2418e4..66cbcc2 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_gpu_info.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_gpu_info.sql_in
@@ -256,9 +256,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.gpu_configuration(output_table text, so
 RETURNS VOID AS
 $$
     PythonFunctionBodyOnly(`deep_learning', `madlib_keras_gpu_info')
-    from utilities.control import MinWarning
-    with AOControl(False) and MinWarning("error"):
-        madlib_keras_gpu_info.gpu_configuration(schema_madlib, output_table, source)
+    with AOControl(False):
+        with MinWarning("error"):
+            madlib_keras_gpu_info.gpu_configuration(schema_madlib, output_table, source)
 $$
 LANGUAGE plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `NO SQL', `');
diff --git a/src/ports/postgres/modules/lda/lda.sql_in b/src/ports/postgres/modules/lda/lda.sql_in
index 814b0ae..32b22fd 100644
--- a/src/ports/postgres/modules/lda/lda.sql_in
+++ b/src/ports/postgres/modules/lda/lda.sql_in
@@ -1057,9 +1057,10 @@ MADLIB_SCHEMA.lda_train
 RETURNS SETOF MADLIB_SCHEMA.lda_result AS $$
     PythonFunctionBodyOnly(`lda', `lda')
     from utilities.control import MinWarning
-    with AOControl(False) and MinWarning("error"):
-        lda.lda_train(schema_madlib, data_table, model_table, output_data_table,
-                      voc_size, topic_num, iter_num, alpha, beta, None, None)
+    with AOControl(False):
+        with MinWarning("error"):
+            lda.lda_train(schema_madlib, data_table, model_table, output_data_table,
+                        voc_size, topic_num, iter_num, alpha, beta, None, None)
     return [[model_table, 'model table'],
         [output_data_table, 'output data table']]
 $$ LANGUAGE plpythonu
@@ -1135,8 +1136,9 @@ MADLIB_SCHEMA.lda_predict
 RETURNS SETOF MADLIB_SCHEMA.lda_result AS $$
     PythonFunctionBodyOnly(`lda', `lda')
     from utilities.control import MinWarning
-    with AOControl(False) and MinWarning("error"):
-        lda.lda_predict(schema_madlib, data_table, model_table, output_table)
+    with AOControl(False):
+        with MinWarning("error"):
+            lda.lda_predict(schema_madlib, data_table, model_table, output_table)
     return [[
         output_table,
         'per-doc topic distribution and per-word topic assignments']]
@@ -1197,8 +1199,9 @@ MADLIB_SCHEMA.lda_get_word_topic_count
 RETURNS SETOF MADLIB_SCHEMA.lda_result AS $$
     PythonFunctionBodyOnly(`lda', `lda')
     from utilities.control import MinWarning
-    with AOControl(False) and MinWarning("error"):
-        lda.get_word_topic_count(schema_madlib, model_table, output_table)
+    with AOControl(False):
+        with MinWarning("error"):
+            lda.get_word_topic_count(schema_madlib, model_table, output_table)
     return [[output_table, 'per-word topic counts']]
 $$ LANGUAGE plpythonu STRICT
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
@@ -1221,8 +1224,9 @@ MADLIB_SCHEMA.lda_get_topic_desc
 RETURNS SETOF MADLIB_SCHEMA.lda_result AS $$
     PythonFunctionBodyOnly(`lda', `lda')
     from utilities.control import MinWarning
-    with AOControl(False) and MinWarning("error"):
-        lda.get_topic_desc(schema_madlib, model_table, vocab_table, desc_table, top_k)
+    with AOControl(False):
+        with MinWarning("error"):
+            lda.get_topic_desc(schema_madlib, model_table, vocab_table, desc_table, top_k)
     return [[
         desc_table,
         """topic description, use "ORDER BY topicid, prob DESC" to check the
@@ -1244,8 +1248,9 @@ MADLIB_SCHEMA.lda_get_word_topic_mapping
 RETURNS SETOF MADLIB_SCHEMA.lda_result AS $$
     PythonFunctionBodyOnly(`lda', `lda')
     from utilities.control import MinWarning
-    with AOControl(False) and MinWarning("error"):
-        lda.get_word_topic_mapping(schema_madlib, lda_output_table, mapping_table)
+    with AOControl(False):
+        with MinWarning("error"):
+            lda.get_word_topic_mapping(schema_madlib, lda_output_table, mapping_table)
     return [[mapping_table, 'wordid - topicid mapping']]
 $$ LANGUAGE plpythonu STRICT
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
diff --git a/src/ports/postgres/modules/utilities/text_utilities.sql_in b/src/ports/postgres/modules/utilities/text_utilities.sql_in
index 478e751..c48b206 100644
--- a/src/ports/postgres/modules/utilities/text_utilities.sql_in
+++ b/src/ports/postgres/modules/utilities/text_utilities.sql_in
@@ -325,8 +325,9 @@ RETURNS TEXT
 AS $$
     PythonFunctionBodyOnly(`utilities', `text_utilities')
     from utilities.control import MinWarning
-    with AOControl(False) and MinWarning("error"):
-        return text_utilities.term_frequency(input_table, doc_id_col, word_vec_col,
+    with AOControl(False):
+        with MinWarning("error"):
+            return text_utilities.term_frequency(input_table, doc_id_col, word_vec_col,
                                              output_table, compute_vocab=compute_vocab)
 $$
 LANGUAGE plpythonu


[madlib] 03/08: Add MinWarning to remove extraneous INFO messages

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 88f2ebc717562a86e69fb197acfa91e963babb70
Author: Domino Valdano <dv...@vmware.com>
AuthorDate: Mon Sep 28 17:17:16 2020 -0700

    Add MinWarning to remove extraneous INFO messages
    
    JIRA: MADLIB-1453
    
    We might also want to add it to PythonFunction macro, and
    automatically wrap every function call with:
     with MinWarning('warning')
    This is what we do for AOControl, which makes sense to me.  But
    before enabling that we would need to check to make sure nothing
    in MADlib is counting on INFO statements showing up by default.
    For now, I'm just importing it by default so that it can easily
    be used along with AOControl in the .sql_in files.  This is much
    better than putting decorators on all of our functions in the
    .py_in files, as the latter will keep turning this GUC on and
    off every time a function is entered or exited.
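
    For context, a MinWarning-style context manager boils down to setting
    the client_min_messages GUC on entry and restoring it on exit. A
    hypothetical, simplified sketch (the real utilities.control.MinWarning
    may differ in detail):

        import plpy

        class MinWarningSketch(object):
            """Hypothetical stand-in for utilities.control.MinWarning."""
            def __init__(self, level='warning'):
                self.level = level

            def __enter__(self):
                # remember the current level, then raise the threshold so
                # messages below `level` (e.g. INFO) are suppressed
                self.prev = plpy.execute(
                    "SELECT current_setting('client_min_messages') AS lvl"
                )[0]['lvl']
                plpy.execute("SET client_min_messages TO {0}".format(self.level))
                return self

            def __exit__(self, *exc_info):
                # restore the previous level on the way out
                plpy.execute("SET client_min_messages TO {0}".format(self.prev))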
---
 src/ports/postgres/madpack/SQLCommon.m4_in                    |  2 +-
 .../postgres/modules/deep_learning/madlib_keras_automl.py_in  | 11 +----------
 .../postgres/modules/deep_learning/madlib_keras_automl.sql_in |  4 ++--
 3 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/src/ports/postgres/madpack/SQLCommon.m4_in b/src/ports/postgres/madpack/SQLCommon.m4_in
index ffc0c37..cc58ea2 100644
--- a/src/ports/postgres/madpack/SQLCommon.m4_in
+++ b/src/ports/postgres/madpack/SQLCommon.m4_in
@@ -82,7 +82,7 @@ m4_define(<!PythonFunctionBodyOnly!>, <!
 
     global schema_madlib
     schema_madlib = rv[0]['nspname']
-    from utilities.control import AOControl
+    from utilities.control import AOControl,MinWarning
 !>)
 
 /*
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
index d6eeba3..0df6772 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
@@ -30,7 +30,7 @@ from madlib_keras_validator import MstLoaderInputValidator
 # from utilities.admin import cleanup_madlib_temp_tables
 from utilities.utilities import get_current_timestamp, get_seg_number, get_segments_per_host, \
     unique_string, add_postfix, extract_keyvalue_params, _assert, _assert_equal, rename_table
-from utilities.control import MinWarning, SetGUC
+from utilities.control import SetGUC
 from madlib_keras_fit_multiple_model import FitMultipleModel
 from madlib_keras_helper import generate_row_string
 from madlib_keras_helper import DISTRIBUTION_RULES
@@ -61,7 +61,6 @@ class AutoMLConstants:
     INT_MAX = 2 ** 31 - 1
     TARGET_SCHEMA = 'public'
 
-@MinWarning("warning")
 class HyperbandSchedule():
     """The utility class for loading a hyperband schedule table with algorithm inputs.
 
@@ -171,7 +170,6 @@ class HyperbandSchedule():
                                       **locals())
             plpy.execute(insert_query)
 
-# @MinWarning("warning")
 class KerasAutoML(object):
     """
     The core AutoML class for running AutoML algorithms such as Hyperband and Hyperopt.
@@ -234,7 +232,6 @@ class KerasAutoML(object):
                                      {ModelArchSchema.MODEL_ARCH} JSON)
                                     """.format(self=self, ModelSelectionSchema=ModelSelectionSchema,
                                                ModelArchSchema=ModelArchSchema)
-        # with MinWarning('warning'):
         plpy.execute(output_table_create_query)
 
     def create_model_output_info_table(self):
@@ -329,7 +326,6 @@ class KerasAutoML(object):
                        descr=descr,
                        model_training=model_training))
 
-        # with MinWarning('warning'):
         plpy.execute(create_query)
 
     def is_automl_method(self, method_name):
@@ -389,7 +385,6 @@ class KerasAutoML(object):
                      model_training.model_summary_table, AutoMLConstants.TEMP_MST_TABLE,
                      AutoMLConstants.TEMP_MST_SUMMARY_TABLE])
 
-# @MinWarning("warning")
 class AutoMLHyperband(KerasAutoML):
     """
     This class implements Hyperband, an infinite-arm bandit based algorithm that speeds up random search
@@ -563,7 +558,6 @@ class AutoMLHyperband(KerasAutoML):
                                   model_id=ModelSelectionSchema.MODEL_ID,
                                   compile_params=ModelSelectionSchema.COMPILE_PARAMS,
                                   fit_params=ModelSelectionSchema.FIT_PARAMS)
-        # with MinWarning('warning'):
         plpy.execute(create_query)
 
         query = ""
@@ -667,7 +661,6 @@ class AutoMLHyperband(KerasAutoML):
                 "b (key integer, s_val integer, i_val integer) WHERE t.mst_key=b.key".format(self=self, l=l)
         plpy.execute(query)
 
-# @MinWarning("warning")
 class AutoMLHyperopt(KerasAutoML):
     """
     This class implements Hyperopt, another automl method that explores awkward search spaces using
@@ -1000,7 +993,6 @@ class AutoMLHyperopt(KerasAutoML):
                                   model_id=ModelSelectionSchema.MODEL_ID,
                                   compile_params=ModelSelectionSchema.COMPILE_PARAMS,
                                   fit_params=ModelSelectionSchema.FIT_PARAMS)
-        # with MinWarning('warning'):
         plpy.execute(create_query)
         mst_key_val = i
         for mst in msts_list:
@@ -1051,7 +1043,6 @@ class AutoMLHyperopt(KerasAutoML):
                        """.format(tbl_name=tbl_name,
                                   model_arch_table=ModelSelectionSchema.MODEL_ARCH_TABLE,
                                   object_table=ModelSelectionSchema.OBJECT_TABLE)
-        # with MinWarning('warning'):
         plpy.execute(create_query)
 
         if self.object_table is None:
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
index 98617d7..a5c7507 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
@@ -626,7 +626,7 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.hyperband_schedule(
       skip_last             INTEGER DEFAULT 0
 ) RETURNS VOID AS $$
     PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl')
-    with AOControl(False):
+    with AOControl(False) and MinWarning('warning'):
         schedule_loader = madlib_keras_automl.HyperbandSchedule(schedule_table, r, eta, skip_last)
         schedule_loader.load()
 $$ LANGUAGE plpythonu VOLATILE
@@ -651,7 +651,7 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.madlib_keras_automl(
     description                    VARCHAR DEFAULT NULL
 ) RETURNS VOID AS $$
     PythonFunctionBodyOnly(`deep_learning', `madlib_keras_automl')
-    with AOControl(False):
+    with AOControl(False) and MinWarning('warning'):
         if automl_method is None or automl_method.lower() == 'hyperband':
             schedule_loader = madlib_keras_automl.AutoMLHyperband(**globals())
         elif automl_method.lower() == 'hyperopt':


[madlib] 01/08: DL: Update test sql to reference to tablename created in the same file

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 6fd6d65dde0af4a54b4718c2a6c34a0948b05a73
Author: Ekta Khanna <ek...@vmware.com>
AuthorDate: Tue Oct 27 11:24:53 2020 -0700

    DL: Update test sql to reference to tablename created in the same file
    
    Prior to this commit, the tests for restricted access to the custom
    function table were referencing an unintended tablename. The tablename
    referenced in the test was actually created in a different test file,
    which masked the failure. This commit updates the tests to reference
    the intended table, created in the same file.
---
 .../deep_learning/test/madlib_keras_custom_function.sql_in        | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/test/madlib_keras_custom_function.sql_in b/src/ports/postgres/modules/deep_learning/test/madlib_keras_custom_function.sql_in
index d9a323a..a08f818 100644
--- a/src/ports/postgres/modules/deep_learning/test/madlib_keras_custom_function.sql_in
+++ b/src/ports/postgres/modules/deep_learning/test/madlib_keras_custom_function.sql_in
@@ -160,16 +160,16 @@ SELECT assert(name = 'top_8_accuracy', 'Top 8 accuracy name is incorrect')
 FROM __test_custom_function_table__ WHERE id = 4;
 
 CREATE SCHEMA MADLIB_SCHEMA_aaa;
-CREATE TABLE pg_temp.temp1 AS SELECT * FROM MADLIB_SCHEMA.test_custom_function_table;
-CREATE TABLE pg_temp.MADLIB_SCHEMA AS SELECT * FROM MADLIB_SCHEMA.test_custom_function_table;
-CREATE TABLE MADLIB_SCHEMA_aaa.test_table AS SELECT * FROM MADLIB_SCHEMA.test_custom_function_table;
+CREATE TABLE pg_temp.temp1 AS SELECT * FROM MADLIB_SCHEMA.__test_custom_function_table__;
+CREATE TABLE pg_temp.MADLIB_SCHEMA AS SELECT * FROM MADLIB_SCHEMA.__test_custom_function_table__;
+CREATE TABLE MADLIB_SCHEMA_aaa.test_table AS SELECT * FROM MADLIB_SCHEMA.__test_custom_function_table__;
 
 SELECT assert(MADLIB_SCHEMA.trap_error($$
   SELECT load_custom_function('pg_temp.temp1', custom_function_object(), 'sum_fn', 'returns sum');
 $$) = 1, 'Cannot use non-madlib schemas');
 
 SELECT assert(MADLIB_SCHEMA.trap_error($$
-  SELECT load_custom_function('test_custom_function_table UNION pg_temp.temp1',
+  SELECT load_custom_function('__test_custom_function_table__ UNION pg_temp.temp1',
     custom_function_object(), 'sum_fn', 'returns sum');
 $$) = 1, 'UNION should not pass');
 


[madlib] 08/08: additional user docs updates about installing Dill and Hyperopt

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 33c1a6d3a534183952f575dc08365cd822c4837b
Author: Frank McQuillan <fm...@pivotal.io>
AuthorDate: Mon Oct 26 12:56:26 2020 -0700

    additional user docs updates about installing Dill and Hyperopt
---
 .../modules/deep_learning/madlib_keras_automl.sql_in     | 16 ++++++++++++++--
 .../deep_learning/madlib_keras_custom_function.sql_in    |  6 +++++-
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
index f703cf0..f692f13 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
@@ -183,7 +183,12 @@ madlib_keras_automl(
 
   <dt>automl_method (optional)</dt>
   <dd>VARCHAR, default 'hyperband'. Name of the autoML algorithm to run.
-  Can be either 'hyperband' or hyperopt' (case insensitive).
+  Can be either 'hyperband' or 'hyperopt' (case insensitive).
+
+  @note
+  If you select 'hyperopt', then the Hyperopt package must be installed on the main node
+  of the database cluster [3]. Hyperband does not need any separate package installation.
+
   </dd>
 
   <dt>automl_params (optional)</dt>
@@ -1313,7 +1318,12 @@ SELECT * FROM iris_predict ORDER BY id;
 @anchor notes
 @par Notes
 
-In practice you may need to do more than one run of an autoML method to arrive
+1. Hyperopt must be installed on the main node of the database cluster
+if you want to use the Hyperopt method of autoML.
+You can pip install it in the usual way [3].  Hyperband does not require
+any separate package installation.
+
+2. In practice you may need to do more than one run of an autoML method to arrive
 at a model with adequate accuracy.  One approach is to set the search space to
 be quite broad initially, then observe which hyperparameter ranges and model architectures
seem to be doing the best.  Subsequent runs can then zoom in on those good ones
@@ -1330,6 +1340,8 @@ Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures,"
 <em>Proceedings of the 30th International Conference on Machine Learning</em>, Atlanta, Georgia,
 USA, 2013. JMLR: W&CP volume 28.
 
+[3] Python catalog for Hyperopt https://pypi.org/project/hyperopt/
+
 @anchor related
 @par Related Topics
 
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_custom_function.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_custom_function.sql_in
index 43f6afc..440d814 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_custom_function.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_custom_function.sql_in
@@ -83,6 +83,10 @@ load_custom_function(
   <dd>BYTEA. PostgreSQL binary data type of the Python object.
   Object must be created with the Dill package for serializing
   Python objects.
+
+  @note
+  The Dill package must be installed on all segments of the
+  database cluster [1].
   </dd>
 
   <dt>name</dt>
@@ -318,7 +322,7 @@ SELECT id, name, description FROM custom_function_table ORDER BY id;
 @anchor literature
 @literature
 
-[1] Dill https://pypi.org/project/dill/
+[1] Python catalog for Dill package https://pypi.org/project/dill/
 
 [2] https://keras.io/api/metrics/accuracy_metrics/#topkcategoricalaccuracy-class
 


[madlib] 07/08: user docs and examples for automl

Posted by kh...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

khannaekta pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 6976c9f54251d0553148f0ebb7e3afaf432c6771
Author: Frank McQuillan <fm...@pivotal.io>
AuthorDate: Fri Oct 23 17:42:01 2020 -0700

    user docs and examples for automl
---
 .../deep_learning/madlib_keras_automl.sql_in       | 906 ++++++++++++++++++---
 1 file changed, 812 insertions(+), 94 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
index dc5cc6e..f703cf0 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_automl.sql_in
@@ -20,7 +20,7 @@
  *
  * @file madlib_keras_automl.sql_in
  *
- * @brief SQL functions for training with AutoML methods
+ * @brief SQL functions for training with autoML methods
  * @date August 2020
  *
  *
@@ -33,29 +33,44 @@ m4_include(`SQLCommon.m4')
 @addtogroup grp_automl
 
 
-@brief Functions to run automated machine learning (AutoML) algorithms to automate
-and speed-up the model selection and training processes for model architecture search,
-hyperparameter tuning, and model evaluation.
+@brief Functions to run automated machine learning (autoML) methods for
+model architecture search and hyperparameter tuning.
 
 \warning <em> This MADlib method is still in early stage development.
 Interface and implementation are subject to change. </em>
 
 <div class="toc"><b>Contents</b><ul>
 <li class="level1"><a href="#madlib_keras_automl">AutoML Function</a></li>
-<li class="level1"><a href="#hyperband_schedule">Hyperband Schedule</a></li>
+<li class="level1"><a href="#hyperband_schedule">Print Hyperband Schedule</a></li>
 <li class="level1"><a href="#example">Examples</a></li>
 <li class="level1"><a href="#notes">Notes</a></li>
+<li class="level1"><a href="#literature">Literature</a></li>
 <li class="level1"><a href="#related">Related Topics</a></li>
 </ul></div>
 
-This module sets up the AutoML algorithms to automate and accelerate
-the model selection and training processes, involving hyperparameter optimization,
-model architecture search, and model training.
+This module contains automated machine learning (autoML) methods for
+model architecture search and hyperparameter tuning.  The goal of autoML when
+training deep nets is to reduce the amount of hand-tuning by data scientists
+to produce a model of acceptable accuracy, compared to manual
+methods like grid or random search.  The two autoML methods implemented
+here are Hyperband and Hyperopt.  If you want to use grid or random search,
+please refer to <a href="group__grp__keras__setup__model__selection.html">Generate
+Model Configurations</a>.
 
-The module also has a a utility function for viewing the Hyperband schedule of
-evaluating configurations for use by the Keras AutoML of MADlib.
-By configuration we mean both hyperparameter tuning and
-model architecture search.
+Hyperband is an effective model selection algorithm that utilizes the idea
+of successive halving. It accelerates random search through adaptive resource allocation
+and early stopping [1].  The implementation here is designed to
+keep MPP database cluster resources as busy as possible when executing
+the Hyperband schedule.
+
+There is also a utility function for printing out the Hyperband schedule
+for a given set of input parameters, to give you
+a sense of how long a run might take before starting.
+
+Hyperopt is a meta-modeling approach for automated hyperparameter optimization [2].
+It intelligently explores the search space while narrowing down to the best
+estimated parameters.  Within Hyperopt we support random search and Tree
+of Parzen Estimators (TPE) approach.
 
 @anchor madlib_keras_automl
 @par AutoML
@@ -88,26 +103,26 @@ madlib_keras_automl(
   This is the name of the output table from the image preprocessor. Independent
   and dependent variables are specified in the preprocessor
   step which is why you do not need to explictly state
-  them here as part of the fit function. The configurations would be evaluated on the basis the training loss,
-  unless a validation table is specified below.
+  them here. Configurations will be evaluated by the autoML methods on the basis of training loss,
+  unless a validation table is specified below, in which case validation loss will be used.
   </dd>
 
   <dt>model_output_table</dt>
   <dd>TEXT. Name of the output table containing the
   multiple models created.
-  @note pg_temp is not allowed as an output table schema for fit multiple.
+  @note 'pg_temp' is not allowed as an output table schema.
   Details of output tables are shown below.
   </dd>
 
   <dt>model_arch_table</dt>
   <dd>VARCHAR. Table containing model architectures and weights.
   For more information on this table
-  refer to <a href="group__grp__keras__model__arch.html">Load Model</a>.
+  refer to <a href="group__grp__keras__model__arch.html">Load Models</a>.
   </dd>
 
   <dt>model_selection_table</dt>
-  <dd>VARCHAR. Model selection table created by this utility.  A summary table
-  named <model_selection_table>_summary is also created.  Contents of both output
+  <dd>VARCHAR. Model selection table created by this method.  A summary table
+  named <model_selection_table>_summary is also created.  Contents of both of these
   tables are described below.
   </dd>
 
@@ -115,7 +130,7 @@ madlib_keras_automl(
   <dd>INTEGER[]. Array of model IDs from the 'model_arch_table' to be included
   in the run combinations.  For hyperparameter search, this will typically be
   one model ID.  For model architecture search, this will be the different model IDs
-  that you want to test.
+  that you want to try.
   </dd>
 
   <dt>compile_params_grid</dt>
@@ -141,13 +156,13 @@ madlib_keras_automl(
   The following types of sampling are supported:  'linear', 'log' and 'log_near_one'.
   The 'log_near_one' sampling is useful for exponentially weighted average types of parameters like momentum,
   which are very sensitive to changes near 1.  It has the effect of producing more values near 1
-  than regular log-based sampling.
+  than regular log-based sampling. However, 'log_near_one' is only supported
+  for Hyperband, not for Hyperopt.
 
-  In the case of grid search, omit the sample type and just put the grid points in the list.
-  For custom loss functions or custom metrics,
-  list the custom function name in the usual way, and provide the name of the
+  For custom loss functions, metrics or top k categorical accuracy,
+  list the custom function name and provide the name of the
   table where the serialized Python objects reside using the
-  parameter 'object_table' below. See the examples section later on this page for more examples.
+  parameter 'object_table' below.
   </dd>
 
   <dt>fit_params_grid</dt>
@@ -164,47 +179,61 @@ madlib_keras_automl(
     }
   $$
   </pre>
-  See the examples section later on this page for more examples.
   </dd>
 
   <dt>automl_method (optional)</dt>
-  <dd>VARCHAR, default 'hyperband'. Name of the automl algorithm to run.
-  Can be either 'hyperband' or hyperopt'. Prefixing is not supported but arg value can be case insensitive.
+  <dd>VARCHAR, default 'hyperband'. Name of the autoML algorithm to run.
+  Can be either 'hyperband' or hyperopt' (case insensitive).
   </dd>
 
   <dt>automl_params (optional)</dt>
-  <dd>VARCHAR, default 'R=6, eta=3, skip_last=0' (for Hyperband). Parameters for the chosen automl method in a
-  comma-separated string of key-value pairs. For eg - 'num_configs=20, num_iterations=5, algorithm=tpe' for Hyperopt
-  Hyperband params are:
-  R - the maximum amount of resources/iterations allocated to a single configuration
-  in a round of hyperband,
-  eta - factor controlling the proportion of configurations discarded in each
-  round of successive halving,
-  skip_last - number of last diagonal brackets to skip running
-  in the algorithm.
-  We encourage setting an low R value (i.e. 2 to 10), or a high R value and a high skip_last value to evaluate
-  a variety of configurations with decent number of iterations. See the description below for details.
-  Hyperopt params are:
-  num_configs - total number of model configurations to evaluate,
-  num_iterations - fixed number of iterations for evaluating each model configurations,
-  algorithm - name of algorithm to explore search space in hyperopt ('rand', 'tpe', 'atpe').
-  </dd>
+  <dd>VARCHAR, default depends on the method. Parameters for the chosen autoML method in a
+  comma-separated string of key-value pairs.  Please refer to references [1] and [2] for
+  more details on the definition of these parameters.
 
-  <dt>random_state (optional)</dt>
-  <dd>INTEGER, default: NULL.  Pseudo random number generator
-  state used for random uniform sampling from lists of possible
-  values. Pass an integer to evaluate a fixed set of configurations.
+  <DL class="arglist">
+  <DT><i>Hyperband params:</i></dt><dd></dd>
+  <DT>R</dt>
+  <DD>Default: 6. Maximum amount of resources (i.e., iterations) to allocate to a single configuration
+  in a round of Hyperband.
+  </DD>
+  <DT>eta</DT>
+  <DD>Default: 3. Controls the proportion of configurations discarded in each
+  round of successive halving. For example, eta=3 will keep the best 1/3 of the
+  configurations for the next round.</DD>
+  <DT>skip_last</DT>
+  <DD>Default: 0. The number of last rounds to skip. For example, 'skip_last=1'
+  will skip the last round (i.e., last entry in each bracket), which is standard random
+  search and can be expensive when run for the total R iterations.</DD>
+  </DL>
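+
+  For example, a Hyperband parameter string might look like
+  'R=9, eta=3, skip_last=0', which allocates at most 9 iterations to any
+  single configuration and keeps the best 1/3 of the configurations at each
+  round of successive halving.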
 
-  @note
-    Specifying a random state doesn't not guarantee result reproducibility of the best configuration or the best
-    train/validation accuracy/loss. It guarantees that the same set of configurations will be chosen for evaluation.
+  <DL class="arglist">
+  <DT><i>Hyperopt params:</i></dt><dd></dd>
+  <DT>num_configs</dt>
+  <DD>Default: 20. Number of trials to evaluate.
+  </DD>
+  <DT>num_iterations</DT>
+  <DD>Default: 5. Number of iterations to run for each trial.</DD>
+  <DT>algorithm</DT>
+  <DD>Default: 'tpe'. Name of the algorithm to explore the search space in Hyperopt ('rand' or 'tpe').</DD>
+  </DL>
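+
+  For example, a Hyperopt parameter string might look like
+  'num_configs=20, num_iterations=5, algorithm=tpe', which evaluates
+  20 configurations for 5 iterations each, with new candidates proposed
+  by the tree-structured Parzen estimator (TPE) algorithm.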
 
+  <dt>random_state (optional)</dt>
+  <dd>INTEGER, default NULL. Pseudo random number generator state used for random
+  uniform sampling from lists of possible values. Pass an integer to evaluate a fixed set of configurations.
+  @note
+    Specifying a random state does not guarantee result reproducibility
+    of the best configuration or the best
+    train/validation accuracy/loss. It only guarantees that
+    the same set of configurations will be chosen for evaluation.
   </dd>
 
   <dt>object_table (optional)</dt>
   <dd>VARCHAR, default: NULL. Name of the table containing
-  Python objects in the case that custom loss functions or
-  custom metrics are specified in the 'compile_params_grid'.
+  Python objects in the case that custom loss functions, custom
+  metrics, or custom top k categorical accuracy functions are specified in
+  the 'compile_params_grid'.
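+
+  A minimal sketch of registering a custom loss function in an object table,
+  assuming the Dill library is installed and using a PL/Python wrapper to
+  serialize the function (the names 'serialize_custom_loss',
+  'custom_function_table' and 'my_custom_loss' are illustrative only):
+  <pre class="example">
+CREATE OR REPLACE FUNCTION serialize_custom_loss()
+RETURNS BYTEA AS
+$$
+import dill
+def my_custom_loss(y_true, y_pred):
+    # mean squared error, written as a custom Keras loss
+    import keras.backend as K
+    return K.mean(K.square(y_pred - y_true), axis=-1)
+# serialize the function so it can be stored in the object table
+return dill.dumps(my_custom_loss)
+$$ LANGUAGE plpythonu;
+SELECT madlib.load_custom_function('custom_function_table',   -- object table
+                                   serialize_custom_loss(),   -- serialized object
+                                   'my_custom_loss',          -- name
+                                   'custom MSE loss');        -- description
+</pre>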
   </dd>
 
   <dt>validation_table (optional)</dt>
@@ -215,29 +244,28 @@ madlib_keras_automl(
   is the name of the output
   table from running the image preprocessor on the validation dataset.
   Using a validation dataset can mean a
-  longer training time depending on its size, and the configurations would be evaluated on the basis of validation
-  loss instead of training loss.
-  This can be controlled using the 'metrics_compute_frequency'
-  parameter described below.</dd>
+  longer training time depending on its size, and the configurations in autoML will be
+  evaluated on the basis of validation
+  loss instead of training loss.</dd>
 
   <DT>metrics_compute_frequency (optional)</DT>
-  <DD>INTEGER, default: once at the end of training
-  after 'num_iterations'.  Frequency to compute per-iteration
+  <DD>INTEGER, default: once at the end of training.
+  Frequency to compute per-iteration
   metrics for the training dataset and validation dataset
   (if specified).  There can be considerable cost to
   computing metrics every iteration, especially if the
   training dataset is large.  This parameter is a way of
   controlling the frequency of those computations.
   For example, if you specify 5, then metrics will be computed
-  every 5 iterations as well as at the end of training
-  after 'num_iterations'.  If you use the default,
+  every 5 iterations as well as at the end of training.
+  If you use the default,
   metrics will be computed only
-  once after 'num_iterations' have completed.
+  once after training has completed.
   </DD>
 
   <DT>name (optional)</DT>
   <DD>TEXT, default: NULL.
-    Free text string to identify a name, if desired.
+    Free text string to provide a name, if desired.
   </DD>
 
   <DT>description (optional)</DT>
@@ -249,9 +277,9 @@ madlib_keras_automl(
 
 <b>Output tables</b>
 <br>
-
-    The model selection output table has exactly 1 row of the best model configuration based on the
-    training/validation loss and contains the following columns:
+    The model selection output table <model_selection_table> has only
+    one row containing the best model configuration from autoML, based on the
+    training/validation loss.  It contains the following columns:
     <table class="output">
       <tr>
         <th>mst_key</th>
@@ -275,8 +303,9 @@ madlib_keras_automl(
         </td>
       </tr>
     </table>
+
     A summary table named <model_selection_table>_summary is
-    also created, which contains the following column:
+    also created, which contains the following columns:
     <table class="output">
       <tr>
         <th>model_arch_table</th>
@@ -287,13 +316,14 @@ madlib_keras_automl(
       <tr>
         <th>object_table</th>
         <td>VARCHAR. Name of the object table containing the serialized
-        Python objects for custom loss functions and custom metrics.
+        Python objects for custom loss functions, custom metrics,
+        and custom top k categorical accuracy functions.
         If there are none, this field will be blank.
         </td>
       </tr>
     </table>
 
-    The model output table produced by fit contains the following columns.
+    The model output table produced by autoML contains the columns below.
     There is one row per model configuration generated:
     <table class="output">
       <tr>
@@ -312,13 +342,13 @@ madlib_keras_automl(
       </tr>
     </table>
 
-    An info table named \<model_output_table\>_info is also created, which has the following columns.
-    There is one row per model as per the rows in the 'model_selection_table':
+    An info table named \<model_output_table\>_info is also created, which has the columns below.
+    There is one row per model:
     <table class="output">
       <tr>
         <th>mst_key</th>
         <td>INTEGER. ID that defines a unique tuple for model architecture-compile parameters-fit parameters,
-        as defined in the 'model_selection_table'.</td>
+        for each model configuration generated.</td>
       </tr>
       <tr>
         <th>model_id</th>
@@ -417,7 +447,7 @@ madlib_keras_automl(
         <th>metrics_iters</th>
         <td>Array indicating the iterations for which
         metrics are calculated, as derived from the
+        parameter 'metrics_compute_frequency' and the number of iterations decided by the autoML algorithm.
+        parameters 'metrics_compute_frequency' and iterations decided by the autoML algorithm.
         For example, if 'num_iterations=5'
         and 'metrics_compute_frequency=2', then 'metrics_iters' value
         would be {2,4,5} indicating that metrics were computed
@@ -429,13 +459,14 @@ madlib_keras_automl(
     </tr>
     <tr>
         <th>s</th>
-        <td>Bracket number</td>
+        <td>Bracket number from the Hyperband schedule.
+        This column is not present for Hyperopt.</td>
     </tr>
     <tr>
         <th>i</th>
-        <td>Latest evaluated round number</td>
+        <td>Latest evaluated round number from the Hyperband schedule.
+        This column is not present for Hyperopt.</td>
     </tr>
-
     </table>
 
     A summary table named \<model_output_table\>_summary is also created, which has the following columns:
@@ -483,15 +514,15 @@ madlib_keras_automl(
     </tr>
     <tr>
         <th>automl_method</th>
-        <td>Name of the automl method</td>
+        <td>Name of the autoML method used.</td>
     </tr>
     <tr>
         <th>automl_params</th>
-        <td>AutoML param values</td>
+        <td>AutoML parameter values.</td>
     </tr>
     <tr>
         <th>random_state</th>
-        <td>Chosen random seed</td>
+        <td>Chosen random seed.</td>
     </tr>
     <tr>
         <th>metrics_compute_frequency</th>
@@ -539,8 +570,12 @@ madlib_keras_automl(
    </table>
 
 @anchor hyperband_schedule
-@par Hyperband Schedule
+@par Print Hyperband Schedule
 
+This utility prints the Hyperband schedule for a set of input parameters.
+It does not run the Hyperband method; it only displays the schedule
+so you can see what the brackets look like.
+Refer to [1] for more information on Hyperband schedules.
 <pre class="syntax">
 hyperband_schedule(
     schedule_table,
@@ -557,20 +592,20 @@ hyperband_schedule(
   </dd>
 
   <dt>R</dt>
-  <dd>INTEGER. Maximum number of resources (iterations) that can be allocated
-  to a single configuration.
+  <dd>INTEGER. Maximum number of resources (i.e., iterations) to allocate to a single configuration
+  in a round of Hyperband.
   </dd>
 
   <dt>eta</dt>
-  <dd>INTEGER, default 3. Controls the proportion of configurations discarded in
-  each round of successive halving. For example, for eta=3 will keep the best 1/3
-  the configurations for the next round.
+  <dd>INTEGER. Controls the proportion of configurations discarded in each
+  round of successive halving. For example, eta=3 will keep the best 1/3 of the
+  configurations for the next round.
   </dd>
 
   <dt>skip_last</dt>
-  <dd>INTEGER, default 0. The number of last rounds to skip. For example, for skip_last=1 will skip the
-  last round (i.e., last entry in each bracket), which is standard randomized search and can
-  be expensive when run for the total R iterations.
+  <dd>INTEGER. The number of last rounds to skip. For example, 'skip_last=1'
+  will skip the last round (i.e., last entry in each bracket), which is standard random
+  search and can be expensive when run for the total R iterations.
   </dd>
 
 </dl>
@@ -581,22 +616,22 @@ hyperband_schedule(
     <table class="output">
       <tr>
         <th>s</th>
-        <td>INTEGER. Bracket number
+        <td>INTEGER. Bracket number.
         </td>
       </tr>
       <tr>
         <th>i</th>
-        <td>INTEGER. Round (depth) in bracket
+        <td>INTEGER. Round (depth) in bracket.
         </td>
       </tr>
       <tr>
         <th>n_i</th>
-        <td>INTEGER. Number of configurations in this round
+        <td>INTEGER. Number of configurations in this round.
         </td>
       </tr>
       <tr>
         <th>r_i</th>
-        <td>INTEGER. Resources (iterations) in this round
+        <td>INTEGER. Resources (iterations) in this round.
         </td>
       </tr>
     </table>
@@ -605,17 +640,700 @@ hyperband_schedule(
 
 @anchor example
 @par Examples
-TBD.
 
+@note
+Deep learning works best on very large datasets,
+but that is not convenient for a quick introduction
+to the syntax. So in this example we use an MLP on the well
+known iris data set from https://archive.ics.uci.edu/ml/datasets/iris.
+For more realistic examples with images please refer
+to the deep learning notebooks
+at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts.
+
+<h4>Setup</h4>
+
+-# Create an input data set.
+<pre class="example">
+DROP TABLE IF EXISTS iris_data;
+CREATE TABLE iris_data(
+    id serial,
+    attributes numeric[],
+    class_text varchar
+);
+INSERT INTO iris_data(id, attributes, class_text) VALUES
+(1,ARRAY[5.1,3.5,1.4,0.2],'Iris-setosa'),
+(2,ARRAY[4.9,3.0,1.4,0.2],'Iris-setosa'),
+(3,ARRAY[4.7,3.2,1.3,0.2],'Iris-setosa'),
+(4,ARRAY[4.6,3.1,1.5,0.2],'Iris-setosa'),
+(5,ARRAY[5.0,3.6,1.4,0.2],'Iris-setosa'),
+(6,ARRAY[5.4,3.9,1.7,0.4],'Iris-setosa'),
+(7,ARRAY[4.6,3.4,1.4,0.3],'Iris-setosa'),
+(8,ARRAY[5.0,3.4,1.5,0.2],'Iris-setosa'),
+(9,ARRAY[4.4,2.9,1.4,0.2],'Iris-setosa'),
+(10,ARRAY[4.9,3.1,1.5,0.1],'Iris-setosa'),
+(11,ARRAY[5.4,3.7,1.5,0.2],'Iris-setosa'),
+(12,ARRAY[4.8,3.4,1.6,0.2],'Iris-setosa'),
+(13,ARRAY[4.8,3.0,1.4,0.1],'Iris-setosa'),
+(14,ARRAY[4.3,3.0,1.1,0.1],'Iris-setosa'),
+(15,ARRAY[5.8,4.0,1.2,0.2],'Iris-setosa'),
+(16,ARRAY[5.7,4.4,1.5,0.4],'Iris-setosa'),
+(17,ARRAY[5.4,3.9,1.3,0.4],'Iris-setosa'),
+(18,ARRAY[5.1,3.5,1.4,0.3],'Iris-setosa'),
+(19,ARRAY[5.7,3.8,1.7,0.3],'Iris-setosa'),
+(20,ARRAY[5.1,3.8,1.5,0.3],'Iris-setosa'),
+(21,ARRAY[5.4,3.4,1.7,0.2],'Iris-setosa'),
+(22,ARRAY[5.1,3.7,1.5,0.4],'Iris-setosa'),
+(23,ARRAY[4.6,3.6,1.0,0.2],'Iris-setosa'),
+(24,ARRAY[5.1,3.3,1.7,0.5],'Iris-setosa'),
+(25,ARRAY[4.8,3.4,1.9,0.2],'Iris-setosa'),
+(26,ARRAY[5.0,3.0,1.6,0.2],'Iris-setosa'),
+(27,ARRAY[5.0,3.4,1.6,0.4],'Iris-setosa'),
+(28,ARRAY[5.2,3.5,1.5,0.2],'Iris-setosa'),
+(29,ARRAY[5.2,3.4,1.4,0.2],'Iris-setosa'),
+(30,ARRAY[4.7,3.2,1.6,0.2],'Iris-setosa'),
+(31,ARRAY[4.8,3.1,1.6,0.2],'Iris-setosa'),
+(32,ARRAY[5.4,3.4,1.5,0.4],'Iris-setosa'),
+(33,ARRAY[5.2,4.1,1.5,0.1],'Iris-setosa'),
+(34,ARRAY[5.5,4.2,1.4,0.2],'Iris-setosa'),
+(35,ARRAY[4.9,3.1,1.5,0.1],'Iris-setosa'),
+(36,ARRAY[5.0,3.2,1.2,0.2],'Iris-setosa'),
+(37,ARRAY[5.5,3.5,1.3,0.2],'Iris-setosa'),
+(38,ARRAY[4.9,3.1,1.5,0.1],'Iris-setosa'),
+(39,ARRAY[4.4,3.0,1.3,0.2],'Iris-setosa'),
+(40,ARRAY[5.1,3.4,1.5,0.2],'Iris-setosa'),
+(41,ARRAY[5.0,3.5,1.3,0.3],'Iris-setosa'),
+(42,ARRAY[4.5,2.3,1.3,0.3],'Iris-setosa'),
+(43,ARRAY[4.4,3.2,1.3,0.2],'Iris-setosa'),
+(44,ARRAY[5.0,3.5,1.6,0.6],'Iris-setosa'),
+(45,ARRAY[5.1,3.8,1.9,0.4],'Iris-setosa'),
+(46,ARRAY[4.8,3.0,1.4,0.3],'Iris-setosa'),
+(47,ARRAY[5.1,3.8,1.6,0.2],'Iris-setosa'),
+(48,ARRAY[4.6,3.2,1.4,0.2],'Iris-setosa'),
+(49,ARRAY[5.3,3.7,1.5,0.2],'Iris-setosa'),
+(50,ARRAY[5.0,3.3,1.4,0.2],'Iris-setosa'),
+(51,ARRAY[7.0,3.2,4.7,1.4],'Iris-versicolor'),
+(52,ARRAY[6.4,3.2,4.5,1.5],'Iris-versicolor'),
+(53,ARRAY[6.9,3.1,4.9,1.5],'Iris-versicolor'),
+(54,ARRAY[5.5,2.3,4.0,1.3],'Iris-versicolor'),
+(55,ARRAY[6.5,2.8,4.6,1.5],'Iris-versicolor'),
+(56,ARRAY[5.7,2.8,4.5,1.3],'Iris-versicolor'),
+(57,ARRAY[6.3,3.3,4.7,1.6],'Iris-versicolor'),
+(58,ARRAY[4.9,2.4,3.3,1.0],'Iris-versicolor'),
+(59,ARRAY[6.6,2.9,4.6,1.3],'Iris-versicolor'),
+(60,ARRAY[5.2,2.7,3.9,1.4],'Iris-versicolor'),
+(61,ARRAY[5.0,2.0,3.5,1.0],'Iris-versicolor'),
+(62,ARRAY[5.9,3.0,4.2,1.5],'Iris-versicolor'),
+(63,ARRAY[6.0,2.2,4.0,1.0],'Iris-versicolor'),
+(64,ARRAY[6.1,2.9,4.7,1.4],'Iris-versicolor'),
+(65,ARRAY[5.6,2.9,3.6,1.3],'Iris-versicolor'),
+(66,ARRAY[6.7,3.1,4.4,1.4],'Iris-versicolor'),
+(67,ARRAY[5.6,3.0,4.5,1.5],'Iris-versicolor'),
+(68,ARRAY[5.8,2.7,4.1,1.0],'Iris-versicolor'),
+(69,ARRAY[6.2,2.2,4.5,1.5],'Iris-versicolor'),
+(70,ARRAY[5.6,2.5,3.9,1.1],'Iris-versicolor'),
+(71,ARRAY[5.9,3.2,4.8,1.8],'Iris-versicolor'),
+(72,ARRAY[6.1,2.8,4.0,1.3],'Iris-versicolor'),
+(73,ARRAY[6.3,2.5,4.9,1.5],'Iris-versicolor'),
+(74,ARRAY[6.1,2.8,4.7,1.2],'Iris-versicolor'),
+(75,ARRAY[6.4,2.9,4.3,1.3],'Iris-versicolor'),
+(76,ARRAY[6.6,3.0,4.4,1.4],'Iris-versicolor'),
+(77,ARRAY[6.8,2.8,4.8,1.4],'Iris-versicolor'),
+(78,ARRAY[6.7,3.0,5.0,1.7],'Iris-versicolor'),
+(79,ARRAY[6.0,2.9,4.5,1.5],'Iris-versicolor'),
+(80,ARRAY[5.7,2.6,3.5,1.0],'Iris-versicolor'),
+(81,ARRAY[5.5,2.4,3.8,1.1],'Iris-versicolor'),
+(82,ARRAY[5.5,2.4,3.7,1.0],'Iris-versicolor'),
+(83,ARRAY[5.8,2.7,3.9,1.2],'Iris-versicolor'),
+(84,ARRAY[6.0,2.7,5.1,1.6],'Iris-versicolor'),
+(85,ARRAY[5.4,3.0,4.5,1.5],'Iris-versicolor'),
+(86,ARRAY[6.0,3.4,4.5,1.6],'Iris-versicolor'),
+(87,ARRAY[6.7,3.1,4.7,1.5],'Iris-versicolor'),
+(88,ARRAY[6.3,2.3,4.4,1.3],'Iris-versicolor'),
+(89,ARRAY[5.6,3.0,4.1,1.3],'Iris-versicolor'),
+(90,ARRAY[5.5,2.5,4.0,1.3],'Iris-versicolor'),
+(91,ARRAY[5.5,2.6,4.4,1.2],'Iris-versicolor'),
+(92,ARRAY[6.1,3.0,4.6,1.4],'Iris-versicolor'),
+(93,ARRAY[5.8,2.6,4.0,1.2],'Iris-versicolor'),
+(94,ARRAY[5.0,2.3,3.3,1.0],'Iris-versicolor'),
+(95,ARRAY[5.6,2.7,4.2,1.3],'Iris-versicolor'),
+(96,ARRAY[5.7,3.0,4.2,1.2],'Iris-versicolor'),
+(97,ARRAY[5.7,2.9,4.2,1.3],'Iris-versicolor'),
+(98,ARRAY[6.2,2.9,4.3,1.3],'Iris-versicolor'),
+(99,ARRAY[5.1,2.5,3.0,1.1],'Iris-versicolor'),
+(100,ARRAY[5.7,2.8,4.1,1.3],'Iris-versicolor'),
+(101,ARRAY[6.3,3.3,6.0,2.5],'Iris-virginica'),
+(102,ARRAY[5.8,2.7,5.1,1.9],'Iris-virginica'),
+(103,ARRAY[7.1,3.0,5.9,2.1],'Iris-virginica'),
+(104,ARRAY[6.3,2.9,5.6,1.8],'Iris-virginica'),
+(105,ARRAY[6.5,3.0,5.8,2.2],'Iris-virginica'),
+(106,ARRAY[7.6,3.0,6.6,2.1],'Iris-virginica'),
+(107,ARRAY[4.9,2.5,4.5,1.7],'Iris-virginica'),
+(108,ARRAY[7.3,2.9,6.3,1.8],'Iris-virginica'),
+(109,ARRAY[6.7,2.5,5.8,1.8],'Iris-virginica'),
+(110,ARRAY[7.2,3.6,6.1,2.5],'Iris-virginica'),
+(111,ARRAY[6.5,3.2,5.1,2.0],'Iris-virginica'),
+(112,ARRAY[6.4,2.7,5.3,1.9],'Iris-virginica'),
+(113,ARRAY[6.8,3.0,5.5,2.1],'Iris-virginica'),
+(114,ARRAY[5.7,2.5,5.0,2.0],'Iris-virginica'),
+(115,ARRAY[5.8,2.8,5.1,2.4],'Iris-virginica'),
+(116,ARRAY[6.4,3.2,5.3,2.3],'Iris-virginica'),
+(117,ARRAY[6.5,3.0,5.5,1.8],'Iris-virginica'),
+(118,ARRAY[7.7,3.8,6.7,2.2],'Iris-virginica'),
+(119,ARRAY[7.7,2.6,6.9,2.3],'Iris-virginica'),
+(120,ARRAY[6.0,2.2,5.0,1.5],'Iris-virginica'),
+(121,ARRAY[6.9,3.2,5.7,2.3],'Iris-virginica'),
+(122,ARRAY[5.6,2.8,4.9,2.0],'Iris-virginica'),
+(123,ARRAY[7.7,2.8,6.7,2.0],'Iris-virginica'),
+(124,ARRAY[6.3,2.7,4.9,1.8],'Iris-virginica'),
+(125,ARRAY[6.7,3.3,5.7,2.1],'Iris-virginica'),
+(126,ARRAY[7.2,3.2,6.0,1.8],'Iris-virginica'),
+(127,ARRAY[6.2,2.8,4.8,1.8],'Iris-virginica'),
+(128,ARRAY[6.1,3.0,4.9,1.8],'Iris-virginica'),
+(129,ARRAY[6.4,2.8,5.6,2.1],'Iris-virginica'),
+(130,ARRAY[7.2,3.0,5.8,1.6],'Iris-virginica'),
+(131,ARRAY[7.4,2.8,6.1,1.9],'Iris-virginica'),
+(132,ARRAY[7.9,3.8,6.4,2.0],'Iris-virginica'),
+(133,ARRAY[6.4,2.8,5.6,2.2],'Iris-virginica'),
+(134,ARRAY[6.3,2.8,5.1,1.5],'Iris-virginica'),
+(135,ARRAY[6.1,2.6,5.6,1.4],'Iris-virginica'),
+(136,ARRAY[7.7,3.0,6.1,2.3],'Iris-virginica'),
+(137,ARRAY[6.3,3.4,5.6,2.4],'Iris-virginica'),
+(138,ARRAY[6.4,3.1,5.5,1.8],'Iris-virginica'),
+(139,ARRAY[6.0,3.0,4.8,1.8],'Iris-virginica'),
+(140,ARRAY[6.9,3.1,5.4,2.1],'Iris-virginica'),
+(141,ARRAY[6.7,3.1,5.6,2.4],'Iris-virginica'),
+(142,ARRAY[6.9,3.1,5.1,2.3],'Iris-virginica'),
+(143,ARRAY[5.8,2.7,5.1,1.9],'Iris-virginica'),
+(144,ARRAY[6.8,3.2,5.9,2.3],'Iris-virginica'),
+(145,ARRAY[6.7,3.3,5.7,2.5],'Iris-virginica'),
+(146,ARRAY[6.7,3.0,5.2,2.3],'Iris-virginica'),
+(147,ARRAY[6.3,2.5,5.0,1.9],'Iris-virginica'),
+(148,ARRAY[6.5,3.0,5.2,2.0],'Iris-virginica'),
+(149,ARRAY[6.2,3.4,5.4,2.3],'Iris-virginica'),
+(150,ARRAY[5.9,3.0,5.1,1.8],'Iris-virginica');
+</pre>
+Create a test/validation dataset from the training data:
+<pre class="example">
+DROP TABLE IF EXISTS iris_train, iris_test;
+-- Set seed so results are reproducible
+SELECT setseed(0);
+SELECT madlib.train_test_split('iris_data',     -- Source table
+                               'iris',          -- Output table root name
+                                0.8,            -- Train proportion
+                                NULL,           -- Test proportion (0.2)
+                                NULL,           -- Strata definition
+                                NULL,           -- Output all columns
+                                NULL,           -- Sample without replacement
+                                TRUE            -- Separate output tables
+                              );
+SELECT COUNT(*) FROM iris_train;
+</pre>
+<pre class="result">
+ count
+------+
+   120
+</pre>
+
+-# Call the preprocessor for deep learning.  For the training dataset:
+<pre class="example">
+\\x on
+DROP TABLE IF EXISTS iris_train_packed, iris_train_packed_summary;
+SELECT madlib.training_preprocessor_dl('iris_train',         -- Source table
+                                       'iris_train_packed',  -- Output table
+                                       'class_text',         -- Dependent variable
+                                       'attributes'          -- Independent variable
+                                        );
+SELECT * FROM iris_train_packed_summary;
+</pre>
+<pre class="result">
+-[ RECORD 1 ]-------+---------------------------------------------
+source_table        | iris_train
+output_table        | iris_train_packed
+dependent_varname   | class_text
+independent_varname | attributes
+dependent_vartype   | character varying
+class_values        | {Iris-setosa,Iris-versicolor,Iris-virginica}
+buffer_size         | 60
+normalizing_const   | 1.0
+num_classes         | 3
+</pre>
+For the validation dataset:
+<pre class="example">
+DROP TABLE IF EXISTS iris_test_packed, iris_test_packed_summary;
+SELECT madlib.validation_preprocessor_dl('iris_test',          -- Source table
+                                         'iris_test_packed',   -- Output table
+                                         'class_text',         -- Dependent variable
+                                         'attributes',         -- Independent variable
+                                         'iris_train_packed'   -- From training preprocessor step
+                                          );
+SELECT * FROM iris_test_packed_summary;
+</pre>
+<pre class="result">
+-[ RECORD 1 ]-------+---------------------------------------------
+source_table        | iris_test
+output_table        | iris_test_packed
+dependent_varname   | class_text
+independent_varname | attributes
+dependent_vartype   | character varying
+class_values        | {Iris-setosa,Iris-versicolor,Iris-virginica}
+buffer_size         | 15
+normalizing_const   | 1.0
+num_classes         | 3
+</pre>
+
+-# Define and load model architecture.  Use Keras to define
+the model architecture with 1 hidden layer:
+<pre class="example">
+import keras
+from keras.models import Sequential
+from keras.layers import Dense
+model1 = Sequential()
+model1.add(Dense(10, activation='relu', input_shape=(4,)))
+model1.add(Dense(10, activation='relu'))
+model1.add(Dense(3, activation='softmax'))
+model1.summary()
+\verbatim
+
+_________________________________________________________________
+Layer (type)                 Output Shape              Param #
+=================================================================
+dense_1 (Dense)              (None, 10)                50
+_________________________________________________________________
+dense_2 (Dense)              (None, 10)                110
+_________________________________________________________________
+dense_3 (Dense)              (None, 3)                 33
+=================================================================
+Total params: 193
+Trainable params: 193
+Non-trainable params: 0
+\endverbatim
+</pre>
+Export the model to JSON:
+<pre class="example">
+model1.to_json()
+</pre>
+<pre class="result">
+'{"class_name": "Sequential", "keras_version": "2.1.6", "config": [{"class_name": "Dense", "config": {"kernel_initializer": {"class_name": "VarianceScaling", "config": {"distribution": "uniform", "scale": 1.0, "seed": null, "mode": "fan_avg"}}, "name": "dense_1", "kernel_constraint": null, "bias_regularizer": null, "bias_constraint": null, "dtype": "float32", "activation": "relu", "trainable": true, "kernel_regularizer": null, "bias_initializer": {"class_name": "Zeros", "config": {}}, "u [...]
+</pre>
+Define model architecture with 2 hidden layers:
+<pre class="example">
+model2 = Sequential()
+model2.add(Dense(10, activation='relu', input_shape=(4,)))
+model2.add(Dense(10, activation='relu'))
+model2.add(Dense(10, activation='relu'))
+model2.add(Dense(3, activation='softmax'))
+model2.summary()
+\verbatim
+
+Layer (type)                 Output Shape              Param #
+=================================================================
+dense_4 (Dense)              (None, 10)                50
+_________________________________________________________________
+dense_5 (Dense)              (None, 10)                110
+_________________________________________________________________
+dense_6 (Dense)              (None, 10)                110
+_________________________________________________________________
+dense_7 (Dense)              (None, 3)                 33
+=================================================================
+Total params: 303
+Trainable params: 303
+Non-trainable params: 0
+\endverbatim
+</pre>
+Export the model to JSON:
+<pre class="example">
+model2.to_json()
+</pre>
+<pre class="result">
+'{"class_name": "Sequential", "keras_version": "2.1.6", "config": [{"class_name": "Dense", "config": {"kernel_initializer": {"class_name": "VarianceScaling", "config": {"distribution": "uniform", "scale": 1.0, "seed": null, "mode": "fan_avg"}}, "name": "dense_4", "kernel_constraint": null, "bias_regularizer": null, "bias_constraint": null, "dtype": "float32", "activation": "relu", "trainable": true, "kernel_regularizer": null, "bias_initializer": {"class_name": "Zeros", "config": {}}, "u [...]
+</pre>
+Load into model architecture table:
+<pre class="example">
+DROP TABLE IF EXISTS model_arch_library;
+SELECT madlib.load_keras_model('model_arch_library',  -- Output table,
+$$
+{"class_name": "Sequential", "keras_version": "2.1.6", "config": [{"class_name": "Dense", "config": {"kernel_initializer": {"class_name": "VarianceScaling", "config": {"distribution": "uniform", "scale": 1.0, "seed": null, "mode": "fan_avg"}}, "name": "dense_1", "kernel_constraint": null, "bias_regularizer": null, "bias_constraint": null, "dtype": "float32", "activation": "relu", "trainable": true, "kernel_regularizer": null, "bias_initializer": {"class_name": "Zeros", "config": {}}, "un [...]
+$$
+::json,         -- JSON blob
+                               NULL,                  -- Weights
+                               'Sophie',              -- Name
+                               'MLP with 1 hidden layer'       -- Descr
+);
+SELECT madlib.load_keras_model('model_arch_library',  -- Output table,
+$$
+{"class_name": "Sequential", "keras_version": "2.1.6", "config": [{"class_name": "Dense", "config": {"kernel_initializer": {"class_name": "VarianceScaling", "config": {"distribution": "uniform", "scale": 1.0, "seed": null, "mode": "fan_avg"}}, "name": "dense_4", "kernel_constraint": null, "bias_regularizer": null, "bias_constraint": null, "dtype": "float32", "activation": "relu", "trainable": true, "kernel_regularizer": null, "bias_initializer": {"class_name": "Zeros", "config": {}}, "un [...]
+$$
+::json,         -- JSON blob
+                               NULL,                  -- Weights
+                               'Maria',               -- Name
+                               'MLP with 2 hidden layers'       -- Descr
+);
+</pre>
+
+<h4>Hyperband</h4>
+
+-# Print Hyperband schedule for example input parameters:
+<pre class="example">
+DROP TABLE IF EXISTS hb_schedule;
+SELECT madlib.hyperband_schedule ('hb_schedule', 
+                                   81,
+                                   3,
+                                   0);
+SELECT * FROM hb_schedule ORDER BY s DESC, i;
+</pre>
+<pre class="result">
+ s | i | n_i | r_i 
+---+---+-----+-----
+ 4 | 0 |  81 |   1
+ 4 | 1 |  27 |   3
+ 4 | 2 |   9 |   9
+ 4 | 3 |   3 |  27
+ 4 | 4 |   1 |  81
+ 3 | 0 |  27 |   3
+ 3 | 1 |   9 |   9
+ 3 | 2 |   3 |  27
+ 3 | 3 |   1 |  81
+ 2 | 0 |   9 |   9
+ 2 | 1 |   3 |  27
+ 2 | 2 |   1 |  81
+ 1 | 0 |   6 |  27
+ 1 | 1 |   2 |  81
+ 0 | 0 |   5 |  81
+(15 rows)
+</pre>
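+As a cross-check, per [1] the number of brackets in the schedule is
+floor(log_eta(R)) + 1; with R=81 and eta=3 above, that is
+floor(log3(81)) + 1 = 5 brackets, numbered s=4 down to s=0.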
+-# Run Hyperband method:
+<pre class="example">
+DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table, automl_mst_table_summary;
+SELECT madlib.madlib_keras_automl('iris_train_packed',                -- source table
+                                  'automl_output',                    -- model output table
+                                  'model_arch_library',               -- model architecture table
+                                  'automl_mst_table',                 -- model selection output table
+                                  ARRAY[1,2],                         -- model IDs
+                                  $${
+                                      'loss': ['categorical_crossentropy'], 
+                                      'optimizer_params_list': [ 
+                                          {'optimizer': ['Adam'],'lr': [0.001, 0.1, 'log']},
+                                          {'optimizer': ['RMSprop'],'lr': [0.001, 0.1, 'log']}
+                                      ],
+                                      'metrics': ['accuracy']
+                                  } $$,                               -- compile param grid
+                                  $${'batch_size': [4, 8], 'epochs': [1]}$$,  -- fit params grid
+                                  'hyperband',                        -- autoML method
+                                  'R=9, eta=3, skip_last=0',          -- autoML params
+                                  NULL,                               -- random state
+                                  NULL,                               -- object table
+                                  FALSE,                              -- use GPUs
+                                  'iris_test_packed',                 -- validation table
+                                  1,                                  -- metrics compute freq
+                                  NULL,                               -- name
+                                  NULL);                              -- descr
+</pre>
+-# View the model summary:
+<pre class="example">
+SELECT * FROM automl_output_summary;
+</pre>
+<pre class="result">
+-[ RECORD 1 ]-------------+---------------------------------------------
+source_table              | iris_train_packed
+validation_table          | iris_test_packed
+model                     | automl_output
+model_info                | automl_output_info
+dependent_varname         | class_text
+independent_varname       | attributes
+model_arch_table          | model_arch_library
+model_selection_table     | automl_mst_table
+automl_method             | hyperband
+automl_params             | R=9, eta=3, skip_last=0
+random_state              | 
+object_table              | 
+use_gpus                  | f
+metrics_compute_frequency | 1
+name                      | 
+description               | 
+start_training_time       | 2020-10-23 00:20:52
+end_training_time         | 2020-10-23 00:22:19
+madlib_version            | 1.18.0-dev
+num_classes               | 3
+class_values              | {Iris-setosa,Iris-versicolor,Iris-virginica}
+dependent_vartype         | character varying
+normalizing_const         | 1
+</pre>
+-# View results for a few models:
+<pre class="example">
+SELECT * FROM automl_output_info ORDER BY validation_metrics_final DESC, validation_loss_final LIMIT 3;
+</pre>
+<pre class="result">
+-[ RECORD 1]----------------------------------------------------------------------------------------------------------
+mst_key                  | 13
+model_id                 | 1
+compile_params           | optimizer='Adam(lr=0.09102824394462919)',metrics=['accuracy'],loss='categorical_crossentropy'
+fit_params               | epochs=1,batch_size=4
+model_type               | madlib_keras
+model_size               | 0.7900390625
+metrics_elapsed_time     | {5.37390279769897,11.0799419879913,16.6234488487244,22.406044960022,28.1228229999542,33.9054269790649,39.4304218292236,45.1850619316101,50.8772490024567}
+metrics_type             | {accuracy}
+loss_type                | categorical_crossentropy
+training_metrics_final   | 0.966666638851166
+training_loss_final      | 0.117527179419994
+training_metrics         | {0.658333361148834,0.975000023841858,0.616666674613953,0.800000011920929,0.975000023841858,0.633333325386047,0.783333361148834,0.966666638851166,0.966666638851166}
+training_loss            | {0.495927810668945,0.18766151368618,0.515772044658661,0.352419972419739,0.0683904364705086,0.749827742576599,0.778484106063843,0.178807646036148,0.117527179419994}
+validation_metrics_final | 1
+validation_loss_final    | 0.0773465484380722
+validation_metrics       | {0.800000011920929,0.933333337306976,0.800000011920929,0.800000011920929,0.966666638851166,0.800000011920929,0.833333313465118,1,1}
+validation_loss          | {0.353509128093719,0.181772708892822,0.291709125041962,0.319768697023392,0.0782377645373344,0.42935499548912,0.600821077823639,0.139800950884819,0.0773465484380722}
+metrics_iters            | {5,6,7,8,9,10,11,12,13}
+s                        | 0
+i                        | 0
+-[ RECORD 2]----------------------------------------------------------------------------------------------------------
+mst_key                  | 12
+model_id                 | 2
+compile_params           | optimizer='RMSprop(lr=0.0038913960350146193)',metrics=['accuracy'],loss='categorical_crossentropy'
+fit_params               | epochs=1,batch_size=4
+model_type               | madlib_keras
+model_size               | 1.2197265625
+metrics_elapsed_time     | {6.42797207832336,13.1542639732361,19.5103330612183,5.07128381729126,10.5596950054169,16.2247838973999,22.1064488887787,27.7473468780518,33.3738968372345,39.1290938854218,44.8826239109039,50.5644388198853}
+metrics_type             | {accuracy}
+loss_type                | categorical_crossentropy
+training_metrics_final   | 0.949999988079071
+training_loss_final      | 0.107505217194557
+training_metrics         | {0.666666686534882,0.949999988079071,0.958333313465118,0.949999988079071,0.774999976158142,0.941666662693024,0.941666662693024,0.958333313465118,0.858333349227905,0.941666662693024,0.958333313465118,0.949999988079071}
+training_loss            | {0.807353079319,0.531827747821808,0.375816851854324,0.29875060915947,0.358659148216248,0.192024797201157,0.200978621840477,0.152862966060638,0.272547781467438,0.125148341059685,0.11623315513134,0.107505217194557}
+validation_metrics_final | 0.966666638851166
+validation_loss_final    | 0.0540979467332363
+validation_metrics       | {0.699999988079071,0.899999976158142,0.966666638851166,0.966666638851166,0.899999976158142,1,0.899999976158142,0.966666638851166,0.899999976158142,0.966666638851166,1,0.966666638851166}
+validation_loss          | {0.819778800010681,0.472518295049667,0.300146490335464,0.222854107618332,0.225204601883888,0.147142887115479,0.192571476101875,0.130105406045914,0.178782090544701,0.0740523263812065,0.0714136436581612,0.0540979467332363}
+metrics_iters            | {2,3,4,5,6,7,8,9,10,11,12,13}
+s                        | 1
+i                        | 1
+-[ RECORD 3]----------------------------------------------------------------------------------------------------------
+mst_key                  | 6
+model_id                 | 1
+compile_params           | optimizer='Adam(lr=0.02358545238214664)',metrics=['accuracy'],loss='categorical_crossentropy'
+fit_params               | epochs=1,batch_size=8
+model_type               | madlib_keras
+model_size               | 0.7900390625
+metrics_elapsed_time     | {10.6679489612579,6.12874889373779,12.8602039813995,19.2172629833221}
+metrics_type             | {accuracy}
+loss_type                | categorical_crossentropy
+training_metrics_final   | 0.966666638851166
+training_loss_final      | 0.367920279502869
+training_metrics         | {0.658333361148834,0.658333361148834,0.891666650772095,0.966666638851166}
+training_loss            | {0.744447708129883,0.627880990505219,0.487682670354843,0.367920279502869}
+validation_metrics_final | 0.933333337306976
+validation_loss_final    | 0.280433148145676
+validation_metrics       | {0.699999988079071,0.699999988079071,0.933333337306976,0.933333337306976}
+validation_loss          | {0.5818150639534,0.465440601110458,0.350821226835251,0.280433148145676}
+metrics_iters            | {1,2,3,4}
+s                        | 2
+i                        | 1
+</pre>
+
+<h4>Hyperopt</h4>
+
+-# Run Hyperopt for a set number of trials, i.e., model configurations:
+<pre class="example">
+DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, automl_mst_table, automl_mst_table_summary;
+SELECT madlib.madlib_keras_automl('iris_train_packed',                -- source table
+                                  'automl_output',                    -- model output table
+                                  'model_arch_library',               -- model architecture table
+                                  'automl_mst_table',                 -- model selection output table
+                                  ARRAY[1,2],                         -- model IDs
+                                  $${
+                                      'loss': ['categorical_crossentropy'], 
+                                      'optimizer_params_list': [ 
+                                          {'optimizer': ['Adam'],'lr': [0.001, 0.1, 'log']},
+                                          {'optimizer': ['RMSprop'],'lr': [0.001, 0.1, 'log']}
+                                      ],
+                                      'metrics': ['accuracy']
+                                  } $$,                               -- compile param grid
+                                  $${'batch_size': [4, 8], 'epochs': [1]}$$,  -- fit params grid
+                                  'hyperopt',                         -- autoML method
+                                  'num_configs=20, num_iterations=10, algorithm=tpe',  -- autoML params
+                                  NULL,                               -- random state
+                                  NULL,                               -- object table
+                                  FALSE,                              -- use GPUs
+                                  'iris_test_packed',                 -- validation table
+                                  1,                                  -- metrics compute freq
+                                  NULL,                               -- name
+                                  NULL);                              -- descr
+</pre>
+-# View the model summary:
+<pre class="example">
+SELECT * FROM automl_output_summary;
+</pre>
+<pre class="result">
+-[ RECORD 1 ]-------------+-------------------------------------------------
+source_table              | iris_train_packed
+validation_table          | iris_test_packed
+model                     | automl_output
+model_info                | automl_output_info
+dependent_varname         | class_text
+independent_varname       | attributes
+model_arch_table          | model_arch_library
+model_selection_table     | automl_mst_table
+automl_method             | hyperopt
+automl_params             | num_configs=20, num_iterations=10, algorithm=tpe
+random_state              | 
+object_table              | 
+use_gpus                  | f
+metrics_compute_frequency | 1
+name                      | 
+description               | 
+start_training_time       | 2020-10-23 00:24:43
+end_training_time         | 2020-10-23 00:28:41
+madlib_version            | 1.18.0-dev
+num_classes               | 3
+class_values              | {Iris-setosa,Iris-versicolor,Iris-virginica}
+dependent_vartype         | character varying
+normalizing_const         | 1
+</pre>
+-# View results for a few models:
+<pre class="example">
+SELECT * FROM automl_output_info ORDER BY validation_metrics_final DESC, validation_loss_final LIMIT 3;
+</pre>
+<pre class="result">
+-[ RECORD 1]----------------------------------------------------------------------------------------------------------
+mst_key                  | 4
+model_id                 | 1
+compile_params           | optimizer='Adam(lr=0.021044174547856155)',metrics=['accuracy'],loss='categorical_crossentropy'
+fit_params               | epochs=1,batch_size=8
+model_type               | madlib_keras
+model_size               | 0.7900390625
+metrics_elapsed_time     | {24.9291331768036,27.1591901779175,29.3875880241394,31.4712460041046,33.6599950790405,35.9415881633759,38.0477111339569,40.2351109981537,42.3932039737701,44.4729251861572}
+metrics_type             | {accuracy}
+loss_type                | categorical_crossentropy
+training_metrics_final   | 0.958333313465118
+training_loss_final      | 0.116280987858772
+training_metrics         | {0.658333361148834,0.658333361148834,0.733333349227905,0.816666662693024,0.949999988079071,0.949999988079071,0.949999988079071,0.875,0.958333313465118,0.958333313465118}
+training_loss            | {0.681611657142639,0.50702965259552,0.41643014550209,0.349031865596771,0.2586330473423,0.234042942523956,0.204623967409134,0.337687611579895,0.116805233061314,0.116280987858772}
+validation_metrics_final | 1
+validation_loss_final    | 0.067971371114254
+validation_metrics       | {0.699999988079071,0.699999988079071,0.733333349227905,0.766666650772095,0.899999976158142,0.899999976158142,0.899999976158142,0.899999976158142,1,1}
+validation_loss          | {0.523795306682587,0.386897593736649,0.323715627193451,0.29447802901268,0.218715354800224,0.216124311089516,0.186037495732307,0.257792592048645,0.0693960413336754,0.067971371114254}
+metrics_iters            | {1,2,3,4,5,6,7,8,9,10}
+-[ RECORD 2]----------------------------------------------------------------------------------------------------------
+mst_key                  | 8
+model_id                 | 1
+compile_params           | optimizer='RMSprop(lr=0.055711748803920255)',metrics=['accuracy'],loss='categorical_crossentropy'
+fit_params               | epochs=1,batch_size=4
+model_type               | madlib_keras
+model_size               | 0.7900390625
+metrics_elapsed_time     | {68.9713232517242,71.1428651809692,73.0566282272339,75.2099182605743,77.4740402698517,79.4580070972443,81.5958452224731,83.6865520477295,85.6433861255646,87.8569240570068}
+metrics_type             | {accuracy}
+loss_type                | categorical_crossentropy
+training_metrics_final   | 0.966666638851166
+training_loss_final      | 0.106823824346066
+training_metrics         | {0.658333361148834,0.699999988079071,0.875,0.691666662693024,0.699999988079071,0.791666686534882,0.774999976158142,0.966666638851166,0.966666638851166,0.966666638851166}
+training_loss            | {0.681002557277679,0.431159198284149,0.418115794658661,0.51969450712204,0.605500161647797,0.36535832285881,0.451890885829926,0.126570284366608,0.116986438632011,0.106823824346066}
+validation_metrics_final | 1
+validation_loss_final    | 0.0758842155337334
+validation_metrics       | {0.699999988079071,0.699999988079071,0.966666638851166,0.699999988079071,0.699999988079071,0.800000011920929,0.766666650772095,0.966666638851166,0.966666638851166,1}
+validation_loss          | {0.693905889987946,0.364648938179016,0.287941485643387,0.509377717971802,0.622031152248383,0.377092003822327,0.488217085599899,0.10258474200964,0.0973251685500145,0.0758842155337334}
+metrics_iters            | {1,2,3,4,5,6,7,8,9,10}
+-[ RECORD 3]----------------------------------------------------------------------------------------------------------
+mst_key                  | 13
+model_id                 | 1
+compile_params           | optimizer='RMSprop(lr=0.006381376508189085)',metrics=['accuracy'],loss='categorical_crossentropy'
+fit_params               | epochs=1,batch_size=4
+model_type               | madlib_keras
+model_size               | 0.7900390625
+metrics_elapsed_time     | {141.029213190079,143.075024366379,145.330604314804,147.341159343719,149.579845190048,151.819869279861,153.939630270004,156.235336303711,158.536979198456,160.583434343338}
+metrics_type             | {accuracy}
+loss_type                | categorical_crossentropy
+training_metrics_final   | 0.975000023841858
+training_loss_final      | 0.0981351062655449
+training_metrics         | {0.875,0.933333337306976,0.875,0.975000023841858,0.975000023841858,0.908333361148834,0.949999988079071,0.966666638851166,0.975000023841858,0.975000023841858}
+training_loss            | {0.556384921073914,0.32896700501442,0.29009011387825,0.200998887419701,0.149432390928268,0.183790743350983,0.120595499873161,0.12202025949955,0.101290702819824,0.0981351062655449}
+validation_metrics_final | 1
+validation_loss_final    | 0.0775858238339424
+validation_metrics       | {0.899999976158142,0.966666638851166,0.766666650772095,1,1,0.933333337306976,0.966666638851166,0.966666638851166,1,1}
+validation_loss          | {0.442976772785187,0.249921068549156,0.268403559923172,0.167330235242844,0.134699374437332,0.140658855438232,0.0964709892868996,0.110730975866318,0.0810751244425774,0.0775858238339424}
+metrics_iters            | {1,2,3,4,5,6,7,8,9,10}
+</pre>
+-# Run inference on one of the models generated by Hyperopt.  In this example we use the
+validation set to predict on:
+<pre class="example">
+DROP TABLE IF EXISTS iris_predict;
+SELECT madlib.madlib_keras_predict('automl_output',    -- model
+                                   'iris_test',        -- test_table
+                                   'id',               -- id column
+                                   'attributes',       -- independent var
+                                   'iris_predict',     -- output table
+                                    'response',        -- prediction type
+                                    FALSE,             -- use gpus
+                                    4                 -- MST key
+                                   );
+SELECT * FROM iris_predict ORDER BY id;
+</pre>
+<pre class="result">
+ id  |   class_text    |    prob    
+-----+-----------------+------------
+   5 | Iris-setosa     |  0.9998704
+   7 | Iris-setosa     | 0.99953365
+  10 | Iris-setosa     |  0.9993413
+  16 | Iris-setosa     |  0.9999825
+  17 | Iris-setosa     |  0.9999256
+  21 | Iris-setosa     |  0.9995347
+  23 | Iris-setosa     |  0.9999405
+  27 | Iris-setosa     |  0.9989955
+  30 | Iris-setosa     |  0.9990559
+  31 | Iris-setosa     |  0.9986846
+  32 | Iris-setosa     |  0.9992879
+  37 | Iris-setosa     | 0.99987197
+  39 | Iris-setosa     |  0.9989151
+  46 | Iris-setosa     |  0.9981341
+  47 | Iris-setosa     |  0.9999044
+  53 | Iris-versicolor |  0.9745001
+  54 | Iris-versicolor |  0.8989025
+  56 | Iris-versicolor | 0.97066855
+  63 | Iris-versicolor | 0.96652734
+  71 | Iris-versicolor | 0.84569126
+  77 | Iris-versicolor |  0.9564522
+  83 | Iris-versicolor |  0.9664927
+  85 | Iris-versicolor | 0.96553373
+  93 | Iris-versicolor | 0.96748537
+ 103 | Iris-virginica  |  0.9343488
+ 108 | Iris-virginica  | 0.91668576
+ 117 | Iris-virginica  |  0.7323582
+ 124 | Iris-virginica  | 0.72906417
+ 132 | Iris-virginica  | 0.50430095
+ 144 | Iris-virginica  |  0.9487652
+(30 rows)
+</pre>
 
 @anchor notes
 @par Notes
-TBD.
 
+In practice you may need to do more than one run of an autoML method to arrive
+at a model with adequate accuracy.  One approach is to set the search space
+quite broad initially, then observe which hyperparameter ranges and model
+architectures perform the best.  Subsequent runs can then narrow the search
+space around those candidates in order to fine-tune the model, as sketched below.
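+
+For example, if the best first-run configurations cluster around the Adam
+optimizer with learning rates near 0.01, a narrowed second run might look
+like the following sketch (table names carry over from the examples above;
+the narrowed ranges shown are illustrative only):
+<pre class="example">
+DROP TABLE IF EXISTS automl_output2, automl_output2_info, automl_output2_summary,
+                     automl_mst_table2, automl_mst_table2_summary;
+SELECT madlib.madlib_keras_automl('iris_train_packed',      -- source table
+                                  'automl_output2',         -- model output table
+                                  'model_arch_library',     -- model architecture table
+                                  'automl_mst_table2',      -- model selection output table
+                                  ARRAY[1],                 -- best model ID from the first run
+                                  $${
+                                      'loss': ['categorical_crossentropy'],
+                                      'optimizer_params_list': [
+                                          {'optimizer': ['Adam'],'lr': [0.005, 0.05, 'log']}
+                                      ],
+                                      'metrics': ['accuracy']
+                                  } $$,                     -- narrowed compile param grid
+                                  $${'batch_size': [4], 'epochs': [1]}$$,  -- fit params grid
+                                  'hyperopt',               -- autoML method
+                                  'num_configs=10, num_iterations=10, algorithm=tpe',  -- autoML params
+                                  NULL,                     -- random state
+                                  NULL,                     -- object table
+                                  FALSE,                    -- use GPUs
+                                  'iris_test_packed',       -- validation table
+                                  1,                        -- metrics compute freq
+                                  NULL,                     -- name
+                                  NULL);                    -- descr
+</pre>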
+
+@anchor literature
+@par Literature
+
+[1] Li <em>et al.</em>, "Hyperband: A Novel Bandit-Based Approach to
+Hyperparameter Optimization", Journal of Machine Learning Research 18 (2018) 1-52.
+
+[2] J. Bergstra, D. Yamins, D. D. Cox, "Making a Science of Model Search:
+Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures,"
+<em>Proceedings of the 30th International Conference on Machine Learning</em>, Atlanta, Georgia,
+USA, 2013. JMLR: W&CP volume 28.
 
 @anchor related
 @par Related Topics
-TBD.
+
+File madlib_keras_automl.sql_in documenting the autoML functions.
 
 */