You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2020/11/16 20:42:37 UTC

[GitHub] [madlib] reductionista opened a new pull request #524: DL: Support Tensorboard

reductionista opened a new pull request #524:
URL: https://github.com/apache/madlib/pull/524


   We don't want to allow arbitrary custom callback functions, but we can make a special exception for TensorBoard.
   
   With this commit, users will be able to get realtime feedback on accuracy and loss while training, as well as visualize the tensor graph of the keras model they've submitted.
   
   Usage:
   1.  Add callback to TensorBoard to fit_params:
   ```
       callbacks=[TensorBoard(log_dir="/tmp/tensorflow/scalars")]
   ```
   2.  Start tensorboard before training or prediction with MADlib deep learning module:
   ```
       tensorboard --logdir='/tmp/tensorflow/scalars'
   
   To quickly enable it for an entire model selection table:
   ```
   update mst_table set fit_params = (fitparams || ', callbacks=[TensorBoard(log_dir="/tmp/tensorflow/scalars")]');
   ```
   
   We should look over it carefully to make sure this doesn't add significant security risk before merging.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583238518



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -170,11 +173,19 @@ def convert_string_of_args_to_dict(str_of_args):
         elif not stack and char == ",":
             value_str = result_str
             result_str = ""
-            compile_dict[key_str.strip()]=value_str.strip().strip('\'')
+            key_str = key_str.strip()
+            value_str = value_str.strip()
+            if strip_quotes:
+                value_str = value_str.strip('"\'')
+            compile_dict[key_str.strip()]=value_str

Review comment:
       Good catch, we can remove one of them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r587585018



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',
+                           'write_graph', 'write_grad', 'write_images' ]
+    tb_params_dict = validate_and_literal_eval_keys(tb_params_dict, accepted_tb_params, accepted_tb_params)
+    tb_params_dict['log_dir'] = tb_params_dict['log_dir']+"{0}".format(current_seg_id)

Review comment:
       I saw it happening on my local machine.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785190051


   > I see, please make a general fix not just for the specific case of `log`
   
   Will do!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r574767403



##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
##########
@@ -329,7 +330,8 @@ DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, a
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$);
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$,
+    $${'batch_size': [2, 4], 'epochs': [3], 'callbacks': ['[TensorBoard(log_dir=\'/tmp/tensorflow/scalars/\')]']}$$);

Review comment:
       Alternatively, we could decide to treat callbacks specially, knowing that it's supposed to be a list.  Currently, if they pass something like this:
   ```
   'callbacks' :  [ 'TensorBoard(log_dir="/tmp/tensorflow/scalars/")' ]
   ```
   it will try to pass
   
   param_value = 'TensorBoard(log_dir="/tmp/tensorflow/scalars/")'
   
   to fit.  We could just add a `param_value = '[{}]'.format(param_value)` transformation before it gets evaluated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r574766356



##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
##########
@@ -329,7 +330,8 @@ DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, a
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$);
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$,
+    $${'batch_size': [2, 4], 'epochs': [3], 'callbacks': ['[TensorBoard(log_dir=\'/tmp/tensorflow/scalars/\')]']}$$);

Review comment:
       Good suggestions, but I suggest postponing any additional capability until after 1.18.0 release.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 edited a comment on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 edited a comment on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785321018


   LGTM
   
   I added a couple user doc comments about callbacks and TensorBoard support as part of 
   https://github.com/apache/madlib/pull/534
   
   @kaknikhil can you please review this PR?  thx


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586813934



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',

Review comment:
       "In recent versions of keras, the optimizer classes are required to be passed in the normal pythonic way, as class variables."
   
   Correction:  in recent versions, you can still pass optimizers in one of two ways, but I think the options are a bit more limited than in the past.  Currently, you can pass  either an optimizer object (instantiation of anything derived from `keras.optimizers.Optimizer`), or what you get when you call get_config['name'] on the default instantiation of one of the builtin optimizers, eg:
   
   `keras.optimizers.Adam().get_config()['name']`
   'Adam'
   
   But it looks like for all the ones I've checked, the config name and the class name do still match.  So I guess what I said above is only true for non-default instantiations of the class.
   Any of these are okay:
   ```
   optmizer=Adam(learning_rate=0.01)
   optimizer=Adam(learning_rate=0.01, name='My custom Adam')
   optimizer='Adam'
   ```
   but this is not:
   ```
   optimizer='Adam(learning_rate=0.01)'
   ```
   It looks like they have the same deal with losses and metrics, but for those the names of the classes don't necessarily match:
   ```
   keras.losses.CategoricalCrossentropy().get_config()['name']
   'categorical_crossentropy'
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583732505



##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_fit.sql_in
##########
@@ -40,7 +40,7 @@ SELECT madlib_keras_fit(
     'model_arch',
     1,
     $$ optimizer=SGD(lr=0.01, decay=1e-6, nesterov=True), loss='categorical_crossentropy', metrics=['mae']$$::text,
-    $$ batch_size=2, epochs=1, verbose=0 $$::text,
+    $$ batch_size=2, epochs=1, verbose=0, callbacks=[TensorBoard(log_dir='/tmp/tensorflow/single/')] $$::text,

Review comment:
       yes agree this is a tensorboard thing, and user needs to manage the directories themselves where the log files are written




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586728227



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',

Review comment:
       We don't, but it makes the code less error prone and more easily maintainable.
   
   I think the question you're getting at, which I was very confused about when I first looked at this code, is why were we handling it as a special case, rather than just running literal eval on it like we do for any other param?
   
   Previously we were adding quotes around the value if they weren't there already, and then removing them later if the result is equal to the strings 'True' or 'False', with this awkward workaround:
   ```
   if 'shuffle' in fit_params_dict:
               shuffle_value = fit_params_dict['shuffle']
               if shuffle_value == 'True' or shuffle_value == 'False':
                   fit_params_dict['shuffle'] = bool(shuffle_value)
   ```
   But there is no reason to accept 'True' or 'False' as a param, as that's not valid python code for passing a boolean anyway.  So instead of changing True to 'True' then back to True again, we just leave it as is.  If they pass in shuffle='True' or shuffle="True" that should generate an error, just as it does if you passed that as a fit param to keras outside of MADlib--it's not a valid parameter value.
   
   We do have to handle the callbacks fit param as a special case, since we need to block any callbacks besides Tensorboard (at least for now).
   
   So why were we doing a whole extra song and dance for shuffle as well?  The answer is not obvious, but I eventually found that it has to do with older versions of keras allowing an alternate syntax for passing optimizer classes to compile params.
   
   In recent versions of keras, the optimizer classes are required to be passed in the normal pythonic way, as class variables.  But older versions also allowed passing the name of the class as a string... which keras then converts into the actual class before using.
   
   How does that affect fit params?  It doesn't, except for the fact that we re-purposed a function that processes the compile params, which was designed in an overly-general way to allow the value of any parameter to either have quotes around it or not.
   
   If my understanding is correct, the only reason this was necessary, even for processing compile params, is to handle this special case of how optimizer classes can be passed in older versions of keras.   It's not necessary at all for the fit params, since none of them can be passed in the alternate way, in any version of keras.  We were allowing some extra weird possibilities (like passing shuffle='True' instead of shuffle=True) that keras doesn't normally allow, just so that we didn't have to handle fit params differently from compile params, or most compile params differently from optimizer params.
   
   Hopefully eventually, we will be able to stop adding quotes to any of the params passed in, and just use literal_eval on everything.  But for now, it seems like maintaining backwards compatibility for the optimizer params is a good idea... so we still add quotes for all compile params, just not fit params now.  This removes the need to handle shuffle as a special case.  As a next step perhaps, we can do the same to either most or all of the compile params, but I think it may require re-writing some tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785321018


   LGTM
   
   I added a couple user doc comments about callbacks and TensorBoard support as part of 
   https://github.com/apache/madlib/pull/534
   
   You can merge this PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r574754629



##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
##########
@@ -329,7 +330,8 @@ DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, a
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$);
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$,
+    $${'batch_size': [2, 4], 'epochs': [3], 'callbacks': ['[TensorBoard(log_dir=\'/tmp/tensorflow/scalars/\')]']}$$);

Review comment:
       The syntax for passing these AutoML params is getting pretty ugly.
   
   Part of the reason is the use of \' instead of ", which appears to be a workaround for this bug:
   https://issues.apache.org/jira/projects/MADLIB/issues/MADLIB-1455
   
   This might be a good time to address that one, I would guess it's pretty low effort.  With double quotes working, it would look more like typical python:
   
   `{ 'epochs' : [3], 'callbacks': [ '[TensorBoard(log_dir="/tmp/tensorflow/scalars/"]' ], 'batch_size' : [2, 4] }`
   
   Eventually, if we add support for arbitrary custom callbacks, we should assess whether there is a better way for the user to pass an array of different callback choices they want AutoML to combine with different models.
   
   For now, since we only support TensorBoard() and it's just used for monitoring/debugging... it's probably safe to assume they'll always want the same Tensorboard callback for every model... rather than multiple choices of callbacks.
   
   One thing that might improve it further though, is allowing the [] to be optional for compile/fit params which only have a single choice, like this:
   
   ` 'epochs' : 3`   means the same as =>  `'epochs' : [3]`
   and
   `'callbacks' : '[ TensorBoard(log_dir="/tmp/tensorflow/scalars/") ]'`
        == means the same as ==>  `'callbacks' :  [ '[ TensorBoard(log_dir="/tmp/tensorflow/scalars/") ]' ]`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] kaknikhil commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

kaknikhil commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r589797079



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -145,6 +145,7 @@ def set_model_weights(segment_model, serialized_weights):
 
 """
 Used to convert compile_params and fit_params to actual argument dictionaries
+If strip_quotes is True, each value in the dictionayr will be stripped of quotes

Review comment:
       typo in dictionayr

##########
File path: src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
##########
@@ -1259,6 +1259,26 @@ class MadlibKerasWrapperTestCase(unittest.TestCase):
             self.subject.parse_and_validate_compile_params(test_str)
         self.assertIn('invalid optimizer', str(error.exception))
 
+    def test_parse_callbacks_pass(self):
+        test_str = """'[TensorBoard(log_dir="/tmp/logs/fit")]'"""
+        import utilities
+        self.util = utilities
+        self.util.is_superuser = MagicMock(return_value=True)

Review comment:
       we can also add a test for when `is_superuser` returns false. But we can also do that in a future PR.

##########
File path: src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
##########
@@ -1259,6 +1259,26 @@ class MadlibKerasWrapperTestCase(unittest.TestCase):
             self.subject.parse_and_validate_compile_params(test_str)
         self.assertIn('invalid optimizer', str(error.exception))
 
+    def test_parse_callbacks_pass(self):
+        test_str = """'[TensorBoard(log_dir="/tmp/logs/fit")]'"""

Review comment:
       maybe also add a couple of other tensorboard params here other than log_dir. If it's not straightforward, then feel free to skip it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r589817942



##########
File path: src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
##########
@@ -1259,6 +1259,26 @@ class MadlibKerasWrapperTestCase(unittest.TestCase):
             self.subject.parse_and_validate_compile_params(test_str)
         self.assertIn('invalid optimizer', str(error.exception))
 
+    def test_parse_callbacks_pass(self):
+        test_str = """'[TensorBoard(log_dir="/tmp/logs/fit")]'"""

Review comment:
       Looks like you asked this in another comment already.  I agree with @orhankislal 's response:
   
   "I don't think we have to test for every keras parameter combination. The log_dir is tested because it is error-prone with arbitrary paths but a numeric value like batch_size either works or keras throws an appropriate error."




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586688436



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',
+                           'write_graph', 'write_grad', 'write_images' ]
+    tb_params_dict = validate_and_literal_eval_keys(tb_params_dict, accepted_tb_params, accepted_tb_params)
+    tb_params_dict['log_dir'] = tb_params_dict['log_dir']+"{0}".format(current_seg_id)

Review comment:
       Have we seen this happen?  (segments on the same machine overwriting each other's files)  Or is it just a theoretical concern?  I thought I recalled it working, but maybe misremembering.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785163577


   > (1)
   > turn off verbose write to console
   > 
   This is unrelated to the PR. The fix is already included in a different PR.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r574767403



##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
##########
@@ -329,7 +330,8 @@ DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, a
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$);
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$,
+    $${'batch_size': [2, 4], 'epochs': [3], 'callbacks': ['[TensorBoard(log_dir=\'/tmp/tensorflow/scalars/\')]']}$$);

Review comment:
       Alternatively, we could decide to treat callbacks specially, knowing that it's supposed to be a list.  Currently, if they pass something like this:
   ```
   'callbacks' :  [ 'TensorBoard(log_dir="/tmp/tensorflow/scalars/")' ]
   ```
   it will try to pass
   
   param_value = 'TensorBoard(log_dir="/tmp/tensorflow/scalars/")'
   
   We could do a `param_value = '[{}]'.format(param_value)` before it gets evaluated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583236462



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)

Review comment:
       They're in a try/except block.  If any of them fail then this error gets raised:
   "Invalid callbacks fit param.  Currently, only TensorBoard callbacks are accepted."




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785182965


   I see, please make a general fix not just for the specific case of `log`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] kaknikhil commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

kaknikhil commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583839858



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):

Review comment:
       I see makes sense. So when it is running on master, maybe we should use -1 instead of 0




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 edited a comment on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 edited a comment on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-784665248


   (2) error when generating MSTs with TensorBoard:
   ```
   madlib=# SELECT madlib.generate_model_configs(
   madlib(#                                         'model_arch_library', -- model architecture table
   madlib(#                                         'mst_table',          -- model selection table output
   madlib(#                                          ARRAY[1,2],          -- model ids from model architecture table
   madlib(#                                          $$
   madlib$#                                             {'loss': ['categorical_crossentropy'],
   madlib$#                                              'optimizer_params_list': [ {'optimizer': ['Adam', 'SGD'], 'lr': [0.001, 0.01]} ],
   madlib$#                                              'metrics': ['accuracy']}
   madlib$#                                          $$,                  -- compile_param_grid
   madlib(#                                          $$
   madlib$#                                          { 'batch_size': [64, 128],
   madlib$#                                            'epochs': [10],
   madlib$#                                            'callbacks': ['[TensorBoard(log_dir="/tmp/logs/fit")]']
   madlib$#                                          }
   madlib$#                                          $$,                  -- fit_param_grid
   madlib(#                                          'grid'               -- search_type
   madlib(#                                          );
   ERROR:  plpy.Error: DL: Cannot search from a distribution with grid search (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "generate_model_configs", line 21, in <module>
       mst_loader = madlib_keras_model_selection.MstSearch(**globals())
     PL/Python function "generate_model_configs", line 42, in wrapper
     PL/Python function "generate_model_configs", line 309, in __init__
     PL/Python function "generate_model_configs", line 364, in validate_inputs
     PL/Python function "generate_model_configs", line 105, in _assert
   PL/Python function "generate_model_configs"
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583649192



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',
+                           'write_graph', 'write_grad', 'write_images' ]
+    tb_params_dict = validate_and_literal_eval_keys(tb_params_dict, accepted_tb_params, accepted_tb_params)
+    tb_params_dict['log_dir'] = tb_params_dict['log_dir']+"{0}".format(current_seg_id)

Review comment:
       The problem is segments on the same machine overwriting the file, using segment_id solves this issue. As we discussed, we are not concerned with combining partial callback logs at this point.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-730717369


   Does not work for autoML, but maybe that is OK, maybe we do not need to support for autoML
   ```
   SELECT madlib.madlib_keras_automl('cifar_10_train_data_packed_allseg', 
                                     'automl_cifar10_output', 
                                     'model_arch_library', 
                                     'automl_cifar10_mst_table',
                                     ARRAY[2], 
                                     $${'loss': ['categorical_crossentropy'], 
                                        'optimizer_params_list': [ 
                                            {'optimizer': ['Adam'],'lr': [0.0001, 0.01, 'log']},
                                            {'optimizer': ['RMSprop'],'lr': [0.0001, 0.01, 'log'], 'decay': [1.0e-7, 1.0e-5, 'log']},
                                            {'optimizer': ['SGD'],'lr': [0.0001, 0.01, 'log'], 'momentum': [0.9, 0.99,'log_near_one']}],
                                        'metrics':['accuracy']}$$, 
                                     $${'batch_size': [128], 'epochs': [1], callbacks=[TensorBoard(log_dir="/tmp/tensorflow/scalars")]}$$,
                                     'hyperband', 
                                     'R=9, eta=3, skip_last=0',
                                     NULL,                  -- random state
                                     NULL,                  -- object table
                                     FALSE,                 -- use GPUs
                                     'cifar_10_test_data_packed_allseg', -- validation table
                                     1,                     -- metrics compute freq
                                     NULL,                  -- name
                                     NULL);                 -- descr
   
   ERROR:  plpy.Error: Invalid syntax in 'fit_params_dict' (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_automl", line 23, in <module>
       schedule_loader = madlib_keras_automl_hyperband.AutoMLHyperband(**globals())
     PL/Python function "madlib_keras_automl", line 167, in __init__
     PL/Python function "madlib_keras_automl", line 216, in find_hyperband_config
     PL/Python function "madlib_keras_automl", line 42, in wrapper
     PL/Python function "madlib_keras_automl", line 307, in __init__
   PL/Python function "madlib_keras_automl"
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r589810899



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',
+                           'write_graph', 'write_grad', 'write_images' ]
+    tb_params_dict = validate_and_literal_eval_keys(tb_params_dict, accepted_tb_params, accepted_tb_params)
+    tb_params_dict['log_dir'] = tb_params_dict['log_dir']+"{0}".format(current_seg_id)

Review comment:
       ok, sounds good then




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] kaknikhil commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

kaknikhil commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r578764993



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):

Review comment:
       Does it make sense to add unit tests for this function ?

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in
##########
@@ -361,7 +361,9 @@ class MstSearch():
             _assert(self.num_configs is None and self.random_state is None,
                     "DL: 'num_configs' and 'random_state' must be NULL for grid search")
             for distribution_type in self.accepted_distributions:
-                _assert(distribution_type not in compile_params_grid and distribution_type not in fit_params_grid,
+                tmp_dist = "'{0}']".format(distribution_type)
+                _assert(tmp_dist not in compile_params_grid and

Review comment:
       It might also make sense to add/update a test for this change. We can add both a positive and a negative test case to make sure that it passes and fails as expected

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in
##########
@@ -361,7 +361,9 @@ class MstSearch():
             _assert(self.num_configs is None and self.random_state is None,
                     "DL: 'num_configs' and 'random_state' must be NULL for grid search")
             for distribution_type in self.accepted_distributions:
-                _assert(distribution_type not in compile_params_grid and distribution_type not in fit_params_grid,
+                tmp_dist = "'{0}']".format(distribution_type)

Review comment:
       I think adding a small explanation of why we need to format the type this way will help. Also renaming the variable to be something that will reflect it's purpose might make it easier to read.

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -309,6 +311,8 @@ def parse_and_validate_fit_params(fit_param_str):
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
         if 'callbacks' in fit_params_dict:
+            if not is_superuser(current_user()):

Review comment:
       Can we add a test for this ?

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -144,7 +147,7 @@ def set_model_weights(segment_model, serialized_weights):
 Used to convert compile_params and fit_params to actual argument dictionaries
 """
 
-def convert_string_of_args_to_dict(str_of_args):
+def convert_string_of_args_to_dict(str_of_args, strip_quotes=True):

Review comment:
       consider adding a docstring for the strip_quotes argument

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)

Review comment:
       All these assertions will raise a generic python assertion error without any message. Is that intentional ?

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -170,11 +173,19 @@ def convert_string_of_args_to_dict(str_of_args):
         elif not stack and char == ",":
             value_str = result_str
             result_str = ""
-            compile_dict[key_str.strip()]=value_str.strip().strip('\'')
+            key_str = key_str.strip()
+            value_str = value_str.strip()
+            if strip_quotes:
+                value_str = value_str.strip('"\'')
+            compile_dict[key_str.strip()]=value_str

Review comment:
       do we need to call `.strip` on `key_str` again ?

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -326,7 +330,8 @@ def parse_and_validate_fit_params(fit_param_str):
             accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',
                                    'write_graph', 'write_grad', 'write_images' ]
             tb_params_dict = validate_and_literal_eval_keys(tb_params_dict, accepted_tb_params, accepted_tb_params)
-            fit_params_dict['callbacks'] = [TensorBoard(tb_params_dict)]
+            tb_params_dict['log_dir'] = tb_params_dict['log_dir']+"{0}".format(current_seg_id)

Review comment:
       ```suggestion
     tb_params_dict['log_dir'] = "{0}{1}".format(tb_params_dict['log_dir'],(current_seg_id))
   ```

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+

Review comment:
       do we have a dev check test for `shuffle` ? If not should we add it ?

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',

Review comment:
       just curious, why did we have to move 'shuffle' from accepted_fit_params to literal_eval_fit_params ?

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',

Review comment:
       In our dev check tests, we only test for the log_dir param. Should we add tests for some of the other params as well ? We can probably update the same test to accept these params

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',
+                           'write_graph', 'write_grad', 'write_images' ]
+    tb_params_dict = validate_and_literal_eval_keys(tb_params_dict, accepted_tb_params, accepted_tb_params)
+    tb_params_dict['log_dir'] = tb_params_dict['log_dir']+"{0}".format(current_seg_id)

Review comment:
       ```suggestion
       tb_params_dict['log_dir'] = "{0}{1}".format(tb_params_dict['log_dir'], current_seg_id)
   ```

##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_fit.sql_in
##########
@@ -40,7 +40,7 @@ SELECT madlib_keras_fit(
     'model_arch',
     1,
     $$ optimizer=SGD(lr=0.01, decay=1e-6, nesterov=True), loss='categorical_crossentropy', metrics=['mae']$$::text,
-    $$ batch_size=2, epochs=1, verbose=0 $$::text,
+    $$ batch_size=2, epochs=1, verbose=0, callbacks=[TensorBoard(log_dir='/tmp/tensorflow/single/')] $$::text,

Review comment:
       can we assert that the directory `/tmp/tensorflow/single/` is not empty or contains expected files ?

##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):

Review comment:
       Do we need to have a default value for `current_seg_id` for this function and `parse_callbacks` ? Won't it be better if the calling function passes the segment id every time ? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r589816588



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):

Review comment:
       I like -1 better as well... if someone is trying to debug a problem and they see segment_id=-1 printed out, that will be more helpful and less prone to confusion than if it shows up as segment_id=0




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583608832



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):

Review comment:
       Even if we mock the database calls (is_superuser, etc.), we still need the TensorBoard initializer. AFAIK, we haven't imported or mocked tf functions in our unit tests but we can definitely try.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-784616302


   (1) 
   turn off verbose write to console
   ```madlib=# SELECT madlib.madlib_keras_fit('iris_train_packed',   -- source table
   madlib(#                                'iris_model',          -- model output table
   madlib(#                                'model_arch_library',  -- model arch table
   madlib(#                                 1,                    -- model arch id
   madlib(#                                 $$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$,  -- compile_params
   madlib(#                                 $$ batch_size=5, epochs=3, callbacks=[TensorBoard(log_dir="/tmp/tensorflow/scalars")]$$,  -- fit_params
   madlib(#                                 10                    -- num_iterations
   madlib(#                               );
   INFO:  
   	Time for training in iteration 1: 2.63317084312 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 2: 0.106519937515 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 3: 0.105093002319 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 4: 0.102034807205 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 5: 0.100094079971 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 6: 0.102514982224 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 7: 0.103103876114 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 8: 0.101779222488 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 9: 0.099191904068 sec
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
           select (madlib.internal_keras_evaluate(
   DETAIL:  
                                               ARRAY[class_text],
                                               ARRAY[attributes],
                                               ARRAY[class_text_shape],
                                               ARRAY[attributes_shape],
                                               $MAD$
   {"class_name": "Sequential", "keras_version": "2.1.6", "config": [{"class_name": "Dense", "config": {"kernel_initializer": {"class_name": "VarianceScaling", "config": {"distribution": "uniform", "scale": 1.0, "seed": null, "mode": "fan_avg"}}, "name": "dense_1", "kernel_constraint": null, "bias_regularizer": null, "bias_constraint": null, "dtype": "float32", "activation": "relu", "trainable": true, "kernel_regularizer": null, "bias_initializer": {"class_name": "Zeros", "config": {}}, "units": 10, "batch_input_shape": [null, 4], "use_bias": true, "activity_regularizer": null}}, {"class_name": "Dense", "config": {"kernel_initializer": {"class_name": "VarianceScaling", "config": {"distribution": "uniform", "scale": 1.0, "seed": null, "mode": "fan_avg"}}, "name": "dense_2", "kernel_constraint": null, "bias_regularizer": null, "bias_constraint": null, "activation": "relu", "trainable": true, "kernel_regularizer": null, "bias_initializer": {"class_name": "Zeros", "config": {}}, "units":
  10, "use_bias": true, "activity_regularizer": null}}, {"class_name": "Dense", "config": {"kernel_initializer": {"class_name": "VarianceScaling", "config": {"distribution": "uniform", "scale": 1.0, "seed": null, "mode": "fan_avg"}}, "name": "dense_3", "kernel_constraint": null, "bias_regularizer": null, "bias_constraint": null, "activation": "softmax", "trainable": true, "kernel_regularizer": null, "bias_initializer": {"class_name": "Zeros", "config": {}}, "units": 3, "use_bias": true, "activity_regularizer": null}}], "backend": "tensorflow"}
   $MAD$,
                                               $1,
                                               $__madlib__$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $__madlib__$,
                                               __table__.__dist_key__,
                                               ARRAY[1, 0],
                                               __table__.gp_segment_id,
                                               ARRAY[2, 2],
                                               ARRAY[60, 60],
                                               ARRAY[0, 0],
                                               True,
                                               $2
                                               )) as loss_metric
           from iris_train_packed AS __table__
   CONTEXT:  PL/Python function "madlib_keras_fit"
   INFO:  
   	Time for training in iteration 10: 0.101819038391 sec
   DETAIL:  
   	Time for evaluating training dataset in iteration 10: 0.0550060272217 sec
   	Training set metric after iteration 10: 0.458333343267
   	Training set loss after iteration 10: 0.788652122021
   CONTEXT:  PL/Python function "madlib_keras_fit"
    madlib_keras_fit 
   ------------------
    
   (1 row)
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 edited a comment on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 edited a comment on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-784667171


   (3). autoML doesn't seem to work either
   ```
   madlib=# SELECT madlib.madlib_keras_automl('iris_train_packed',                -- source table
   madlib(#                                   'automl_output',                    -- model output table
   madlib(#                                   'model_arch_library',               -- model architecture table
   madlib(#                                   'automl_mst_table',                 -- model selection output table
   madlib(#                                   ARRAY[1,2],                         -- model IDs
   madlib(#                                   $${
   madlib$#                                       'loss': ['categorical_crossentropy'],
   madlib$#                                       'optimizer_params_list': [
   madlib$#                                           {'optimizer': ['Adam'],'lr': [0.001, 0.1, 'log']},
   madlib$#                                           {'optimizer': ['RMSprop'],'lr': [0.001, 0.1, 'log']}
   madlib$#                                       ],
   madlib$#                                       'metrics': ['accuracy']
   madlib$#                                   } $$,                               -- compile param grid
   madlib(#                                   $${'batch_size': [4, 8], 'epochs': [1], 'callbacks': ['[TensorBoard(log_dir="/tmp/logs/fit")]'}$$,  -- fit params grid
   madlib(#                                   'hyperband',                        -- autoML method
   madlib(#                                   'R=9, eta=3, skip_last=0',          -- autoML params
   madlib(#                                   NULL,                               -- random state
   madlib(#                                   NULL,                               -- object table
   madlib(#                                   FALSE,                              -- use GPUs
   madlib(#                                   'iris_test_packed',                 -- validation table
   madlib(#                                   1,                                  -- metrics compute freq
   madlib(#                                   NULL,                               -- name
   madlib(#                                   NULL);                              -- descr
   ERROR:  plpy.Error: Invalid syntax in 'fit_params_dict' (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_automl", line 23, in <module>
       schedule_loader = madlib_keras_automl_hyperband.AutoMLHyperband(**globals())
     PL/Python function "madlib_keras_automl", line 165, in __init__
     PL/Python function "madlib_keras_automl", line 214, in find_hyperband_config
     PL/Python function "madlib_keras_automl", line 42, in wrapper
     PL/Python function "madlib_keras_automl", line 308, in __init__
   PL/Python function "madlib_keras_automl"
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583649649



##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_fit.sql_in
##########
@@ -40,7 +40,7 @@ SELECT madlib_keras_fit(
     'model_arch',
     1,
     $$ optimizer=SGD(lr=0.01, decay=1e-6, nesterov=True), loss='categorical_crossentropy', metrics=['mae']$$::text,
-    $$ batch_size=2, epochs=1, verbose=0 $$::text,
+    $$ batch_size=2, epochs=1, verbose=0, callbacks=[TensorBoard(log_dir='/tmp/tensorflow/single/')] $$::text,

Review comment:
       That is the job of TensorBoard, I don't think we should get involved.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583608832



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):

Review comment:
       Even if we mock the database calls (is_superuser, etc.), we still need the TensorBoard initializer. AFAIK, we haven't imported or mocked tf functions in our unit tests but we can definitely try.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] fmcquillan99 commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

fmcquillan99 commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-784665248






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785168625


   > ERROR:  plpy.Error: DL: Cannot search from a distribution with grid search (plpython.c:5038)
   
   This happens because our code checks if the string `log` appears anywhere in the fit or compile params during validation. The folder I tested with did not have `log` but your test does which caused the issue. I'll fix it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583245581



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',
+                           'write_graph', 'write_grad', 'write_images' ]
+    tb_params_dict = validate_and_literal_eval_keys(tb_params_dict, accepted_tb_params, accepted_tb_params)
+    tb_params_dict['log_dir'] = tb_params_dict['log_dir']+"{0}".format(current_seg_id)

Review comment:
       If we're going to split up the logs into separate directorys, shouldn't the directory names be based on model_id rather than current_seg_id?  At least for fit multiple, not sure what the best behavior would be for fit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583647271



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):

Review comment:
       We call this function in two contexts:
   1) During the fit transition function for parsing: We have the segment_id
   2) During validation: We don't have the segment_id since it is running on master. 
   
   We can pass a value instead of having a default but it will be the exact same constant in any case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586813934



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',

Review comment:
       "In recent versions of keras, the optimizer classes are required to be passed in the normal pythonic way, as class variables."
   
   Correction:  in recent versions, you can still pass optimizers in one of two ways, but I think the options are a bit more limited than in the past.  Currently, you can pass  either an optimizer object (instantiation of anything derived from `keras.optimizers.Optimizer`), or what you get when you call get_config()['name'] on the default instantiation of one of the builtin optimizers, eg:
   
   `keras.optimizers.Adam().get_config()['name']`
   'Adam'
   
   But it looks like for all the ones I've checked, the config name and the class name do still match.  So I guess what I said above is only true for non-default instantiations of the class.
   Any of these are okay:
   ```
   optmizer=Adam(learning_rate=0.01)
   optimizer=Adam(learning_rate=0.01, name='My custom Adam')
   optimizer='Adam'
   ```
   but this is not:
   ```
   optimizer='Adam(learning_rate=0.01)'
   ```
   It looks like they have the same deal with losses and metrics, but for those the names of the classes don't necessarily match:
   ```
   keras.losses.CategoricalCrossentropy().get_config()['name']
   'categorical_crossentropy'
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r583642048



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -309,6 +311,8 @@ def parse_and_validate_fit_params(fit_param_str):
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
         if 'callbacks' in fit_params_dict:
+            if not is_superuser(current_user()):

Review comment:
       Both of those are direct SQL queries, neither unit-test nor dev-check is equipped to deal with this, we'll need a different solution to test a non-admin user.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r585517706



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+
+        if 'callbacks' in fit_params_dict:
+            fit_params_dict['callbacks'] = parse_callbacks(fit_params_dict['callbacks'], current_seg_id)
 
         return fit_params_dict
     else:
         return {}
 
+# Parse the callback fit params and create the TensorBoard object in the dictionary
+def parse_callbacks(callbacks, current_seg_id=0):
+    callbacks = callbacks.strip("'")
+    if not is_superuser(current_user()):
+        plpy.error("Only a superuser may use callbacks.")
+    try:
+        tree = ast.parse(callbacks, mode='eval')
+        assert(type(tree.body) == ast.List)
+        assert(len(tree.body.elts) == 1)
+        assert(type(tree.body.elts[0]) == ast.Call)
+        assert(tree.body.elts[0].func.id == 'TensorBoard')
+        tb_params = tree.body.elts[0].keywords
+        tb_params_dict = { tb_params[i].arg : tb_params[i].value \
+                        for i in range(len(tb_params)) }
+    except:
+        plpy.error("Invalid callbacks fit param.  Currently, "
+                    "only TensorBoard callbacks are accepted.")
+
+    accepted_tb_params = [ 'log_dir', 'histogram_freq', 'batch_size', 'update_freq',

Review comment:
       I don't think we have to test for every keras parameter combination. The `log_dir` is tested because it is error-prone with arbitrary paths but a numeric value like batch_size either works or keras throws an appropriate error. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal commented on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal commented on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785238377


   > (3). autoML doesn't seem to work either
   > madlib(#                                   $${'batch_size': [4, 8], 'epochs': [1], 'callbacks': ['[TensorBoard(log_dir="/tmp/logs/fit")]'}$$,  -- fit params grid
   
   You are missing the closing ] of callbacks `'callbacks': ['[TensorBoard(log_dir="/tmp/logs/fit")]'` -> `'callbacks': ['[TensorBoard(log_dir="/tmp/logs/fit")]']`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r574754629



##########
File path: src/ports/postgres/modules/deep_learning/test/madlib_keras_automl.sql_in
##########
@@ -329,7 +330,8 @@ DROP TABLE IF EXISTS automl_output, automl_output_info, automl_output_summary, a
 SELECT madlib_keras_automl('iris_data_packed', 'automl_output', 'iris_model_arch', 'automl_mst_table',
     ARRAY[1,2], $${'loss': ['categorical_crossentropy'], 'optimizer_params_list': [ {'optimizer': ['Adagrad', 'Adam'],
     'lr': [0.9, 0.95, 'log'], 'epsilon': [0.3, 0.5, 'log_near_one']}, {'optimizer': ['Adam', 'SGD'],
-    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$, $${'batch_size': [2, 4], 'epochs': [3]}$$);
+    'lr': [0.6, 0.65, 'log']} ], 'metrics':['accuracy'] }$$,
+    $${'batch_size': [2, 4], 'epochs': [3], 'callbacks': ['[TensorBoard(log_dir=\'/tmp/tensorflow/scalars/\')]']}$$);

Review comment:
       The syntax for passing these AutoML params is getting pretty ugly.
   
   Part of the reason is the use of \\' instead of ", which appears to be a workaround for this bug:
   https://issues.apache.org/jira/projects/MADLIB/issues/MADLIB-1455
   
   This might be a good time to address that one, I would guess it's pretty low effort.  With double quotes working, it would look more like typical python:
   
   `{ 'epochs' : [3], 'callbacks': [ '[TensorBoard(log_dir="/tmp/tensorflow/scalars/"]' ], 'batch_size' : [2, 4] }`
   
   Eventually, if we add support for arbitrary custom callbacks, we should assess whether there is a better way for the user to pass an array of different callback choices they want AutoML to combine with different models.
   
   For now, since we only support TensorBoard() and it's just used for monitoring/debugging... it's probably safe to assume they'll always want the same Tensorboard callback for every model... rather than multiple choices of callbacks.
   
   One thing that might improve it further though, is allowing the [] to be optional for compile/fit params which only have a single choice, like this:
   
   ` 'epochs' : 3`   means the same as =>  `'epochs' : [3]`
   and
   `'callbacks' : '[ TensorBoard(log_dir="/tmp/tensorflow/scalars/") ]'`
        == means the same as ==>  `'callbacks' :  [ '[ TensorBoard(log_dir="/tmp/tensorflow/scalars/") ]' ]`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586732966



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',
                                    'class_weight','initial_epoch','steps_per_epoch']
-        accepted_fit_params = literal_eval_fit_params + ['shuffle']
+        accepted_fit_params = literal_eval_fit_params + ['callbacks']
 
         fit_params_dict = validate_and_literal_eval_keys(fit_params_dict,
                                                          literal_eval_fit_params,
                                                          accepted_fit_params)
-        if 'shuffle' in fit_params_dict:
-            shuffle_value = fit_params_dict['shuffle']
-            if shuffle_value == 'True' or shuffle_value == 'False':
-                fit_params_dict['shuffle'] = bool(shuffle_value)
+

Review comment:
       Yes, looks like we do have a couple tests for it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal edited a comment on pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal edited a comment on pull request #524:
URL: https://github.com/apache/madlib/pull/524#issuecomment-785163577


   > (1)
   > turn off verbose write to console
   > 
   This is unrelated to this PR. The fix is already included in a different PR.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] reductionista commented on a change in pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

reductionista commented on a change in pull request #524:
URL: https://github.com/apache/madlib/pull/524#discussion_r586728227



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -287,27 +298,50 @@ def parse_optimizer(compile_dict):
 
 
 # Parse the fit parameters into a dictionary.
-def parse_and_validate_fit_params(fit_param_str):
+def parse_and_validate_fit_params(fit_param_str, current_seg_id=0):
 
     if fit_param_str:
-        fit_params_dict = convert_string_of_args_to_dict(fit_param_str)
-
-        literal_eval_fit_params = ['batch_size','epochs','verbose',
+        fit_params_dict = convert_string_of_args_to_dict(fit_param_str, strip_quotes=False)
+        literal_eval_fit_params = ['batch_size','epochs','verbose', 'shuffle',

Review comment:
       We don't, but it makes the code less error prone and more easily maintainable.
   
   I think the question you're getting at, which I was very confused about when I first looked at this code, is why were we handling it as a special case, rather than just running literal eval on it like we do for any other param?
   
   Previously we were adding quotes around the value if they weren't there already, and then removing them later if the result is equal to the strings 'True' or 'False', with this awkward workaround:
   ```
   if 'shuffle' in fit_params_dict:
               shuffle_value = fit_params_dict['shuffle']
               if shuffle_value == 'True' or shuffle_value == 'False':
                   fit_params_dict['shuffle'] = bool(shuffle_value)
   ```
   But there is no reason to accept 'True' or 'False' as a param, as that's not valid python code anyway.  So instead of changing True to 'True' then back to True again, we just leave it as is.  If they pass in shuffle='True' or shuffle="True" that should generate an error, just as it does if you passed that as a fit param to keras outside of MADlib--it's not a valid parameter value.
   
   We do have to handle the callbacks fit param as a special case, since we need to block any callbacks besides Tensorboard (at least for now).
   
   So why were we doing a whole extra song and dance for shuffle as well?  The answer is not obvious, but I eventually found that it has to do with older versions of keras allowing an alternate syntax for passing optimizer classes to compile params.
   
   In recent versions of keras, the optimizer classes are required to be passed in the normal pythonic way, as class variables.  But older versions also allowed passing the name of the class as a string... which keras then converts into the actual class before using.
   
   How does that affect fit params?  It doesn't, except for the fact that we re-purposed a function that processes the compile params, which was designed in an overly-general way to allow the value of any parameter to either have quotes around it or not.
   
   If my understanding is correct, the only reason this was necessary, even for processing compile params, is to handle this special case of how optimizer classes can be passed in older versions of keras.   It's not necessary at all for the fit params, since none of them can be passed in the alternate way, in any version of keras.  We were allowing some extra weird possibilities (like passing shuffle='True' instead of shuffle=True) that keras doesn't normally allow, just so that we didn't have to handle fit params differently from compile params, or most compile params differently from optimizer params.
   
   Hopefully eventually, we will be able to stop adding quotes to any of the params passed in, and just use literal_eval on everything.  But for now, it seems like maintaining backwards compatibility for the optimizer params is a good idea... so we still add quotes for all compile params, just not fit params now.  This removes the need to handle shuffle as a special case.  As a next step perhaps, we can do the same to either most or all of the compile params, but I think it may require re-writing some tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [madlib] orhankislal closed pull request #524: DL: TensorBoard Support

Posted by GitBox <gi...@apache.org>.

orhankislal closed pull request #524:
URL: https://github.com/apache/madlib/pull/524


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org