Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2021/03/09 22:42:28 UTC

[GitHub] [madlib] reductionista opened a new pull request #560: Strengthen security checks on compile params

reductionista opened a new pull request #560:
URL: https://github.com/apache/madlib/pull/560


   Here are a few improvements to how we process compile params, so that the code is more robust from a security perspective. 
   
   I haven't found a specific security exploit for these issues, but I wouldn't be surprised if someone with enough time and dedication could find a way to gain admin access via them... especially if a minor update to either MADlib's deep learning library or keras failed to take these potential issues into account.  (And at the very least, this fixes some odd error messages and crashes that can occur when strange parameters are passed.)
   
   Changes:
   
   1.  Exclude `deserialize`, `serialize`, `get`, and anything starting with `_` from the list of builtin losses and metrics we support.
   
   We were unintentionally allowing users to pass `keras.losses.deserialize`, `keras.losses._sys`, `keras.losses.get`, `keras.losses.__builtins__`, etc. as loss functions to keras, and the same for `keras.metrics.*` for metric functions.  Most of what's in `keras.losses.*` are loss functions or classes deriving from `keras.losses.Loss`, and most of what's in `keras.metrics` are metric functions or classes deriving from `keras.metrics.Metric`, but there are also these extra classes intended for other purposes (not intended for use directly as metrics or losses).
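As a sketch of what such a filter could look like (illustrative only; `filter_builtin_names` and `EXCLUDED_NAMES` are hypothetical names, not MADlib's actual `update_builtin_losses`/`update_builtin_metrics` implementation):

```python
# Hypothetical sketch: prune the result of dir(keras.losses) / dir(keras.metrics)
# down to names that plausibly refer to real losses or metrics.
EXCLUDED_NAMES = {'deserialize', 'serialize', 'get'}

def filter_builtin_names(names):
    """Drop utility helpers and private/dunder attributes."""
    return [n for n in names
            if n not in EXCLUDED_NAMES and not n.startswith('_')]

# Simulated dir(keras.losses) result:
simulated = ['MeanSquaredError', '__builtins__', '_sys',
             'deserialize', 'get', 'mean_squared_error', 'serialize']
print(filter_builtin_names(simulated))
# ['MeanSquaredError', 'mean_squared_error']
```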
   
   The `deserialize()` function is particularly dangerous, partly because it accepts exactly 2 arguments, which means that even if you pass something as simple as `loss=deserialize`, keras itself will call it during evaluation as `deserialize(y_true, y_pred)`.  And partly because the purpose of the `deserialize()` utility function is to take a string from the user and turn it into an instance of a callable class, which keras will then call.  In other words, it's a way of allowing users to pass in their own custom functions, but in a way such that MADlib only sees a `str` to pass along (rather than an arbitrary callable python object, which we do not allow ordinary users to pass through to keras).
   
   The danger is significantly mitigated by the fact that we don't allow any characters after the function name to be passed, even as a string.  For example, if the user were allowed to pass `loss=deserialize(..., ...)` then it would be fairly easy to gain admin access.  But we only allow `loss=deserialize`, which makes it far more difficult, if not impossible, to exploit.  Nevertheless, this PR completely disables the use of such functions for losses or metrics, in case there is some way to trick keras into passing malicious arguments for y_true and y_pred.
   
   2.  Another potentially risky behavior was the input validator allowing the user to pass as a metric any string that contains the substring `top_k_categorical_accuracy`.  This provides a loophole that circumvents the intention to only allow metric names that are either a builtin keras metric or something in the custom functions object table.  For example, if the user passes `metrics=[system("echo allow all >> ${MASTER_DATA_DIRECTORY}/pg_hba.conf")] /* top_k_categorical_accuracy */, losses=...` for compile params, that will be considered a valid input by the input validator and passed along.  Once again, this doesn't appear to be an immediate security issue, as later we run a `literal_eval()` on it, which will error out.  But it's safest if we catch things like this up front during input validation.
   
   From what I can tell, explicitly whitelisting `top_k_categorical_accuracy` as a metric is unnecessary: it is already a builtin keras function, we don't allow any arguments to be passed to it anyway, and the name must be an exact match to a builtin or a custom function.
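The difference between the loose substring check and an exact-match check can be sketched as follows (hypothetical helper names; the real validator is structured differently):

```python
# Hypothetical sketch contrasting substring matching with exact matching.
builtin_metrics = ['accuracy', 'top_k_categorical_accuracy']

def is_valid_metric_substring(metric):
    # Old, loose behavior: any string *containing* a builtin name passes.
    return any(name in metric for name in builtin_metrics)

def is_valid_metric_exact(metric, custom_fns=()):
    # Stricter behavior: the name must be exactly a builtin or a custom function.
    return metric in builtin_metrics or metric in custom_fns

sneaky = 'system("...") /* top_k_categorical_accuracy */'
print(is_valid_metric_substring(sneaky))  # True  -- the loophole
print(is_valid_metric_exact(sneaky))      # False -- rejected up front
```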
   
   3.  We were comparing the metrics parameter value substring `metrics[2:-2]` to the builtin and custom function names, which implicitly assumes that the first two characters are `['` or `["` and the last two are `']` or `"]`.
   
   Handling this more carefully using a literal eval to extract the string inside the first element of the list prevents sneaky inputs like `metrics=[*__builtins__ ]`.  (We already validate elsewhere that the full metrics parameter value is of type `list`.)
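As a standalone sketch (hypothetical helper, not MADlib's actual code), extracting the first metric name with `ast.literal_eval` instead of slicing looks like:

```python
import ast

def first_metric(metrics_str):
    """Parse the metrics parameter value as a Python literal and return the
    first element, instead of assuming quote/bracket positions via [2:-2]."""
    metric_list = ast.literal_eval(metrics_str)
    if not isinstance(metric_list, list):
        raise ValueError("metrics must be a list")
    return metric_list[0] if metric_list else None

print(first_metric("['_rmse']"))  # _rmse
# Inputs the slice method would mangle are rejected outright by the parser,
# e.g. first_metric("[*__builtins__ ]") raises ValueError from literal_eval.
```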


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




[GitHub] [madlib] reductionista commented on a change in pull request #560: Strengthen security checks on compile params

Posted by GitBox <gi...@apache.org>.
reductionista commented on a change in pull request #560:
URL: https://github.com/apache/madlib/pull/560#discussion_r590934148



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -440,16 +444,16 @@ def get_custom_functions_list(compile_params):
 
     """
     compile_dict = convert_string_of_args_to_dict(compile_params)
-    builtin_losses = dir(losses)
+    builtin_losses = update_builtin_losses(dir(losses))
     builtin_metrics = update_builtin_metrics(dir(metrics))
 
     custom_fn_list = []
     local_loss = compile_dict['loss'].lower() if 'loss' in compile_dict else None
-    local_metric = compile_dict['metrics'].lower()[2:-2] if 'metrics' in compile_dict else None
+    metric_list = ast.literal_eval(compile_dict['metrics'].lower()) if 'metrics' in compile_dict else []

Review comment:
       I think it's better to catch the problem on master, than send bad input to the segments and let them find out.  If we just use the [2,-2] method then we may label some inputs as builtin functions when they aren't.  So whether it successfully literal eval's or not later, it may still get passed along even if it doesn't match any builtin functions or compile functions.
   
   I think it's best we interpret the string they pass in the same way each time we process it.  If it's handled one way here, and then another way later, then there are more possibilities for how the user can insert quotes and other special characters to confuse MADlib.
   
   However, looking at this again I notice there is a try/catch block around the literal_eval that happens on the segments.  If it errors out here, it won't give as nice of an error message... so maybe we should add a similar try/catch around it?
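The try/catch being suggested could be sketched like this (hypothetical wrapper and message text; MADlib's actual error plumbing differs):

```python
import ast

def safe_literal_eval_metrics(metrics_str):
    # Validate on the master so the user gets a clear error message,
    # rather than letting the segments fail with a raw exception.
    try:
        return ast.literal_eval(metrics_str)
    except (ValueError, SyntaxError):
        raise ValueError(
            "Invalid metrics parameter: expected a list literal such as "
            "['top_k_categorical_accuracy'], got: %r" % metrics_str)

print(safe_literal_eval_metrics("['accuracy']"))  # ['accuracy']
```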












[GitHub] [madlib] khannaekta commented on a change in pull request #560: Strengthen security checks on compile params

Posted by GitBox <gi...@apache.org>.
khannaekta commented on a change in pull request #560:
URL: https://github.com/apache/madlib/pull/560#discussion_r590899329



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -440,16 +444,16 @@ def get_custom_functions_list(compile_params):
 
     """
     compile_dict = convert_string_of_args_to_dict(compile_params)
-    builtin_losses = dir(losses)
+    builtin_losses = update_builtin_losses(dir(losses))
     builtin_metrics = update_builtin_metrics(dir(metrics))
 
     custom_fn_list = []
     local_loss = compile_dict['loss'].lower() if 'loss' in compile_dict else None
-    local_metric = compile_dict['metrics'].lower()[2:-2] if 'metrics' in compile_dict else None
+    metric_list = ast.literal_eval(compile_dict['metrics'].lower()) if 'metrics' in compile_dict else []

Review comment:
       We call `ast.literal_eval()` as part of `validate_and_literal_eval_keys` which is called later in the flow, do we still need to add it here?








[GitHub] [madlib] fmcquillan99 edited a comment on pull request #560: Strengthen security checks on compile params

Posted by GitBox <gi...@apache.org>.
fmcquillan99 edited a comment on pull request #560:
URL: https://github.com/apache/madlib/pull/560#issuecomment-796321033


   (1)
   ```
   # import database connector psycopg2 and create connection cursor
   import psycopg2 as p2
   conn = p2.connect('postgresql://gpadmin@localhost:8000/madlib')
   cur = conn.cursor()
   
   # import Dill and define functions
   import dill
   
   # custom loss
   def __squared_error(y_true, y_pred):
       import tensorflow.keras.backend as K 
       return K.square(y_pred - y_true)
   pb_squared_error=dill.dumps(__squared_error)
   
   # custom metric
   def _rmse(y_true, y_pred):
       import tensorflow.keras.backend as K 
       return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
   pb_rmse=dill.dumps(_rmse)
   
   # call load function
   cur.execute("DROP TABLE IF EXISTS madlib.custom_function_table")
   cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'__squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
   cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'_rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
   conn.commit()
   ```
   ```
   SELECT madlib.load_top_k_accuracy_function('custom_function_table',
                                              2);
   SELECT madlib.load_top_k_accuracy_function('custom_function_table',
                                              3);
   SELECT madlib.load_top_k_accuracy_function('custom_function_table',
                                              5);
   ```
   ```
   SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;
   ```
   ```
    id |      name       |      description       
   ----+-----------------+------------------------
     1 | __squared_error | squared error
     2 | _rmse           | root mean square error
     3 | top_2_accuracy  | returns top_2_accuracy
     4 | top_3_accuracy  | returns top_3_accuracy
     5 | top_5_accuracy  | returns top_5_accuracy
   (5 rows)
   ```
   ```
   DROP TABLE IF EXISTS iris_model, iris_model_summary;
   
   SELECT madlib.madlib_keras_fit('iris_train_packed',   -- source table
                                  'iris_model',          -- model output table
                                  'model_arch_library',  -- model arch table
                                   1,                    -- model arch id
                                   $$ loss='__squared_error', optimizer='adam', metrics=['_rmse'] $$,  -- compile_params
                                   $$ batch_size=5, epochs=3 $$,  -- fit_params
                                   10,                    -- num_iterations
                                   NULL,                  -- use_gpus,
                                   NULL,                  -- validation_table,
                                   NULL,                  -- metrics_compute_frequency,
                                   NULL,                  -- warm_start,
                                   NULL,                  -- name,
                                   NULL,                  -- description,
                                   'custom_function_table' -- object_table
                                 );
   ```
   works OK
   ```
   SELECT * FROM iris_model_summary;
   ```
   ```
   -[ RECORD 1 ]-------------+--------------------------------------------------------------
   source_table              | iris_train_packed
   model                     | iris_model
   dependent_varname         | {class_text}
   independent_varname       | {attributes}
   model_arch_table          | model_arch_library
   model_id                  | 1
   compile_params            |  loss='__squared_error', optimizer='adam', metrics=['_rmse'] 
   fit_params                |  batch_size=5, epochs=3 
   num_iterations            | 10
   validation_table          | 
   object_table              | madlib.custom_function_table
   metrics_compute_frequency | 10
   name                      | 
   description               | 
   model_type                | madlib_keras
   model_size                | 0.7900390625
   start_training_time       | 2021-03-11 00:35:24.12472
   end_training_time         | 2021-03-11 00:35:26.849639
   metrics_elapsed_time      | {2.724858045578}
   madlib_version            | 1.18.0-dev
   num_classes               | {3}
   dependent_vartype         | {"character varying"}
   normalizing_const         | 1
   metrics_type              | {_rmse}
   loss_type                 | __squared_error
   training_metrics_final    | 0.228817746043205
   training_loss_final       | 0.0709948241710663
   training_metrics          | {0.228817746043205}
   training_loss             | {0.0709948241710663}
   validation_metrics_final  | 
   validation_loss_final     | 
   validation_metrics        | 
   validation_loss           | 
   metrics_iters             | {10}
   class_text_class_values   | {Iris-setosa,Iris-versicolor,Iris-virginica}
   ```
   
   (2)
   same `madlib.custom_function_table` as above
   ```
   DROP TABLE IF EXISTS iris_model, iris_model_summary;
   
   SELECT madlib.madlib_keras_fit('iris_train_packed',   -- source table
                                  'iris_model',          -- model output table
                                  'model_arch_library',  -- model arch table
                                   1,                    -- model arch id
                                   $$ loss='categorical_crossentropy', optimizer='adam', metrics=['top_k_categorical_accuracy'] $$,  -- compile_params
                                   $$ batch_size=5, epochs=3 $$,  -- fit_params
                                   10,                    -- num_iterations
                                   NULL,                  -- use_gpus,
                                   NULL,                  -- validation_table,
                                   NULL,                  -- metrics_compute_frequency,
                                   NULL,                  -- warm_start,
                                   NULL,                  -- name,
                                   NULL,                  -- description,
                                   NULL                   -- object_table
                                 );
   ```
   works OK
   ```
   SELECT * FROM iris_model_summary;
   ```
   ```
   -[ RECORD 1 ]-------------+--------------------------------------------------------------------------------------------
   source_table              | iris_train_packed
   model                     | iris_model
   dependent_varname         | {class_text}
   independent_varname       | {attributes}
   model_arch_table          | model_arch_library
   model_id                  | 1
   compile_params            |  loss='categorical_crossentropy', optimizer='adam', metrics=['top_k_categorical_accuracy'] 
   fit_params                |  batch_size=5, epochs=3 
   num_iterations            | 10
   validation_table          | 
   object_table              | 
   metrics_compute_frequency | 10
   name                      | 
   description               | 
   model_type                | madlib_keras
   model_size                | 0.7900390625
   start_training_time       | 2021-03-11 00:32:13.816587
   end_training_time         | 2021-03-11 00:32:16.636334
   metrics_elapsed_time      | {2.81966114044189}
   madlib_version            | 1.18.0-dev
   num_classes               | {3}
   dependent_vartype         | {"character varying"}
   normalizing_const         | 1
   metrics_type              | {top_k_categorical_accuracy}
   loss_type                 | categorical_crossentropy
   training_metrics_final    | 1
   training_loss_final       | 0.347711235284805
   training_metrics          | {1}
   training_loss             | {0.347711235284805}
   validation_metrics_final  | 
   validation_loss_final     | 
   validation_metrics        | 
   validation_loss           | 
   metrics_iters             | {10}
   class_text_class_values   | {Iris-setosa,Iris-versicolor,Iris-virginica}
   ```
   LGTM
   





[GitHub] [madlib] khannaekta commented on a change in pull request #560: Strengthen security checks on compile params

Posted by GitBox <gi...@apache.org>.
khannaekta commented on a change in pull request #560:
URL: https://github.com/apache/madlib/pull/560#discussion_r591731136



##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_wrapper.py_in
##########
@@ -440,16 +444,16 @@ def get_custom_functions_list(compile_params):
 
     """
     compile_dict = convert_string_of_args_to_dict(compile_params)
-    builtin_losses = dir(losses)
+    builtin_losses = update_builtin_losses(dir(losses))
     builtin_metrics = update_builtin_metrics(dir(metrics))
 
     custom_fn_list = []
     local_loss = compile_dict['loss'].lower() if 'loss' in compile_dict else None
-    local_metric = compile_dict['metrics'].lower()[2:-2] if 'metrics' in compile_dict else None
+    metric_list = ast.literal_eval(compile_dict['metrics'].lower()) if 'metrics' in compile_dict else []

Review comment:
       It makes sense to catch it earlier.  Yes, I think we should add a similar try/catch for a better error message.







[GitHub] [madlib] reductionista merged pull request #560: Strengthen security checks on compile params

Posted by GitBox <gi...@apache.org>.
reductionista merged pull request #560:
URL: https://github.com/apache/madlib/pull/560


   

