Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2019/10/18 17:46:00 UTC
[jira] [Commented] (MADLIB-1387) Make param search fit() function work with existing evaluate and predict
[ https://issues.apache.org/jira/browse/MADLIB-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954857#comment-16954857 ]
Frank McQuillan commented on MADLIB-1387:
-----------------------------------------
**Error conditions to check**
1) If single model, then param `mst_key` must be NULL; otherwise throw an error with a clear message, because `madlib_keras_fit()` creates only one model.
2) If multi model, then param `mst_key` must not be NULL; otherwise throw an error with a clear message, because `madlib_keras_fit_multiple_model()` creates multiple models and the user must say which one they want to use. We don't want to guess or silently pick the first one.
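The two checks above could be sketched in Python roughly as follows. This is only an illustration, not the actual MADlib code: the helper names `is_multi_model_table` and `check_mst_key` are hypothetical, and detecting a multi-model table by the presence of an `mst_key` column is an assumption based on the table formats in the issue description.

```python
def is_multi_model_table(model_table_columns):
    """Assumption: a multi-model table is recognised by its mst_key column."""
    return "mst_key" in model_table_columns


def check_mst_key(model_table_columns, mst_key):
    """Raise ValueError when mst_key does not fit the kind of model table.

    model_table_columns: column names of the model table (hypothetical input).
    mst_key: the user-supplied key, or None when the user passed NULL.
    """
    if is_multi_model_table(model_table_columns):
        if mst_key is None:
            raise ValueError(
                "mst_key must not be NULL: the model table was created by "
                "madlib_keras_fit_multiple_model() and holds multiple models, "
                "so please specify which one to use.")
    elif mst_key is not None:
        raise ValueError(
            "mst_key must be NULL: the model table was created by "
            "madlib_keras_fit() and holds a single model.")
```

Both valid combinations return silently, and neither invalid combination falls back to a default model.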
**Acceptance**
1) Test error conditions above.
2) Run an end-to-end (E2E) fit->eval->predict workflow with a single model and check that it works (as it did in 1.16).
3) Run an E2E fit->eval->predict workflow with multi model and check that it works (new workflow in 1.17).
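For the E2E tests, the evaluate and predict calls might be generated by a small helper like the sketch below. It assumes the argument order of the proposed signatures in the issue description (which the issue itself says might still change), assumes the functions live in a `madlib` schema, and uses hypothetical helper names. `iris_multi_model` is inferred from the index names in the issue; `iris_test` and `iris_eval_out` are placeholder table names.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal: None -> NULL, int -> digits, str -> quoted."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    # Double any embedded single quotes so the string stays a valid SQL literal.
    return "'" + str(value).replace("'", "''") + "'"


def build_call(function_name, args):
    """Build a SELECT statement that calls a function with positional arguments."""
    rendered = ", ".join(sql_literal(a) for a in args)
    return "SELECT madlib.{}({});".format(function_name, rendered)


def evaluate_sql(model_table, test_table, output_table, gpus_per_host, mst_key=None):
    # Proposed order: mst_key is the new, optional, last parameter.
    return build_call("madlib_keras_evaluate",
                      [model_table, test_table, output_table, gpus_per_host, mst_key])


def predict_sql(model_table, test_table, id_col, independent_varname,
                output_table, pred_type, gpus_per_host, mst_key=None):
    # Proposed order: mst_key is the new, optional, last parameter.
    return build_call("madlib_keras_predict",
                      [model_table, test_table, id_col, independent_varname,
                       output_table, pred_type, gpus_per_host, mst_key])


# Multi-model case: the caller names the model to use.
print(evaluate_sql("iris_multi_model", "iris_test", "iris_eval_out", 0, mst_key=2))
# prints: SELECT madlib.madlib_keras_evaluate('iris_multi_model', 'iris_test', 'iris_eval_out', 0, 2);
```

Leaving `mst_key` unset renders it as `NULL`, which corresponds to the single-model case.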
> Make param search fit() function work with existing evaluate and predict
> ------------------------------------------------------------------------
>
> Key: MADLIB-1387
> URL: https://issues.apache.org/jira/browse/MADLIB-1387
> Project: Apache MADlib
> Issue Type: New Feature
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v1.17
>
>
> Follow on from
> https://issues.apache.org/jira/browse/MADLIB-1386
> Need an easy way for the user to pick the winner from the param search and run evaluate and predict/inference on it.
> Proposed change in signatures:
> {code}
> madlib_keras_evaluate(
> model_table,
> test_table,
> output_table,
> gpus_per_host, -- this might change based on other story
> mst_key -- new optional param
> )
> {code}
> {code}
> madlib_keras_predict(
> model_table,
> test_table,
> id_col,
> independent_varname,
> output_table,
> pred_type,
> gpus_per_host, -- this might change based on other story
> mst_key -- new optional param
> )
> {code}
> Also should use `model_weights` as the column name in the model table. Currently the single-model output calls it `model_data`, which is less descriptive.
> Table formats:
> {code}
> (A)
> madlib_keras_fit()
> http://madlib.apache.org/docs/latest/group__grp__keras.html
> produces these output tables:
> 1) model table
>    Column   | Type  | Modifiers
> ------------+-------+-----------
>  model_data | bytea |
>  model_arch | json  |
> Distributed by: (model_data)
> 2) summary table
>           Column           |            Type             | Modifiers
> ---------------------------+-----------------------------+-----------
>  source_table              | text                        |
>  model                     | text                        |
>  dependent_varname         | text                        |
>  independent_varname       | text                        |
>  model_arch_table          | text                        |
>  model_arch_id             | integer                     |
>  compile_params            | text                        |
>  fit_params                | text                        |
>  num_iterations            | integer                     |
>  validation_table          | text                        |
>  metrics_compute_frequency | integer                     |
>  name                      | text                        |
>  description               | text                        |
>  model_type                | text                        |
>  model_size                | double precision            |
>  start_training_time       | timestamp without time zone |
>  end_training_time         | timestamp without time zone |
>  metrics_elapsed_time      | double precision[]          |
>  madlib_version            | text                        |
>  num_classes               | integer                     |
>  class_values              | character varying[]         |
>  dependent_vartype         | text                        |
>  normalizing_const         | real                        |
>  metrics_type              | text[]                      |
>  training_metrics_final    | double precision            |
>  training_loss_final       | double precision            |
>  training_metrics          | double precision[]          |
>  training_loss             | double precision[]          |
>  validation_metrics_final  | double precision            |
>  validation_loss_final     | double precision            |
>  validation_metrics        | double precision[]          |
>  validation_loss           | double precision[]          |
>  metrics_iters             | integer[]                   |
> Distributed by: (source_table)
> (B)
> madlib_keras_fit_multiple_model()
> produces these output tables:
> 1) model table
>     Column     |  Type   | Modifiers
> ---------------+---------+-----------
>  mst_key       | integer | not null
>  model_weights | bytea   |
>  model_arch    | json    |
> Indexes:
>     "iris_multi_model_pkey" PRIMARY KEY, btree (mst_key)
> Distributed by: (mst_key)
> 2) summary table
>        Column        |            Type             | Modifiers
> ---------------------+-----------------------------+-----------
>  source_table        | text                        |
>  validation_table    | text                        |
>  model               | text                        |
>  model_info          | text                        |
>  dependent_varname   | text                        |
>  independent_varname | text                        |
>  model_arch_table    | text                        |
>  num_iterations      | integer                     |
>  start_training_time | timestamp without time zone |
>  end_training_time   | timestamp without time zone |
>  madlib_version      | text                        |
>  num_classes         | integer                     |
>  class_values        | text[]                      |
>  dependent_vartype   | text                        |
>  normalizing_const   | real                        |
> Distributed by: (source_table)
> 3) info table
>           Column          |        Type        | Modifiers
> --------------------------+--------------------+-----------
>  mst_key                  | integer            | not null
>  model_id                 | integer            |
>  compile_params           | text               |
>  fit_params               | text               |
>  model_type               | text               |
>  model_size               | double precision   |
>  metrics_elapsed_time     | double precision[] |
>  metrics_type             | text[]             |
>  training_metrics_final   | double precision   |
>  training_loss_final      | double precision   |
>  training_metrics         | double precision[] |
>  training_loss            | double precision[] |
>  validation_metrics_final | double precision   |
>  validation_loss_final    | double precision   |
>  validation_metrics       | double precision[] |
>  validation_loss          | double precision[] |
> Indexes:
>     "iris_multi_model_info_pkey" PRIMARY KEY, btree (mst_key)
> Distributed by: (mst_key)
> {code}
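One way a user might pick the winner from the info table is sketched below. It assumes the info-table rows have already been fetched into Python dicts keyed by the column names listed above. The function name `pick_winner` and the choice of `validation_metrics_final` as the default ranking column are illustrative assumptions, not part of the proposal.

```python
def pick_winner(info_rows, metric="validation_metrics_final", higher_is_better=True):
    """Return the mst_key of the best row according to one info-table column.

    info_rows: list of dicts with at least 'mst_key' and the chosen metric
    (hypothetical in-memory copy of the info table).
    Rows whose metric is None (SQL NULL) are skipped.
    """
    candidates = [row for row in info_rows if row.get(metric) is not None]
    if not candidates:
        raise ValueError("no rows with a non-NULL value for {}".format(metric))
    choose = max if higher_is_better else min
    return choose(candidates, key=lambda row: row[metric])["mst_key"]
```

For a loss column such as `validation_loss_final`, pass `higher_is_better=False`. The returned key is what would then be supplied as `mst_key` to evaluate and predict.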