Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2019/10/18 17:46:00 UTC
[jira] [Commented] (MADLIB-1387) Make param search fit() function work with existing evaluate and predict

    [ https://issues.apache.org/jira/browse/MADLIB-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954857#comment-16954857 ] 

Frank McQuillan commented on MADLIB-1387:
-----------------------------------------

**Error conditions to check**

1) If single model, then param `mst_key` must be NULL, else throw an error with a clear message, because `madlib_keras_fit()` creates only one model.
2) If multi model, then param `mst_key` must not be NULL, else throw an error with a clear message, because `madlib_keras_fit_multiple_model()` creates multiple models and the user must say which one they want to use. We don't want to guess or silently pick the first one.
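
The two checks above could be sketched roughly as follows. This is only an illustration, not the actual MADlib code: the function name `check_mst_key` and its list-of-column-names input are hypothetical, and it assumes a multi-model table can be recognized by the presence of an `mst_key` column, as in the table formats quoted below.

```python
def check_mst_key(model_table_columns, mst_key):
    """Validate mst_key against the kind of model table being read.

    model_table_columns is a list of column names of the model table
    (hypothetical input; real code would look these up in the database).
    Returns True for a multi-model table and False for a single-model one.
    """
    # Assumption: only madlib_keras_fit_multiple_model() output has mst_key.
    is_multi_model = "mst_key" in model_table_columns

    if is_multi_model and mst_key is None:
        raise ValueError(
            "mst_key is NULL, but the model table holds multiple models. "
            "Please specify which mst_key to use.")
    if not is_multi_model and mst_key is not None:
        raise ValueError(
            "mst_key = {0} was given, but the model table holds a single "
            "model. mst_key must be NULL in this case.".format(mst_key))
    return is_multi_model
```

Both branches fail loudly rather than falling back to a default model, which matches the intent of not guessing on the user's behalf.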

**Acceptance**

1) Test error conditions above.
2) Generate E2E fit->eval->predict with single model and check it works (like before in 1.16).
3) Generate E2E fit->eval->predict with multi model and check it works (new workflow in 1.17).
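
For the multi-model test in item 3, some `mst_key` has to be chosen before calling evaluate and predict. A minimal sketch of one way to pick it, assuming the info table rows have already been fetched as dictionaries keyed by the column names listed in the quoted description; the helper `pick_best_mst_key` and the "higher metric wins" default are assumptions for illustration, not part of MADlib:

```python
def pick_best_mst_key(info_rows, metric="validation_metrics_final",
                      higher_is_better=True):
    """Return the mst_key of the best row according to `metric`.

    info_rows: iterable of dicts, one per row of the info table
    (hypothetical in-memory form of that table).
    Rows where the metric is NULL (None) are ignored.
    """
    scored = [row for row in info_rows if row.get(metric) is not None]
    if not scored:
        raise ValueError("no rows with a non-NULL {0}".format(metric))
    choose = max if higher_is_better else min
    return choose(scored, key=lambda row: row[metric])["mst_key"]


# Example with made-up numbers: model 2 has the highest validation metric.
rows = [
    {"mst_key": 1, "validation_metrics_final": 0.80},
    {"mst_key": 2, "validation_metrics_final": 0.95},
    {"mst_key": 3, "validation_metrics_final": None},
]
best = pick_best_mst_key(rows)  # -> 2
```

For a loss column such as `validation_loss_final`, pass `higher_is_better=False` so the smallest value wins.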


> Make param search fit() function work with existing evaluate and predict
> ------------------------------------------------------------------------
>
>                 Key: MADLIB-1387
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1387
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.17
>
>
> Follow on from 
> https://issues.apache.org/jira/browse/MADLIB-1386
> Need an easy way for the user to pick the winner from a param search and then run evaluate and predict/inference on it.
> Proposed change in signatures:
> {code}
> madlib_keras_evaluate(
>     model_table,
>     test_table,
>     output_table,
>     gpus_per_host,       -- this might change based on other story
>     mst_key              -- new optional param
>     )
> {code}
> {code}
> madlib_keras_predict(
>     model_table,
>     test_table,
>     id_col,
>     independent_varname,
>     output_table,
>     pred_type,
>     gpus_per_host,       -- this might change based on other story
>     mst_key              -- new optional param
>     )
> {code}
> Also should use `model_weights` as the column name in the model table.  Currently in single model it is `model_data`, which is less descriptive.
> Table formats:
> {code}
> (A)
> madlib_keras_fit()
> http://madlib.apache.org/docs/latest/group__grp__keras.html
> produces these output tables:
> 1) model table
>    Column   | Type  | Modifiers
> ------------+-------+-----------
>  model_data | bytea |
>  model_arch | json  |
> Distributed by: (model_data)
> 2) summary table
>           Column           |            Type             | Modifiers
> ---------------------------+-----------------------------+-----------
>  source_table              | text                        |
>  model                     | text                        |
>  dependent_varname         | text                        |
>  independent_varname       | text                        |
>  model_arch_table          | text                        |
>  model_arch_id             | integer                     |
>  compile_params            | text                        |
>  fit_params                | text                        |
>  num_iterations            | integer                     |
>  validation_table          | text                        |
>  metrics_compute_frequency | integer                     |
>  name                      | text                        |
>  description               | text                        |
>  model_type                | text                        |
>  model_size                | double precision            |
>  start_training_time       | timestamp without time zone |
>  end_training_time         | timestamp without time zone |
>  metrics_elapsed_time      | double precision[]          |
>  madlib_version            | text                        |
>  num_classes               | integer                     |
>  class_values              | character varying[]         |
>  dependent_vartype         | text                        |
>  normalizing_const         | real                        |
>  metrics_type              | text[]                      |
>  training_metrics_final    | double precision            |
>  training_loss_final       | double precision            |
>  training_metrics          | double precision[]          |
>  training_loss             | double precision[]          |
>  validation_metrics_final  | double precision            |
>  validation_loss_final     | double precision            |
>  validation_metrics        | double precision[]          |
>  validation_loss           | double precision[]          |
>  metrics_iters             | integer[]                   |
> Distributed by: (source_table)
> (B)
> madlib_keras_fit_multiple_model()
> produces these output tables:
> 1) model table
>     Column     |  Type   | Modifiers
> ---------------+---------+-----------
>  mst_key       | integer | not null
>  model_weights | bytea   |
>  model_arch    | json    |
> Indexes:
>     "iris_multi_model_pkey" PRIMARY KEY, btree (mst_key)
> Distributed by: (mst_key)
> 2) summary table
>        Column        |            Type             | Modifiers
> ---------------------+-----------------------------+-----------
>  source_table        | text                        |
>  validation_table    | text                        |
>  model               | text                        |
>  model_info          | text                        |
>  dependent_varname   | text                        |
>  independent_varname | text                        |
>  model_arch_table    | text                        |
>  num_iterations      | integer                     |
>  start_training_time | timestamp without time zone |
>  end_training_time   | timestamp without time zone |
>  madlib_version      | text                        |
>  num_classes         | integer                     |
>  class_values        | text[]                      |
>  dependent_vartype   | text                        |
>  normalizing_const   | real                        |
> Distributed by: (source_table)
> 3) info table
>           Column          |        Type        | Modifiers
> --------------------------+--------------------+-----------
>  mst_key                  | integer            | not null
>  model_id                 | integer            |
>  compile_params           | text               |
>  fit_params               | text               |
>  model_type               | text               |
>  model_size               | double precision   |
>  metrics_elapsed_time     | double precision[] |
>  metrics_type             | text[]             |
>  training_metrics_final   | double precision   |
>  training_loss_final      | double precision   |
>  training_metrics         | double precision[] |
>  training_loss            | double precision[] |
>  validation_metrics_final | double precision   |
>  validation_loss_final    | double precision   |
>  validation_metrics       | double precision[] |
>  validation_loss          | double precision[] |
> Indexes:
>     "iris_multi_model_info_pkey" PRIMARY KEY, btree (mst_key)
> Distributed by: (mst_key)
> {code}


