Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/05/23 15:41:00 UTC
[jira] [Comment Edited] (MADLIB-1338) DL: Add support for reporting various metrics in fit/evaluate
[ https://issues.apache.org/jira/browse/MADLIB-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846798#comment-16846798 ]
Frank McQuillan edited comment on MADLIB-1338 at 5/23/19 3:40 PM:
------------------------------------------------------------------
Some questions below on tests I ran:
(1)
What is the final list of metrics that are supported, and which are not?
(2)
fit with validation dataset, compute metrics at the end
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
INFO: Model architecture size: 1KB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Model state (serialized) size: 0MB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.57543182373 sec, Total was 0.805295944214 sec (seg0 slice1 10.128.0.41:40000 pid=19444)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.555011034012 sec, Total was 0.783451080322 sec (seg1 slice1 10.128.0.41:40001 pid=19443)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 1: 2.45191693306 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.672018051147 sec, Total was 0.838011026382 sec (seg0 slice1 10.128.0.41:40000 pid=19444)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.599525928497 sec, Total was 0.768236160278 sec (seg1 slice1 10.128.0.41:40001 pid=19443)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 2: 0.840256214142 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.519747018814 sec, Total was 0.705535173416 sec (seg0 slice1 10.128.0.41:40000 pid=19444)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.536818981171 sec, Total was 0.731632947922 sec (seg1 slice1 10.128.0.41:40001 pid=19443)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 3: 0.733813047409 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.205695867538 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set metric after iteration 3: 0.341666668653.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set loss after iteration 3: 1.03914785385.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.255511045456 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set metric after iteration 3: 0.40000000596.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set loss after iteration 3: 1.01280891895.
CONTEXT: PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'source_table' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model_summary AS
SELECT
$MAD$iris_train_packed$MAD$::TEXT AS source_table,
$MAD$iris_model$MAD$::TEXT AS model,
$MAD$class_text$MAD$::TEXT AS dependent_varname,
$MAD$attributes$MAD$::TEXT AS independent_varname,
$MAD$model_arch_library$MAD$::TEXT AS model_arch_table,
1::INTEGER AS model_arch_id,
$1 AS compile_params,
$2 AS fit_params,
3::INTEGER AS num_iterations,
$MAD$iris_test_packed$MAD$::TEXT AS validation_table,
3::INTEGER AS metrics_compute_frequency,
$3 AS name,
$4 AS description,
'madlib_keras'::TEXT AS model_type,
47::INTEGER AS model_size,
'2019-05-23 14:33:13.602734'::TIMESTAMP AS start_training_time,
'2019-05-23 14:33:18.299369'::TIMESTAMP AS end_training_time,
$5 AS time_iter,
'1.16-dev'::TEXT AS madlib_version,
3::INTEGER AS num_classes,
$6 AS class_values,
$MAD$character varying$MAD$::TEXT AS dependent_vartype,
1.0::DOUBLE PRECISION AS normalizing_const,
0.341666668653::DOUBLE PRECISION AS training_metrics_final,
1.03914785385::DOUBLE PRECISION AS training_loss_final,
ARRAY[0.34166666865348816]::DOUBLE PRECISION[] AS training_metrics,
ARRAY[1.0391478538513184]::DOUBLE PRECISION[] AS training_loss,
0.40000000596::DOUBLE PRECISION AS validation_metrics_final,
1.01280891895::DOUBLE PRECISION AS validation_loss_final,
ARRAY[0.4000000059604645]::DOUBLE PRECISION[] AS validation_metrics,
ARRAY[1.012808918952942]::DOUBLE PRECISION[] AS validation_loss,
ARRAY[3]::INTEGER[] AS metrics_iters
"
PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'model_data' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model AS SELECT
$1 as model_data,
$2 as model_arch"
PL/Python function "madlib_keras_fit"
madlib_keras_fit
------------------
(1 row)
Time: 5074.171 ms
{code}
This is pretty verbose output; should we reduce it?
When we say
INFO: Time for iteration 3: 0.733813047409 sec
...
INFO: Time for evaluation in iteration 3: 0.205695867538 sec.
could we change the first one to
INFO: Time for training in iteration 3: 0.733813047409 sec
to make it clear?
(3)
fit with validation dataset, compute metrics at the end (cont'd)
{code}
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:33:13.602734
end_training_time | 2019-05-23 14:33:18.299369
time_iter | {"2019-05-23 14:33:17.83783"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.341666668653
training_loss_final | 1.03914785385
training_metrics | {0.341666668653488}
training_loss | {1.03914785385132}
validation_metrics_final | 0.40000000596
validation_loss_final | 1.01280891895
validation_metrics | {0.400000005960464}
validation_loss | {1.01280891895294}
metrics_iters | {3}
{code}
What does
time_iter | {"2019-05-23 14:33:17.83783"}
mean in this case? Why does it not match the end training time?
Also, is there a reason why the number of significant digits differs between the final values and the per-iteration array values? Just curious.
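One guess at the digits question, sketched in Python (this is my assumption about float32 widening, not something confirmed from the MADlib source):

```python
import struct

# Possible explanation: Keras computes metrics in float32, and the extra
# digits appear when that float32 is widened to DOUBLE PRECISION (float64).
# Round-trip a value like the training metric through IEEE-754 binary32,
# then widen it back to a Python float (float64):
widened = struct.unpack('f', struct.pack('f', 0.34166667))[0]

# repr() shows the full float64 digits of the widened float32 value (what
# the per-iteration array shows), while a 12-significant-digit format
# matches the INFO log and the *_final columns.
print(repr(widened))        # 0.34166666865348816
print('%.12g' % widened)    # 0.341666668653
```

So the two renderings could simply be the same float32 number printed at different precisions.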
(4)
fit with validation dataset, compute metrics every iteration
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
1, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+----------------------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 1
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:43:36.871703
end_training_time | 2019-05-23 14:43:42.707435
time_iter | {"2019-05-23 14:43:39.43415","2019-05-23 14:43:40.708928","2019-05-23 14:43:42.154485"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.683333337307
training_loss_final | 0.824792087078
training_metrics | {0.625,0.675000011920929,0.683333337306976}
training_loss | {0.90629643201828,0.858503997325897,0.824792087078094}
validation_metrics_final | 0.600000023842
validation_loss_final | 0.925064861774
validation_metrics | {0.566666662693024,0.600000023841858,0.600000023841858}
validation_loss | {1.04330325126648,0.973386645317078,0.925064861774445}
metrics_iters | {1,2,3}
{code}
We ran 3 iterations, but there are only 2 intervals between the 3 timestamps:
time_iter | {"2019-05-23 14:43:39.43415","2019-05-23 14:43:40.708928","2019-05-23 14:43:42.154485"}
What are these intervals?
(5)
fit with validation dataset, compute metrics every 2 iterations
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
2, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 2
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:50:08.757182
end_training_time | 2019-05-23 14:50:12.02966
time_iter | {"2019-05-23 14:50:10.390342","2019-05-23 14:50:11.461154"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.324999988079
training_loss_final | 1.40785217285
training_metrics | {0.324999988079071,0.324999988079071}
training_loss | {1.4350677728653,1.40785217285156}
validation_metrics_final | 0.366666674614
validation_loss_final | 1.21352612972
validation_metrics | {0.366666674613953,0.366666674613953}
validation_loss | {1.22771060466766,1.2135261297226}
metrics_iters | {2,3}
{code}
{2,3} looks OK, but again I'm not sure what the intervals in time_iter represent.
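For what it's worth, the metrics_iters values across runs (2), (4), and (5) are consistent with this simple schedule (a guess from the observed output, not taken from the MADlib source):

```python
# Assumed schedule: metrics are computed every metrics_compute_frequency
# iterations, and always at the final iteration.
def metrics_iters(num_iterations, frequency):
    iters = [i for i in range(1, num_iterations + 1) if i % frequency == 0]
    if num_iterations not in iters:   # final iteration always included
        iters.append(num_iterations)
    return iters

print(metrics_iters(3, 3))  # run (2): [3]
print(metrics_iters(3, 1))  # run (4): [1, 2, 3]
print(metrics_iters(3, 2))  # run (5): [2, 3]
```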
(6)
fit with no validation dataset, compute metrics every 2 iterations
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
NULL, -- validation_table
2, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table |
metrics_compute_frequency | 2
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:53:21.993865
end_training_time | 2019-05-23 14:53:26.740577
time_iter | {"2019-05-23 14:53:25.273434","2019-05-23 14:53:26.546298"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.683333337307
training_loss_final | 0.968693137169
training_metrics | {0.683333337306976,0.683333337306976}
training_loss | {0.988789916038513,0.968693137168884}
validation_metrics_final |
validation_loss_final |
validation_metrics |
validation_loss |
metrics_iters | {2,3}
{code}
Looks OK
(7)
a different metric
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['mae'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 15:22:00.135229
end_training_time | 2019-05-23 15:22:04.579288
time_iter | {"2019-05-23 15:22:04.159354"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.633333325386
training_loss_final | 1.1630614996
training_metrics | {0.633333325386047}
training_loss | {1.16306149959564}
validation_metrics_final | 0.733333349228
validation_loss_final | 0.82923823595
validation_metrics | {0.733333349227905}
validation_loss | {0.82923823595047}
metrics_iters | {3}
{code}
This is a different value than the accuracy in run (2), so OK.
(8)
multiple metrics
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['acc', 'mae'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
INFO: Model architecture size: 1KB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Model state (serialized) size: 0MB
CONTEXT: PL/Python function "madlib_keras_fit"
ERROR: plpy.SPIError: plpy.Error: Only at most one metric is supported. (plpython.c:5038) (seg0 slice1 10.128.0.41:40000 pid=24904) (plpython.c:5038)
DETAIL:
Traceback (most recent call last):
PL/Python function "fit_transition", line 6, in <module>
return madlib_keras.fit_transition(**globals())
PL/Python function "fit_transition", line 430, in fit_transition
PL/Python function "fit_transition", line 101, in compile_and_set_weights
PL/Python function "fit_transition", line 288, in compile_model
PL/Python function "fit_transition", line 176, in parse_and_validate_compile_params
PL/Python function "fit_transition", line 190, in _validate_metrics
PL/Python function "fit_transition", line 96, in _assert
PL/Python function "fit_transition"
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit", line 21, in <module>
madlib_keras.fit(**globals())
PL/Python function "madlib_keras_fit", line 198, in fit
PL/Python function "madlib_keras_fit"
{code}
OK
(9)
no metrics
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam' $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
INFO: Model architecture size: 1KB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Model state (serialized) size: 0MB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.539932012558 sec, Total was 0.772794008255 sec (seg0 slice1 10.128.0.41:40000 pid=25033)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.522484779358 sec, Total was 0.755910873413 sec (seg1 slice1 10.128.0.41:40001 pid=25034)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 1: 2.45501494408 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.633363008499 sec, Total was 0.791009902954 sec (seg0 slice1 10.128.0.41:40000 pid=25033)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.633543968201 sec, Total was 0.792724847794 sec (seg1 slice1 10.128.0.41:40001 pid=25034)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 2: 0.794742822647 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.50497508049 sec, Total was 0.656916856766 sec (seg0 slice1 10.128.0.41:40000 pid=25033)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.507378101349 sec, Total was 0.660608053207 sec (seg1 slice1 10.128.0.41:40001 pid=25034)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 3: 0.662581205368 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.18151307106 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set metric after iteration 3: 0.0.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set loss after iteration 3: 4.28131914139.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.180009841919 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set metric after iteration 3: 0.0.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set loss after iteration 3: 4.82081604004.
CONTEXT: PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'source_table' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model_summary AS
SELECT
$MAD$iris_train_packed$MAD$::TEXT AS source_table,
$MAD$iris_model$MAD$::TEXT AS model,
$MAD$class_text$MAD$::TEXT AS dependent_varname,
$MAD$attributes$MAD$::TEXT AS independent_varname,
$MAD$model_arch_library$MAD$::TEXT AS model_arch_table,
1::INTEGER AS model_arch_id,
$1 AS compile_params,
$2 AS fit_params,
3::INTEGER AS num_iterations,
$MAD$iris_test_packed$MAD$::TEXT AS validation_table,
3::INTEGER AS metrics_compute_frequency,
$3 AS name,
$4 AS description,
'madlib_keras'::TEXT AS model_type,
47::INTEGER AS model_size,
'2019-05-23 15:06:36.817638'::TIMESTAMP AS start_training_time,
'2019-05-23 15:06:41.302073'::TIMESTAMP AS end_training_time,
$5 AS time_iter,
'1.16-dev'::TEXT AS madlib_version,
3::INTEGER AS num_classes,
$6 AS class_values,
$MAD$character varying$MAD$::TEXT AS dependent_vartype,
1.0::DOUBLE PRECISION AS normalizing_const,
0.0::DOUBLE PRECISION AS training_metrics_final,
4.28131914139::DOUBLE PRECISION AS training_loss_final,
ARRAY[0.0]::DOUBLE PRECISION[] AS training_metrics,
ARRAY[4.2813191413879395]::DOUBLE PRECISION[] AS training_loss,
0.0::DOUBLE PRECISION AS validation_metrics_final,
4.82081604004::DOUBLE PRECISION AS validation_loss_final,
ARRAY[0.0]::DOUBLE PRECISION[] AS validation_metrics,
ARRAY[4.8208160400390625]::DOUBLE PRECISION[] AS validation_loss,
ARRAY[3]::INTEGER[] AS metrics_iters
"
PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'model_data' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model AS SELECT
$1 as model_data,
$2 as model_arch"
PL/Python function "madlib_keras_fit"
-[ RECORD 1 ]----+-
madlib_keras_fit |
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+----------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam'
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 15:24:52.338952
end_training_time | 2019-05-23 15:24:57.002596
time_iter | {"2019-05-23 15:24:56.64148"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0
training_loss_final | 2.9513938427
training_metrics | {0}
training_loss | {2.95139384269714}
validation_metrics_final | 0
validation_loss_final | 2.5171585083
validation_metrics | {0}
validation_loss | {2.51715850830078}
metrics_iters | {3}
{code}
I think we should leave it blank, not 0, if there is no metric specified.
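To illustrate the suggestion, a minimal sketch (hypothetical helper, not actual MADlib code):

```python
# Hypothetical helper: report None (SQL NULL, shown as blank in the summary
# table) instead of 0.0 when compile_params listed no metrics, matching how
# the validation_* columns go blank when there is no validation table.
def final_metric(metric_value, metrics_list):
    if not metrics_list:
        return None   # rendered blank in the summary table
    return metric_value

print(final_metric(0.683333337307, ['accuracy']))  # 0.683333337307
print(final_metric(0.0, []))                       # None
```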
(10)
no fit param
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
NULL, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
Time: 4980.306 ms
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params |
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 15:28:46.358564
end_training_time | 2019-05-23 15:28:50.987745
time_iter | {"2019-05-23 15:28:50.584773"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0
training_loss_final | 1.41545331478
training_metrics | {0}
training_loss | {1.41545331478119}
validation_metrics_final | 0
validation_loss_final | 1.2854578495
validation_metrics | {0}
validation_loss | {1.28545784950256}
metrics_iters | {3}
{code}
OK
(11)
no compile params
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
NULL, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
ERROR: TypeError: cannot concatenate 'str' and 'NoneType' objects (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit", line 21, in <module>
madlib_keras.fit(**globals())
PL/Python function "madlib_keras_fit", line 166, in fit
PL/Python function "madlib_keras_fit"
{code}
This does not look like the right error to throw; a clearer validation error would be better.
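Something like the check below would fail fast with a user-facing message (a hypothetical sketch, in the spirit of the explicit check that run (8) hits for multiple metrics):

```python
# Hypothetical validation sketch (not MADlib code): reject a missing
# compile_params up front rather than letting string concatenation
# raise TypeError on None deeper inside fit().
def validate_compile_params(compile_params):
    if compile_params is None or not compile_params.strip():
        raise ValueError(
            "compile_params is required, e.g. "
            "$$ loss='categorical_crossentropy', optimizer='adam' $$")
    return compile_params.strip()

try:
    validate_compile_params(None)
except ValueError as err:
    print(err)
```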
was (Author: fmcquillan):
Some questions below on tests I ran:
(1)
What is the final list of metrics that are supported and those that are not supported?
(2)
fit with validation dataset, compute metrics at the end
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
INFO: Model architecture size: 1KB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Model state (serialized) size: 0MB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.57543182373 sec, Total was 0.805295944214 sec (seg0 slice1 10.128.0.41:40000 pid=19444)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.555011034012 sec, Total was 0.783451080322 sec (seg1 slice1 10.128.0.41:40001 pid=19443)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 1: 2.45191693306 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.672018051147 sec, Total was 0.838011026382 sec (seg0 slice1 10.128.0.41:40000 pid=19444)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.599525928497 sec, Total was 0.768236160278 sec (seg1 slice1 10.128.0.41:40001 pid=19443)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 2: 0.840256214142 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.519747018814 sec, Total was 0.705535173416 sec (seg0 slice1 10.128.0.41:40000 pid=19444)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.536818981171 sec, Total was 0.731632947922 sec (seg1 slice1 10.128.0.41:40001 pid=19443)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 3: 0.733813047409 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.205695867538 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set metric after iteration 3: 0.341666668653.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set loss after iteration 3: 1.03914785385.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.255511045456 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set metric after iteration 3: 0.40000000596.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set loss after iteration 3: 1.01280891895.
CONTEXT: PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'source_table' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model_summary AS
SELECT
$MAD$iris_train_packed$MAD$::TEXT AS source_table,
$MAD$iris_model$MAD$::TEXT AS model,
$MAD$class_text$MAD$::TEXT AS dependent_varname,
$MAD$attributes$MAD$::TEXT AS independent_varname,
$MAD$model_arch_library$MAD$::TEXT AS model_arch_table,
1::INTEGER AS model_arch_id,
$1 AS compile_params,
$2 AS fit_params,
3::INTEGER AS num_iterations,
$MAD$iris_test_packed$MAD$::TEXT AS validation_table,
3::INTEGER AS metrics_compute_frequency,
$3 AS name,
$4 AS description,
'madlib_keras'::TEXT AS model_type,
47::INTEGER AS model_size,
'2019-05-23 14:33:13.602734'::TIMESTAMP AS start_training_time,
'2019-05-23 14:33:18.299369'::TIMESTAMP AS end_training_time,
$5 AS time_iter,
'1.16-dev'::TEXT AS madlib_version,
3::INTEGER AS num_classes,
$6 AS class_values,
$MAD$character varying$MAD$::TEXT AS dependent_vartype,
1.0::DOUBLE PRECISION AS normalizing_const,
0.341666668653::DOUBLE PRECISION AS training_metrics_final,
1.03914785385::DOUBLE PRECISION AS training_loss_final,
ARRAY[0.34166666865348816]::DOUBLE PRECISION[] AS training_metrics,
ARRAY[1.0391478538513184]::DOUBLE PRECISION[] AS training_loss,
0.40000000596::DOUBLE PRECISION AS validation_metrics_final,
1.01280891895::DOUBLE PRECISION AS validation_loss_final,
ARRAY[0.4000000059604645]::DOUBLE PRECISION[] AS validation_metrics,
ARRAY[1.012808918952942]::DOUBLE PRECISION[] AS validation_loss,
ARRAY[3]::INTEGER[] AS metrics_iters
"
PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'model_data' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model AS SELECT
$1 as model_data,
$2 as model_arch"
PL/Python function "madlib_keras_fit"
madlib_keras_fit
------------------
(1 row)
Time: 5074.171 ms
{code}
This is pretty verbose output, now should we reduce that?
When we say
INFO: Time for iteration 3: 0.733813047409 sec
...
INFO: Time for evaluation in iteration 3: 0.205695867538 sec.
could we change the first one to
INFO: Time for training in iteration 3: 0.733813047409 sec
to make it clear?
(3)
fit with validation dataset, compute metrics at the end (con't)
{code}
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:33:13.602734
end_training_time | 2019-05-23 14:33:18.299369
time_iter | {"2019-05-23 14:33:17.83783"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.341666668653
training_loss_final | 1.03914785385
training_metrics | {0.341666668653488}
training_loss | {1.03914785385132}
validation_metrics_final | 0.40000000596
validation_loss_final | 1.01280891895
validation_metrics | {0.400000005960464}
validation_loss | {1.01280891895294}
metrics_iters | {3}
{code}
What does
time_iter | {"2019-05-23 14:33:17.83783"}
mean in this case? Why does it not match the end training time?
Also, is there a reason why there is a different number of significant digits in the final values vs. the per-iteration array values? Just curious.
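A guess on the significant-digits question (not verified against the MADlib source): the final scalar looks like it was stringified at ~12 significant digits (Python 2 `str()` behavior) when the summary SQL was generated, while the array literal was emitted at full `repr()` precision (~17 digits). The same double formats both ways:

```python
# Hypothetical explanation: one double, two formatting precisions.
# %.12g mimics Python 2 str(); %.17g mimics repr().
v = 0.34166666865348816  # training metric from run (2)

assert '%.12g' % v == '0.341666668653'       # matches training_metrics_final
assert '%.17g' % v == '0.34166666865348816'  # matches the ARRAY[...] literal
```

If that is the cause, formatting both through the same path (or binding the raw doubles as parameters instead of string literals) would make the digit counts consistent.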
(4)
fit with validation dataset, compute metrics every iteration
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
1, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+----------------------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 1
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:43:36.871703
end_training_time | 2019-05-23 14:43:42.707435
time_iter | {"2019-05-23 14:43:39.43415","2019-05-23 14:43:40.708928","2019-05-23 14:43:42.154485"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.683333337307
training_loss_final | 0.824792087078
training_metrics | {0.625,0.675000011920929,0.683333337306976}
training_loss | {0.90629643201828,0.858503997325897,0.824792087078094}
validation_metrics_final | 0.600000023842
validation_loss_final | 0.925064861774
validation_metrics | {0.566666662693024,0.600000023841858,0.600000023841858}
validation_loss | {1.04330325126648,0.973386645317078,0.925064861774445}
metrics_iters | {1,2,3}
{code}
We ran 3 iterations and time_iter contains 3 timestamps:
time_iter | {"2019-05-23 14:43:39.43415","2019-05-23 14:43:40.708928","2019-05-23 14:43:42.154485"}
What do these timestamps represent?
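A guess at the pattern (a hypothetical reconstruction, not the actual MADlib code): time_iter seems to get one wall-clock entry per iteration at which metrics are computed, i.e. every metrics_compute_frequency-th iteration plus the final one. That would explain the entry counts in runs (3), (4), and (5):

```python
# Hypothetical rule for which iterations produce a time_iter entry.
def metric_iterations(num_iterations, metrics_compute_frequency):
    return [i for i in range(1, num_iterations + 1)
            if i % metrics_compute_frequency == 0 or i == num_iterations]

# Matches the observed metrics_iters values and time_iter lengths:
assert metric_iterations(3, 3) == [3]        # run (3): 1 timestamp
assert metric_iterations(3, 1) == [1, 2, 3]  # run (4): 3 timestamps
assert metric_iterations(3, 2) == [2, 3]     # run (5): 2 timestamps
```

If that is right, documenting time_iter as "timestamp at the end of each metrics-computing iteration" (or renaming the column) would remove the ambiguity.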
(5)
fit with validation dataset, compute metrics at different iteration
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
2, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 2
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:50:08.757182
end_training_time | 2019-05-23 14:50:12.02966
time_iter | {"2019-05-23 14:50:10.390342","2019-05-23 14:50:11.461154"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.324999988079
training_loss_final | 1.40785217285
training_metrics | {0.324999988079071,0.324999988079071}
training_loss | {1.4350677728653,1.40785217285156}
validation_metrics_final | 0.366666674614
validation_loss_final | 1.21352612972
validation_metrics | {0.366666674613953,0.366666674613953}
validation_loss | {1.22771060466766,1.2135261297226}
metrics_iters | {2,3}
{code}
{2,3} looks OK, but again I am not sure what the timestamps in time_iter represent.
(6)
fit with no validation dataset, compute metrics at different iteration
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
NULL, -- validation_table
2, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table |
metrics_compute_frequency | 2
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 14:53:21.993865
end_training_time | 2019-05-23 14:53:26.740577
time_iter | {"2019-05-23 14:53:25.273434","2019-05-23 14:53:26.546298"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.683333337307
training_loss_final | 0.968693137169
training_metrics | {0.683333337306976,0.683333337306976}
training_loss | {0.988789916038513,0.968693137168884}
validation_metrics_final |
validation_loss_final |
validation_metrics |
validation_loss |
metrics_iters | {2,3}
{code}
Looks OK
(7)
a different metric
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['mae'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 15:22:00.135229
end_training_time | 2019-05-23 15:22:04.579288
time_iter | {"2019-05-23 15:22:04.159354"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0.633333325386
training_loss_final | 1.1630614996
training_metrics | {0.633333325386047}
training_loss | {1.16306149959564}
validation_metrics_final | 0.733333349228
validation_loss_final | 0.82923823595
validation_metrics | {0.733333349227905}
validation_loss | {0.82923823595047}
metrics_iters | {3}
{code}
This is a different value than the accuracy in run (2), so it looks OK.
(8)
multiple metrics
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['acc', 'mae'] $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
INFO: Model architecture size: 1KB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Model state (serialized) size: 0MB
CONTEXT: PL/Python function "madlib_keras_fit"
ERROR: plpy.SPIError: plpy.Error: Only at most one metric is supported. (plpython.c:5038) (seg0 slice1 10.128.0.41:40000 pid=24904) (plpython.c:5038)
DETAIL:
Traceback (most recent call last):
PL/Python function "fit_transition", line 6, in <module>
return madlib_keras.fit_transition(**globals())
PL/Python function "fit_transition", line 430, in fit_transition
PL/Python function "fit_transition", line 101, in compile_and_set_weights
PL/Python function "fit_transition", line 288, in compile_model
PL/Python function "fit_transition", line 176, in parse_and_validate_compile_params
PL/Python function "fit_transition", line 190, in _validate_metrics
PL/Python function "fit_transition", line 96, in _assert
PL/Python function "fit_transition"
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit", line 21, in <module>
madlib_keras.fit(**globals())
PL/Python function "madlib_keras_fit", line 198, in fit
PL/Python function "madlib_keras_fit"
{code}
OK
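For reference, the guard producing this error could look roughly like the following (a sketch of the observed behavior only, not the actual MADlib code):

```python
# Sketch: MADlib currently accepts at most one metric in compile_params.
def validate_metrics(metrics):
    if metrics is not None and len(metrics) > 1:
        raise ValueError("Only at most one metric is supported.")
    return metrics

validate_metrics(['accuracy'])  # accepted
validate_metrics(None)          # accepted (metrics are optional)
try:
    validate_metrics(['acc', 'mae'])  # rejected, as in the run above
except ValueError as e:
    print(e)
```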
(9)
no metrics
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam' $$, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
INFO: Model architecture size: 1KB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Model state (serialized) size: 0MB
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.539932012558 sec, Total was 0.772794008255 sec (seg0 slice1 10.128.0.41:40000 pid=25033)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.522484779358 sec, Total was 0.755910873413 sec (seg1 slice1 10.128.0.41:40001 pid=25034)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 1: 2.45501494408 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.633363008499 sec, Total was 0.791009902954 sec (seg0 slice1 10.128.0.41:40000 pid=25033)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.633543968201 sec, Total was 0.792724847794 sec (seg1 slice1 10.128.0.41:40001 pid=25034)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 2: 0.794742822647 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Processed 60 images: Fit took 0.50497508049 sec, Total was 0.656916856766 sec (seg0 slice1 10.128.0.41:40000 pid=25033)
CONTEXT: PL/Python function "fit_transition"
INFO: Processed 60 images: Fit took 0.507378101349 sec, Total was 0.660608053207 sec (seg1 slice1 10.128.0.41:40001 pid=25034)
CONTEXT: PL/Python function "fit_transition"
INFO: Time for iteration 3: 0.662581205368 sec
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.18151307106 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set metric after iteration 3: 0.0.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Training set loss after iteration 3: 4.28131914139.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Time for evaluation in iteration 3: 0.180009841919 sec.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set metric after iteration 3: 0.0.
CONTEXT: PL/Python function "madlib_keras_fit"
INFO: Validation set loss after iteration 3: 4.82081604004.
CONTEXT: PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'source_table' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model_summary AS
SELECT
$MAD$iris_train_packed$MAD$::TEXT AS source_table,
$MAD$iris_model$MAD$::TEXT AS model,
$MAD$class_text$MAD$::TEXT AS dependent_varname,
$MAD$attributes$MAD$::TEXT AS independent_varname,
$MAD$model_arch_library$MAD$::TEXT AS model_arch_table,
1::INTEGER AS model_arch_id,
$1 AS compile_params,
$2 AS fit_params,
3::INTEGER AS num_iterations,
$MAD$iris_test_packed$MAD$::TEXT AS validation_table,
3::INTEGER AS metrics_compute_frequency,
$3 AS name,
$4 AS description,
'madlib_keras'::TEXT AS model_type,
47::INTEGER AS model_size,
'2019-05-23 15:06:36.817638'::TIMESTAMP AS start_training_time,
'2019-05-23 15:06:41.302073'::TIMESTAMP AS end_training_time,
$5 AS time_iter,
'1.16-dev'::TEXT AS madlib_version,
3::INTEGER AS num_classes,
$6 AS class_values,
$MAD$character varying$MAD$::TEXT AS dependent_vartype,
1.0::DOUBLE PRECISION AS normalizing_const,
0.0::DOUBLE PRECISION AS training_metrics_final,
4.28131914139::DOUBLE PRECISION AS training_loss_final,
ARRAY[0.0]::DOUBLE PRECISION[] AS training_metrics,
ARRAY[4.2813191413879395]::DOUBLE PRECISION[] AS training_loss,
0.0::DOUBLE PRECISION AS validation_metrics_final,
4.82081604004::DOUBLE PRECISION AS validation_loss_final,
ARRAY[0.0]::DOUBLE PRECISION[] AS validation_metrics,
ARRAY[4.8208160400390625]::DOUBLE PRECISION[] AS validation_loss,
ARRAY[3]::INTEGER[] AS metrics_iters
"
PL/Python function "madlib_keras_fit"
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'model_data' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CONTEXT: SQL statement "
CREATE TABLE iris_model AS SELECT
$1 as model_data,
$2 as model_arch"
PL/Python function "madlib_keras_fit"
-[ RECORD 1 ]----+-
madlib_keras_fit |
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+----------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam'
fit_params | batch_size=16, epochs=1
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 15:24:52.338952
end_training_time | 2019-05-23 15:24:57.002596
time_iter | {"2019-05-23 15:24:56.64148"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0
training_loss_final | 2.9513938427
training_metrics | {0}
training_loss | {2.95139384269714}
validation_metrics_final | 0
validation_loss_final | 2.5171585083
validation_metrics | {0}
validation_loss | {2.51715850830078}
metrics_iters | {3}
{code}
I think we should leave it blank (NULL), not 0, if there is no metric.
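One way to get a blank in the summary is to hand PL/Python a None (which maps to SQL NULL and displays as blank in psql) whenever no metric was compiled. A hypothetical helper, assuming Keras `evaluate()` returns `[loss]` with no compiled metric and `[loss, metric]` with one:

```python
# Hypothetical helper: return None instead of 0.0 when no metric was
# requested, so the summary column becomes SQL NULL rather than a
# misleading 0.
def metric_from_evaluate(compiled_metrics, evaluate_result):
    if not compiled_metrics:
        return None
    return evaluate_result[1]

assert metric_from_evaluate([], [4.28131914139]) is None
assert metric_from_evaluate(['accuracy'],
                            [1.03914785385, 0.341666668653]) == 0.341666668653
```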
(10)
no fit param
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
$$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$, -- compile_params
NULL, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
Time: 4980.306 ms
madlib=# select * from iris_model_summary;
-[ RECORD 1 ]-------------+--------------------------------------------------------------------------
source_table | iris_train_packed
model | iris_model
dependent_varname | class_text
independent_varname | attributes
model_arch_table | model_arch_library
model_arch_id | 1
compile_params | loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
fit_params |
num_iterations | 3
validation_table | iris_test_packed
metrics_compute_frequency | 3
name | Sophia L.
description | Simple MLP model on iris dataset
model_type | madlib_keras
model_size | 47
start_training_time | 2019-05-23 15:28:46.358564
end_training_time | 2019-05-23 15:28:50.987745
time_iter | {"2019-05-23 15:28:50.584773"}
madlib_version | 1.16-dev
num_classes | 3
class_values | {Iris-setosa,Iris-versicolor,Iris-virginica}
dependent_vartype | character varying
normalizing_const | 1
training_metrics_final | 0
training_loss_final | 1.41545331478
training_metrics | {0}
training_loss | {1.41545331478119}
validation_metrics_final | 0
validation_loss_final | 1.2854578495
validation_metrics | {0}
validation_loss | {1.28545784950256}
metrics_iters | {3}
{code}
OK
(11)
no compile params
{code}
DROP TABLE IF EXISTS iris_model, iris_model_summary;
SELECT madlib.madlib_keras_fit('iris_train_packed', -- source_table
'iris_model', -- model
'model_arch_library', -- model_arch_table
1, -- model_arch_id
NULL, -- compile_params
$$ batch_size=16, epochs=1 $$, -- fit_params
3, -- num_iterations
0, -- gpus per host
'iris_test_packed', -- validation_table
3, -- metrics compute frequency
'Sophia L.', -- name
'Simple MLP model on iris dataset' -- description
);
ERROR: TypeError: cannot concatenate 'str' and 'NoneType' objects (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit", line 21, in <module>
madlib_keras.fit(**globals())
PL/Python function "madlib_keras_fit", line 166, in fit
PL/Python function "madlib_keras_fit"
{code}
This does not look like the right error to throw; a clearer validation message would help.
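It would be friendlier to fail fast before any string concatenation touches the None. A hypothetical validation sketch (function name and message are illustrative, not MADlib's actual API):

```python
# Illustrative guard: reject a missing compile_params with an
# actionable message instead of a TypeError from "str + NoneType".
def validate_compile_params(compile_params):
    if compile_params is None or not compile_params.strip():
        raise ValueError(
            "compile_params is required; at minimum specify a loss and "
            "an optimizer, e.g. "
            "\"loss='categorical_crossentropy', optimizer='adam'\"")
    return compile_params

validate_compile_params("loss='mse', optimizer='adam'")  # accepted
try:
    validate_compile_params(None)  # the failing call from run (11)
except ValueError as e:
    print(e)
```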
> DL: Add support for reporting various metrics in fit/evaluate
> -------------------------------------------------------------
>
> Key: MADLIB-1338
> URL: https://issues.apache.org/jira/browse/MADLIB-1338
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Deep Learning
> Reporter: Nandish Jayaram
> Priority: Major
> Fix For: v1.16
>
>
> The current `madlib_keras.fit()` code reports accuracy as the only metric, along with the loss value. But we could ask for different metrics in compile params (`mae`, `binary_accuracy`, etc.); then `Keras.evaluate()` would return `loss` (by default) plus `mean_absolute_error` or `binary_accuracy` (metrics).
> This JIRA requests support to be able to report any one of these metrics in the output table.
> Other requirements:
> 1. Remove training loss/accuracy computation from `fit_transition` and instead use the evaluate function to calculate the training loss/metric. See PR [https://github.com/apache/madlib/pull/388|https://github.com/apache/madlib/pull/388/files] for more details
> 2. metric param can be optional
> 3. Maybe we should rename all the related output columns to metric instead of metrics
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)