Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2017/08/16 19:03:00 UTC
[jira] [Comment Edited] (MADLIB-413) Neural Networks - MLP - Phase 1
[ https://issues.apache.org/jira/browse/MADLIB-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129274#comment-16129274 ]
Frank McQuillan edited comment on MADLIB-413 at 8/16/17 7:02 PM:
-----------------------------------------------------------------
Notes on NN phases 1 and 2
1, 3, 4, 5, 6, 7, 8, 9, 10, 11 from above are OK now.
Still an issue with 2:
(2)
If I select all defaults
{code}
SELECT madlib.mlp_classification(
'iris_data', -- Source table
'mlp_model', -- Destination table
'attributes', -- Input features
'class_text' -- Label
);
{code}
I get an error
{code}
(psycopg2.ProgrammingError) function madlib.mlp_classification(unknown, unknown, unknown, unknown) does not exist
LINE 1: SELECT madlib.mlp_classification(
^
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
[SQL: "SELECT madlib.mlp_classification(\n 'iris_data', -- Source table\n 'mlp_model', -- Destination table\n 'attributes', -- Input features\n 'class_text' -- Label\n);"]
{code}
Seems like hidden_layer_sizes should be optional. If I set it to NULL as in:
{code}
SELECT madlib.mlp_classification(
'iris_data', -- Source table
'mlp_model', -- Destination table
'attributes', -- Input features
'class_text', -- Label
NULL
);
{code}
then it does work, but it seems odd to force the user to append a NULL when they want the default.
Currently the default is no hidden layers. Should it be 1 hidden layer with n units?
Please check scikit-learn or another library for a reasonable default n,
and update the user docs with the value that you choose.
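For comparison, scikit-learn's MLPClassifier documents hidden_layer_sizes=(100,) as its default, i.e. one hidden layer of 100 units. A minimal Python sketch of how a NULL/missing argument could fall back to that default (the function name is hypothetical, not MADlib code):

```python
def resolve_hidden_layer_sizes(hidden_layer_sizes=None):
    """Return the layer sizes to use, falling back to one hidden
    layer of 100 units (scikit-learn's documented default)."""
    if hidden_layer_sizes is None or len(hidden_layer_sizes) == 0:
        return [100]
    return list(hidden_layer_sizes)

print(resolve_hidden_layer_sizes())          # [100]
print(resolve_hidden_layer_sizes([50, 50]))  # [50, 50]
```

With a rule like this, both omitting the argument and passing NULL/an empty array would give the same documented default.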
(3)
Just a comment. If I use the defaults
{code}
'step_size=0.001,
n_iterations=100,
tolerance=0.001', -- Optimizer params
'sigmoid'); -- Activation function
{code}
on the full Iris data set (150 rows), I get 34 misclassifications with 1 hidden layer/5 units, predicting on the same data I used for training. This is better than the last time I tested, which had 100 misclassifications.
If I run 1000 iterations I get 13 misclassifications.
If I run 10000 iterations I get 2 misclassifications.
So this seems fine.
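The optimizer params above are passed as a single comma-separated key=value string. A minimal sketch of parsing that format (a hypothetical helper, not MADlib internals):

```python
def parse_optimizer_params(s):
    """Parse a 'key=value, key=value' optimizer-params string into a dict."""
    params = {}
    for pair in s.split(','):
        key, _, value = pair.partition('=')
        params[key.strip()] = float(value)
    return params

print(parse_optimizer_params('step_size=0.001, n_iterations=100, tolerance=0.001'))
# {'step_size': 0.001, 'n_iterations': 100.0, 'tolerance': 0.001}
```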
12)
As you add hidden layers, you need to run more iterations for the loss to come down. This is expected behavior.
hidden units | iterations | loss | misclassifications
[50] | 100 | 1.27 | 34
[50,50] | 100 | 1.89 | 100
[50,50] | 10000 | 0.08 | 2
13)
Warm start seems OK. On the full Iris data set:
{code}
SELECT madlib.mlp_classification(
'iris_data_full', -- Source table
'mlp_model', -- Destination table
'attributes', -- Input features
'class_text', -- Label
ARRAY[50], -- Number of units per layer
'learning_rate_init=0.001,
n_iterations=300,
tolerance=0.000', -- Optimizer params
'sigmoid', -- Activation function
NULL, -- Weights
FALSE, -- Warm start
TRUE); -- Verbose
{code}
If I do 1 run of 300 iterations, loss goes from 1.90 to 0.625.
If I do 3 runs of 100 iterations with warm_start=TRUE, loss goes from 1.94 to 1.24, then 1.23 to 0.826, then 0.82 to 0.65.
So this seems OK.
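That warm-start behavior matches what you'd expect from a deterministic optimizer: three warm-started runs of 100 iterations traverse the same update sequence as one run of 300. A toy pure-Python sketch (not MADlib's optimizer) using gradient descent on f(w) = (w - 3)^2:

```python
def gd(w, step_size, n_iterations):
    """Gradient descent on f(w) = (w - 3)^2; the gradient is 2*(w - 3)."""
    for _ in range(n_iterations):
        w -= step_size * 2.0 * (w - 3.0)
    return w

cold = gd(0.0, 0.001, 300)   # one cold run of 300 iterations

warm = 0.0                   # three warm-started runs of 100 iterations
for _ in range(3):
    warm = gd(warm, 0.001, 100)

print(abs(cold - warm) < 1e-12)  # True: identical update sequences
```

With random elements (e.g. n_tries restarts or shuffled minibatches) the two schedules would differ, but the loss should still decrease monotonically across warm-started runs, as in the numbers above.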
14) Please re-check on-line help and update as required.
15) Check that the design doc is up to date and reflects the implementation here.
16) I made a number of user doc changes. Please see the attached mlp.sql_in.
Can you proofread it and make any needed mods?
I thought I'd attach it to this JIRA rather than do a PR, since you will need to add changes for default hidden layers (#2 above).
-----------------
So in summary I guess the remaining work is 2, 14, 15, 16.
Otherwise looks good!
> Neural Networks - MLP - Phase 1
> -------------------------------
>
> Key: MADLIB-413
> URL: https://issues.apache.org/jira/browse/MADLIB-413
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Neural Networks
> Reporter: Caleb Welton
> Assignee: Cooper Sloan
> Fix For: v1.12
>
> Attachments: screenshot-1.png
>
>
> Multilayer perceptron with backpropagation
> Modules:
> * mlp_classification
> * mlp_regression
> Interface
> {code}
> source_table VARCHAR
> output_table VARCHAR
> independent_varname VARCHAR -- Column name for input features, should be a Real Valued array
> dependent_varname VARCHAR, -- Column name for target values, should be Real Valued array of size 1 or greater
> hidden_layer_sizes INTEGER[], -- Number of units per hidden layer (can be empty or null, in which case, no hidden layers)
> optimizer_params VARCHAR, -- Specified below
> weights VARCHAR, -- Column name for weights. Weights the loss for each input vector. Column should contain positive real value
> activation_function VARCHAR, -- One of 'sigmoid' (default), 'tanh', 'relu', or any prefix (eg. 't', 's')
> grouping_cols
> )
> {code}
> where
> {code}
> optimizer_params: -- eg "step_size=0.5, n_tries=5"
> {
> step_size DOUBLE PRECISION, -- Learning rate
> n_iterations INTEGER, -- Number of iterations per try
> n_tries INTEGER, -- Total number of training cycles, with random initializations to avoid local minima.
> tolerance DOUBLE PRECISION, -- Maximum distance between weights before training stops (or until it reaches n_iterations)
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)