Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/04/10 20:02:00 UTC

[jira] [Created] (MADLIB-1322) MLP with minibatch fails for integer dependent variable

Frank McQuillan created MADLIB-1322:
---------------------------------------

             Summary: MLP with minibatch fails for integer dependent variable
                 Key: MADLIB-1322
                 URL: https://issues.apache.org/jira/browse/MADLIB-1322
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Neural Networks
            Reporter: Frank McQuillan
             Fix For: v1.16



(1)
If the dependent variable is an integer and I run the mini-batch preprocessor:

{code}
select madlib.minibatch_preprocessor(
'classification_train', -- input table
'mini_batch_packed_train', -- output table
'response', -- response INTEGER
'feature_vector',  -- indep vars
NULL, -- grouping
NULL, -- buffer size (or size of the mini-batch)
TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
);
{code}

Then the summary table looks like this (note that class_values is integer[]):

{code}
madlib=# \d+ mini_batch_packed_train_summary
             Table "public.mini_batch_packed_train_summary"
          Column          |   Type    | Modifiers | Storage  | Stats target | Description 
--------------------------+-----------+-----------+----------+--------------+-------------
 source_table             | text      |           | extended |              | 
 output_table             | text      |           | extended |              | 
 dependent_varname        | text      |           | extended |              | 
 independent_varname      | text      |           | extended |              | 
 dependent_vartype        | text      |           | extended |              | 
 buffer_size              | integer   |           | plain    |              | 
 class_values             | integer[] |           | extended |              | 
 num_rows_processed       | integer   |           | plain    |              | 
 num_missing_rows_skipped | integer   |           | plain    |              | 
 grouping_cols            | text      |           | extended |              | 
{code}

Then MLP classification fails with:

{code}
InternalError: (psycopg2.InternalError) TypeError: must be string, not int
CONTEXT:  Traceback (most recent call last):
  PL/Python function "mlp_classification", line 33, in <module>
    grouping_col)
  PL/Python function "mlp_classification", line 42, in wrapper
  PL/Python function "mlp_classification", line 147, in mlp
  PL/Python function "mlp_classification", line 74, in quote_literal
{code}
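
The traceback points at a quote_literal call that rejects an int. The sketch below only illustrates that failure mode and one possible guard; it is not MADlib's code. The quote_literal here is a hypothetical stand-in for a string-only quoting function, and quote_class_values is a made-up helper name. It assumes the error comes from passing the integer class_values straight to the quoting step.

```python
def quote_literal(value):
    """Hypothetical stand-in for a string-only SQL literal quoter.

    Mimics the reported behaviour: anything that is not a string
    raises TypeError instead of being converted silently.
    """
    if not isinstance(value, str):
        raise TypeError("must be string, not %s" % type(value).__name__)
    # Double any embedded single quotes, then wrap in single quotes.
    return "'" + value.replace("'", "''") + "'"


def quote_class_values(class_values):
    """Quote each class value, converting non-strings to text first.

    Assumption: casting to text before quoting is acceptable, which is
    what the explicit response::TEXT cast achieves on the SQL side.
    """
    return [quote_literal(str(v)) for v in class_values]


if __name__ == "__main__":
    # Integer class values, as in the integer[] class_values column.
    try:
        quote_literal(1)
    except TypeError as exc:
        print("fails as reported:", exc)

    print(quote_class_values([0, 1, 2]))
```

Converting inside the helper keeps the caller unchanged whether class_values arrives as integer[] or text[].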


(2)
If I cast to text explicitly:

{code}
select madlib.minibatch_preprocessor(
'classification_train', -- input table
'mini_batch_packed_train', -- output table
'response::TEXT', -- response
'feature_vector',  -- indep vars
NULL, -- grouping
NULL, -- buffer size (or size of the mini-batch)
TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
);
{code}

The summary table looks like this (class_values is now text[]):

{code}
madlib=# \d+ mini_batch_packed_train_summary
            Table "public.mini_batch_packed_train_summary"
          Column          |  Type   | Modifiers | Storage  | Stats target | Description 
--------------------------+---------+-----------+----------+--------------+-------------
 source_table             | text    |           | extended |              | 
 output_table             | text    |           | extended |              | 
 dependent_varname        | text    |           | extended |              | 
 independent_varname      | text    |           | extended |              | 
 dependent_vartype        | text    |           | extended |              | 
 buffer_size              | integer |           | plain    |              | 
 class_values             | text[]  |           | extended |              | 
 num_rows_processed       | integer |           | plain    |              | 
 num_missing_rows_skipped | integer |           | plain    |              | 
 grouping_cols            | text    |           | extended |              | 
{code}

And MLP training works OK.


