Posted to issues@madlib.apache.org by "Rahul Iyer (JIRA)" <ji...@apache.org> on 2016/10/24 17:22:58 UTC

[jira] [Commented] (MADLIB-1029) Decision Tree's output summary table does not contain the right list independent variables

    [ https://issues.apache.org/jira/browse/MADLIB-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602627#comment-15602627 ] 

Rahul Iyer commented on MADLIB-1029:
------------------------------------

I'm not able to reproduce the error, possibly because I don't have the exact dataset: the abalone dataset on UCI does not have a {{color}} feature. The summary table from my run lists all of the features:

{code}
madlib-pg94=# select * from adaboost_output_test_summary;
-[ RECORD 1 ]---------+--------------------------------------------------------------------------------------------------------------------------------------
method                | tree_train
is_classification     | t
source_table          | abalone
model_table           | adaboost_output_test
id_col_name           | id
dependent_varname     | sex
independent_varnames  | rings, length, diameter, height, whole, shucked, viscera, shell
cat_features          | rings
con_features          | length,diameter,height,whole,shucked,viscera,shell
grouping_cols         |
num_all_groups        | 1
num_failed_groups     | 0
total_rows_processed  | 4177
total_rows_skipped    | 0
dependent_var_levels  | "F","I","M"
dependent_var_type    | text
input_cp              | 0.01
independent_var_types | integer, double precision, double precision, double precision, double precision, double precision, double precision, double precision
{code}
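
One way to check whether the discrepancy comes from the dataset itself is to list which of the requested feature columns actually exist in the reporter's source table. The query below is a minimal sketch, assuming PostgreSQL's {{information_schema}} and the table and column names taken from the reported {{tree_train}} call; it has not been run against the reporter's database.

```sql
-- Sketch: for each feature requested in the reported tree_train call,
-- show its column type in the source table, or NULL if the column is absent.
-- Assumes the table name 'abalone_2' is unique across schemas; otherwise
-- add a filter on c.table_schema to avoid duplicate rows.
SELECT f.feature,
       c.data_type
FROM unnest(string_to_array(
         'length,diam,height,whole,shucked,viscera,shell,rings,color', ','
     )) AS f(feature)
LEFT JOIN information_schema.columns AS c
       ON c.table_name  = 'abalone_2'
      AND c.column_name = f.feature
ORDER BY f.feature;
```

Any row with a NULL {{data_type}} names a feature that is not a column of the source table, which would help narrow down why the two runs differ.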

> Decision Tree's output summary table does not contain the right list independent variables
> ------------------------------------------------------------------------------------------
>
>                 Key: MADLIB-1029
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1029
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Decision Tree
>            Reporter: April Song
>
> Decision Tree's output summary table does not contain the right list of independent variables.
> Steps to reproduce:
> select madlib.tree_train('abalone_2',         -- source table
>                          'adaboost_output_test',    -- output model table
>                          'rowid',              -- id column
>                          'sex',           -- response
>                          'length,diam,height,whole,shucked,viscera,shell,rings,color',   -- features
>                          NULL::text,        -- exclude columns
>                          'gini',            -- split criterion
>                          NULL::text,        -- no grouping
>                          NULL::text,        -- no weights
>                          5,                 -- max depth
>                          3,                 -- min split
>                          1,                 -- min bucket
>                          10                 -- number of bins (num_splits)
>                          );
> gpadmin=# select * from adaboost_output_test_summary;
> -[ RECORD 1 ]---------+---------------------
> method                | tree_train
> is_classification     | t
> source_table          | abalone_2
> model_table           | adaboost_output_test
> id_col_name           | rowid
> dependent_varname     | sex
> independent_varnames  | color
> cat_features          | color
> con_features          | 
> grouping_cols         | 
> num_all_groups        | 1
> num_failed_groups     | 0
> total_rows_processed  | 2835
> total_rows_skipped    | 0
> dependent_var_levels  | "0","1"
> dependent_var_type    | integer
> input_cp              | 0.01
> independent_var_types | text
> Abalone data can be found here: https://archive.ics.uci.edu/ml/datasets/Abalone



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)