Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/03/22 17:47:00 UTC

[jira] [Updated] (MADLIB-1219) RF: null_as_category=TRUE not working when variable importance used

     [ https://issues.apache.org/jira/browse/MADLIB-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1219:
------------------------------------
    Summary: RF:  null_as_category=TRUE not working when variable importance used  (was: FR:  null_as_category=TRUE not working when variable importance used)

> RF:  null_as_category=TRUE not working when variable importance used
> --------------------------------------------------------------------
>
>                 Key: MADLIB-1219
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1219
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Random Forest
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>            Priority: Major
>             Fix For: v1.14
>
>
> I cannot get null_as_category=TRUE to work.
> {code}
> DROP TABLE IF EXISTS null_handling_example;
> CREATE TABLE null_handling_example (
>     id integer,
>     country text,
>     city text,
>     weather text,
>     response text
> );
>  
> INSERT INTO null_handling_example VALUES
> (1,null,null,null,'a'),
> (2,'US',null,null,'b'),
> (3,'US','NY',null,'c'),
> (4,'US','NY','rainy','d');
>  
> SELECT * FROM null_handling_example ORDER BY id;
> {code}
> {code}
> DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;
> SELECT madlib.forest_train('null_handling_example',  -- source table
>                            'train_output',    -- output model table
>                            'id',              -- id column
>                            'response',        -- response
>                            'country, weather, city',   -- features
>                            NULL,              -- exclude columns
>                            NULL,              -- grouping columns
>                            2::integer,        -- number of trees
>                            2::integer,        -- number of random features
>                            TRUE::boolean,     -- variable importance
>                            1::integer,        -- num_permutations
>                            3::integer,        -- max depth
>                            2::integer,        -- min split
>                            2::integer,        -- min bucket
>                            2::integer,        -- number of splits per continuous variable
>                            'null_as_category=TRUE'
>                            );
> {code}
> produces this error
> {code}
> ERROR:  plpy.SPIError: invalid array length
> DETAIL:  array_of_float: Size should be in [1, 1e7], 0 given
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "forest_train", line 42, in <module>
>     sample_ratio
>   PL/Python function "forest_train", line 609, in forest_train
>   PL/Python function "forest_train", line 1058, in _calculate_oob_prediction
> PL/Python function "forest_train"
> {code}
> When variable importance is FALSE, the same call does not produce this error.
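>
> For comparison, a sketch of the non-failing variant: the same call with only the variable importance argument switched to FALSE. Every other argument is copied unchanged from the failing call above; that this is the exact form that was run without error is an assumption.
> {code}
> DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;
> SELECT madlib.forest_train('null_handling_example',  -- source table
>                            'train_output',    -- output model table
>                            'id',              -- id column
>                            'response',        -- response
>                            'country, weather, city',   -- features
>                            NULL,              -- exclude columns
>                            NULL,              -- grouping columns
>                            2::integer,        -- number of trees
>                            2::integer,        -- number of random features
>                            FALSE::boolean,    -- variable importance (only change)
>                            1::integer,        -- num_permutations
>                            3::integer,        -- max depth
>                            2::integer,        -- min split
>                            2::integer,        -- min bucket
>                            2::integer,        -- number of splits per continuous variable
>                            'null_as_category=TRUE'
>                            );
> {code}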



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)