Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/03/22 17:47:00 UTC
[jira] [Assigned] (MADLIB-1219) FR: null_as_category=TRUE not working when variable importance used
[ https://issues.apache.org/jira/browse/MADLIB-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan reassigned MADLIB-1219:
---------------------------------------
Assignee: Rahul Iyer
> FR: null_as_category=TRUE not working when variable importance used
> --------------------------------------------------------------------
>
> Key: MADLIB-1219
> URL: https://issues.apache.org/jira/browse/MADLIB-1219
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Random Forest
> Reporter: Frank McQuillan
> Assignee: Rahul Iyer
> Priority: Major
> Fix For: v1.14
>
>
> I cannot get null_as_category=TRUE to work.
> {code}
> DROP TABLE IF EXISTS null_handling_example;
> CREATE TABLE null_handling_example (
>     id integer,
>     country text,
>     city text,
>     weather text,
>     response text
> );
>
> INSERT INTO null_handling_example VALUES
> (1,null,null,null,'a'),
> (2,'US',null,null,'b'),
> (3,'US','NY',null,'c'),
> (4,'US','NY','rainy','d');
>
> SELECT * FROM null_handling_example ORDER BY id;
> {code}
> {code}
> DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;
> SELECT madlib.forest_train('null_handling_example',    -- source table
>                            'train_output',             -- output model table
>                            'id',                       -- id column
>                            'response',                 -- response
>                            'country, weather, city',   -- features
>                            NULL,                       -- exclude columns
>                            NULL,                       -- grouping columns
>                            2::integer,                 -- number of trees
>                            2::integer,                 -- number of random features
>                            TRUE::boolean,              -- variable importance
>                            1::integer,                 -- num_permutations
>                            3::integer,                 -- max depth
>                            2::integer,                 -- min split
>                            2::integer,                 -- min bucket
>                            2::integer,                 -- number of splits per continuous variable
>                            'null_as_category=TRUE'     -- null handling params
>                            );
> {code}
> produces this error:
> {code}
> ERROR: plpy.SPIError: invalid array length
> DETAIL: array_of_float: Size should be in [1, 1e7], 0 given
> CONTEXT: Traceback (most recent call last):
> PL/Python function "forest_train", line 42, in <module>
> sample_ratio
> PL/Python function "forest_train", line 609, in forest_train
> PL/Python function "forest_train", line 1058, in _calculate_oob_prediction
> PL/Python function "forest_train"
> {code}
> When variable importance is FALSE, the same call does not produce this error.
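>
> For comparison, here is a sketch of the same call with variable importance set to FALSE, which is the case reported as not producing the error. Only the importance argument differs; the table names and all other arguments are reused unchanged from the example above.
> {code}
> DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;
> SELECT madlib.forest_train('null_handling_example',    -- source table
>                            'train_output',             -- output model table
>                            'id',                       -- id column
>                            'response',                 -- response
>                            'country, weather, city',   -- features
>                            NULL,                       -- exclude columns
>                            NULL,                       -- grouping columns
>                            2::integer,                 -- number of trees
>                            2::integer,                 -- number of random features
>                            FALSE::boolean,             -- variable importance (only change)
>                            1::integer,                 -- num_permutations
>                            3::integer,                 -- max depth
>                            2::integer,                 -- min split
>                            2::integer,                 -- min bucket
>                            2::integer,                 -- number of splits per continuous variable
>                            'null_as_category=TRUE'     -- null handling params
>                            );
> {code}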
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)