Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/03/22 17:47:00 UTC
[jira] [Created] (MADLIB-1219) FR: null_as_category=TRUE not working when variable importance used
Frank McQuillan created MADLIB-1219:
---------------------------------------
Summary: FR: null_as_category=TRUE not working when variable importance used
Key: MADLIB-1219
URL: https://issues.apache.org/jira/browse/MADLIB-1219
Project: Apache MADlib
Issue Type: Bug
Components: Module: Random Forest
Reporter: Frank McQuillan
Fix For: v1.14
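I cannot get null_as_category=TRUE to work in forest_train when variable importance is enabled. To reproduce, create a small table with NULLs in the categorical features: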
{code}
DROP TABLE IF EXISTS null_handling_example;
CREATE TABLE null_handling_example (
    id integer,
    country text,
    city text,
    weather text,
    response text
);
INSERT INTO null_handling_example VALUES
    (1, null, null, null, 'a'),
    (2, 'US', null, null, 'b'),
    (3, 'US', 'NY', null, 'c'),
    (4, 'US', 'NY', 'rainy', 'd');
SELECT * FROM null_handling_example ORDER BY id;
{code}
{code}
DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;
SELECT madlib.forest_train(
    'null_handling_example',    -- source table
    'train_output',             -- output model table
    'id',                       -- id column
    'response',                 -- response
    'country, weather, city',   -- features
    NULL,                       -- exclude columns
    NULL,                       -- grouping columns
    2::integer,                 -- number of trees
    2::integer,                 -- number of random features
    TRUE::boolean,              -- variable importance
    1::integer,                 -- num_permutations
    3::integer,                 -- max depth
    2::integer,                 -- min split
    2::integer,                 -- min bucket
    2::integer,                 -- number of splits per continuous variable
    'null_as_category=TRUE'     -- null handling params
);
{code}
This call produces the following error:
{code}
ERROR: plpy.SPIError: invalid array length
DETAIL: array_of_float: Size should be in [1, 1e7], 0 given
CONTEXT: Traceback (most recent call last):
PL/Python function "forest_train", line 42, in <module>
sample_ratio
PL/Python function "forest_train", line 609, in forest_train
PL/Python function "forest_train", line 1058, in _calculate_oob_prediction
PL/Python function "forest_train"
{code}
When variable importance is FALSE, the same call does not produce this error.
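For comparison, here is a minimal sketch of the non-failing variant. It assumes the same null_handling_example table and the same arguments as the failing call, with only the variable importance argument switched to FALSE:
{code}
DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;
SELECT madlib.forest_train(
    'null_handling_example',    -- source table
    'train_output',             -- output model table
    'id',                       -- id column
    'response',                 -- response
    'country, weather, city',   -- features
    NULL,                       -- exclude columns
    NULL,                       -- grouping columns
    2::integer,                 -- number of trees
    2::integer,                 -- number of random features
    FALSE::boolean,             -- variable importance (only change from the failing call)
    1::integer,                 -- num_permutations
    3::integer,                 -- max depth
    2::integer,                 -- min split
    2::integer,                 -- min bucket
    2::integer,                 -- number of splits per continuous variable
    'null_as_category=TRUE'     -- null handling params
);
{code}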