Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/03/22 17:47:00 UTC

[jira] [Created] (MADLIB-1219) FR: null_as_category=TRUE not working when variable importance used

Frank McQuillan created MADLIB-1219:
---------------------------------------

             Summary: FR:  null_as_category=TRUE not working when variable importance used
                 Key: MADLIB-1219
                 URL: https://issues.apache.org/jira/browse/MADLIB-1219
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Random Forest
            Reporter: Frank McQuillan
             Fix For: v1.14


I cannot get null_as_category=TRUE to work in forest_train when variable importance is enabled.

{code}
DROP TABLE IF EXISTS null_handling_example;

CREATE TABLE null_handling_example (
    id integer,
    country text,
    city text,
    weather text,
    response text
);
 
INSERT INTO null_handling_example VALUES
(1,null,null,null,'a'),
(2,'US',null,null,'b'),
(3,'US','NY',null,'c'),
(4,'US','NY','rainy','d');
 
SELECT * FROM null_handling_example ORDER BY id;
{code}

{code}
DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;

SELECT madlib.forest_train('null_handling_example',  -- source table
                           'train_output',    -- output model table
                           'id',              -- id column
                           'response',        -- response
                           'country, weather, city',   -- features
                           NULL,              -- exclude columns
                           NULL,              -- grouping columns
                           2::integer,        -- number of trees
                           2::integer,        -- number of random features
                           TRUE::boolean,     -- variable importance
                           1::integer,        -- num_permutations
                           3::integer,        -- max depth
                           2::integer,        -- min split
                           2::integer,        -- min bucket
                           2::integer,        -- number of splits per continuous variable
                           'null_as_category=TRUE'   -- null handling params
                           );
{code}
This produces the following error:
{code}
ERROR:  plpy.SPIError: invalid array length
DETAIL:  array_of_float: Size should be in [1, 1e7], 0 given
CONTEXT:  Traceback (most recent call last):
  PL/Python function "forest_train", line 42, in <module>
    sample_ratio
  PL/Python function "forest_train", line 609, in forest_train
  PL/Python function "forest_train", line 1058, in _calculate_oob_prediction
PL/Python function "forest_train"
{code}

When variable importance is FALSE, the same call does not produce this error.
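
For comparison, here is a sketch of the call that does not raise the error. It is the call shown earlier with only the variable importance argument changed from TRUE to FALSE; every other argument is unchanged. The resulting model output is not shown here.

{code}
DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;

SELECT madlib.forest_train('null_handling_example',  -- source table
                           'train_output',    -- output model table
                           'id',              -- id column
                           'response',        -- response
                           'country, weather, city',   -- features
                           NULL,              -- exclude columns
                           NULL,              -- grouping columns
                           2::integer,        -- number of trees
                           2::integer,        -- number of random features
                           FALSE::boolean,    -- variable importance (only change)
                           1::integer,        -- num_permutations
                           3::integer,        -- max depth
                           2::integer,        -- min split
                           2::integer,        -- min bucket
                           2::integer,        -- number of splits per continuous variable
                           'null_as_category=TRUE'   -- null handling params
                           );
{code}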




