Posted to issues@madlib.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/15 21:17:34 UTC
[jira] [Commented] (MADLIB-975) Random Forest training error
[ https://issues.apache.org/jira/browse/MADLIB-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196149#comment-15196149 ]
ASF GitHub Bot commented on MADLIB-975:
---------------------------------------
GitHub user orhankislal opened a pull request:
https://github.com/apache/incubator-madlib/pull/32
Recursive_partitioning: Fix random forest error message
JIRA: MADLIB-975
Random forest train gave an uninformative error if every data point had a NULL feature. Added an if check and an assert to give a proper description of the situation.
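For illustration only, the kind of guard described above might look like the following Python sketch. It is not the code from the pull request: the helper names (collect_levels, sorted_levels), the row layout, and the error wording are assumptions, and a plain ValueError stands in for whatever error mechanism MADlib uses. The traceback in the issue suggests that a value was None when .sort() was called on it, so the sketch checks for that case first.

```python
def collect_levels(rows):
    """Hypothetical stand-in for a query result.

    Returns the distinct values of the first feature over rows that have
    no NULL (None) features, or None when no such rows exist, mimicking
    an aggregate evaluated over zero rows.
    """
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        return None
    return list({r[0] for r in complete})


def sorted_levels(rows):
    """Return the sorted levels, failing with a readable message if none exist."""
    levels = collect_levels(rows)
    # Guard: without this check, levels.sort() would raise
    # AttributeError: 'NoneType' object has no attribute 'sort'.
    if levels is None:
        raise ValueError(
            "Random forest: every row has at least one NULL feature, "
            "so no usable training data remains."
        )
    levels.sort()
    return levels
```

Calling sorted_levels on rows that each contain a None value raises the ValueError with the descriptive message instead of the AttributeError shown in the issue.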
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/orhankislal/incubator-madlib bugfix/random_forest
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/32.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #32
----
commit 795723a421276b9c46aa7b90c096cd5ddfd5bc12
Author: Orhan Kislal <ok...@pivotal.io>
Date: 2016-03-10T19:30:03Z
Recursive_partitioning: Fix random forest error message
JIRA: MADLIB-975
Random forest train gave an uninformative error if every data point had a NULL feature. Added an if check and an assert to give a proper description of the situation.
----
> Random Forest training error message
> ------------------------------------
>
> Key: MADLIB-975
> URL: https://issues.apache.org/jira/browse/MADLIB-975
> Project: Apache MADlib
> Issue Type: Bug
> Reporter: Rashmi Raghu
> Assignee: Orhan Kislal
> Priority: Minor
> Fix For: v1.9
>
>
> The error message raised during RF training is not interpretable. See the query example below. Sample data for the query was sent offline.
> SELECT madlib.forest_train(
>     'dev.training_data_take_1',   -- training_table_name
>     'dev.models_random_forest',   -- output_table_name
>     'id',                         -- id_col_name
>     'event',                      -- dependent_variable
>     '*',                          -- list_of_features
>     'id,regionname,wellid,day',   -- list_of_features_to_exclude
>     NULL,                         -- grouping_cols
>     100                           -- num_trees
>                                   -- num_random_features
>                                   -- importance
>                                   -- num_permutations
>                                   -- max_tree_depth
>                                   -- min_split
>                                   -- min_bucket
>                                   -- num_splits
>                                   -- surrogate_params
>                                   -- verbose
>                                   -- sample_ratio
> );
> ERROR: AttributeError: 'NoneType' object has no attribute 'sort' (plpython.c:4648)
> CONTEXT: Traceback (most recent call last):
> PL/Python function "forest_train", line 42, in <module>
> sample_ratio
> PL/Python function "forest_train", line 337, in forest_train
> PL/Python function "forest_train"
> ********** Error **********
> ERROR: AttributeError: 'NoneType' object has no attribute 'sort' (plpython.c:4648)
> SQL state: XX000
> Context: Traceback (most recent call last):
> PL/Python function "forest_train", line 42, in <module>
> sample_ratio
> PL/Python function "forest_train", line 337, in forest_train
> PL/Python function "forest_train"
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)