Posted to issues@spark.apache.org by "Taylor Baldwin (JIRA)" <ji...@apache.org> on 2016/06/16 17:52:05 UTC

[jira] [Created] (SPARK-15995) Gradient Boosted Trees - handling of Categorical Inputs

Taylor Baldwin created SPARK-15995:
--------------------------------------

             Summary: Gradient Boosted Trees - handling of Categorical Inputs
                 Key: SPARK-15995
                 URL: https://issues.apache.org/jira/browse/SPARK-15995
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 1.6.1
            Reporter: Taylor Baldwin


Gradient Boosted Trees appear to treat all inputs as continuous, or at least ordered, values.  In the trees returned in a Gradient Boosted model, the nodes for categorical features contain a split that operates on a threshold rather than on the feature's categories.  This treats the numeric encoding of the categories as if its ordering were significant, which is not a reasonable assumption.
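
To illustrate the difference described above, here is a minimal, library-independent sketch in Python. It is not Spark code: the function names, the colour encoding, and the chosen category set are all hypothetical and exist only to contrast a threshold split with a category-set split.

```python
from typing import FrozenSet, List


def threshold_split(value: float, threshold: float) -> str:
    """Continuous-style split: go left when value <= threshold."""
    return "left" if value <= threshold else "right"


def category_split(value: int, left_categories: FrozenSet[int]) -> str:
    """Categorical split: go left when the category is in the chosen set."""
    return "left" if value in left_categories else "right"


# Hypothetical encoding of a colour feature: 0 = red, 1 = green, 2 = blue.
samples: List[int] = [0, 1, 2]

# A threshold can only cut the encoding into a lower and an upper range.
by_threshold = [threshold_split(v, 0.5) for v in samples]

# A categorical split can group red and blue together, ignoring the order.
by_category = [category_split(v, frozenset({0, 2})) for v in samples]

print(by_threshold)  # ['left', 'right', 'right']
print(by_category)   # ['left', 'right', 'left']
```

With this encoding, no single threshold can send red and blue one way and green the other, which is why a threshold split on a categorical feature depends on an arbitrary ordering.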

Both Random Forest and Decision Trees accept the categorical features info map, while Gradient Boosted Trees do not.  Random Forest and Decision Trees produce nodes for categorical features whose splits have the categories populated.

According to the website documentation, Gradient Boosted Trees should handle categorical features, yet there is no apparent way to supply the categorical information so that those features are handled as categories rather than as continuous values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org