Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/10/20 22:13:33 UTC

[jira] [Updated] (SPARK-3207) Choose splits for continuous features in DecisionTree more adaptively

     [ https://issues.apache.org/jira/browse/SPARK-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-3207:
---------------------------------
    Assignee: Qiping Li

> Choose splits for continuous features in DecisionTree more adaptively
> ---------------------------------------------------------------------
>
>                 Key: SPARK-3207
>                 URL: https://issues.apache.org/jira/browse/SPARK-3207
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Qiping Li
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> DecisionTree splits on continuous features by choosing an array of values from a subsample of the data.
> Currently, it does not check for identical values in the subsample, so it can end up with multiple copies of the same split. This is not an error, but split selection could adapt better to the data.
> Proposal: In findSplitsBins, check for identical values and search for a set of unique splits. If there are not enough unique candidates, reduce the number of splits.
> This would require modifying findSplitsBins and making sure that the number of splits/bins (chosen adaptively) is set correctly elsewhere in the code (such as in DecisionTreeMetadata).
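
The proposal above could be sketched roughly as follows. This is an illustration in Python, not Spark's Scala code and not the eventual fix: the function name find_unique_splits is hypothetical, and the convention that a threshold t means "feature <= t" is an assumption. The sketch counts distinct values in the sample, walks them in sorted order, and emits a threshold each time the cumulative count crosses the next quantile target, so a heavily repeated value yields one split rather than several copies.

```python
from collections import Counter


def find_unique_splits(sample, num_splits):
    """Return up to num_splits distinct thresholds for one continuous feature.

    Hypothetical helper for illustration; assumes a threshold t is read
    as the test "feature <= t".
    """
    if not sample or num_splits < 1:
        return []

    # (value, count) pairs sorted by value.
    counts = sorted(Counter(sample).items())

    # The largest value is not a useful threshold: every sample satisfies it.
    candidates = counts[:-1]

    # Not enough unique candidates: use them all, i.e. fewer splits.
    if len(candidates) <= num_splits:
        return [value for value, _ in candidates]

    # Ideal number of samples between consecutive thresholds.
    stride = len(sample) / (num_splits + 1)

    splits = []
    seen = 0          # samples with value <= the current candidate
    target = stride   # next cumulative count at which to emit a threshold
    for value, count in candidates:
        seen += count
        if seen >= target:
            splits.append(value)
            # A value with many duplicates may cross several targets at
            # once; skip them all so it still produces a single split.
            while target <= seen:
                target += stride
    return splits


if __name__ == "__main__":
    # Four copies of 1.0 cover the first two quantile targets,
    # so only two unique thresholds come back instead of three.
    print(find_unique_splits([1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0], 3))
```

Because the returned list can be shorter than the requested number of splits, any code that sizes bins per feature would need to read the length of this result rather than the configured maximum, which is the bookkeeping concern the description raises for DecisionTreeMetadata.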


