Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/08/01 05:54:38 UTC

[jira] [Resolved] (SPARK-2756) Decision Tree bugs

     [ https://issues.apache.org/jira/browse/SPARK-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-2756.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1.0

Issue resolved by pull request 1673
[https://github.com/apache/spark/pull/1673]

> Decision Tree bugs
> ------------------
>
>                 Key: SPARK-2756
>                 URL: https://issues.apache.org/jira/browse/SPARK-2756
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>             Fix For: 1.1.0
>
>
> 3 bugs:
> Bug 1: Indexing is inconsistent for aggregate calculations for unordered features. This affects multiclass classification with categorical features that have few enough values to be treated as unordered, i.e., isSpaceSufficientForAllCategoricalSplits=true.
> * updateBinForUnorderedFeature indexed agg as (node, feature, featureValue, binIndex), where
> ** featureValue was from arr (so it was a feature value)
> ** binIndex was in [0,…, 2^(maxFeatureValue-1)-1)
> * The rest of the code indexed agg as (node, feature, binIndex, label).
> Bug 2: calculateGainForSplit (for classification):
> * It returns dummy prediction values when either the left or right child has 0 weight.  These are incorrect for multiclass classification.
> Bug 3: Off-by-1 when finding thresholds for splits for continuous features.
> * When finding thresholds for possible splits for continuous features in DecisionTree.findSplitsBins, the thresholds were set according to individual training examples’ feature values.  This can cause problems for small datasets.
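
To illustrate Bug 1 above, here is a minimal sketch of how two different index layouts over the same flat aggregate array end up addressing different slots. It is not the MLlib code: the helper names (flat_index, read_index, buggy_write_index), the array sizes, and the row-major stride arithmetic are all assumptions made for illustration. Only the two tuple orderings, (node, feature, featureValue, binIndex) and (node, feature, binIndex, label), come from the issue description.

```python
# Hypothetical sizes, chosen only for illustration.
NUM_FEATURES = 2
NUM_BINS = 4            # bins per feature
NUM_CLASSES = 3         # labels in a multiclass problem
NUM_FEATURE_VALUES = 3  # categories of the unordered feature


def flat_index(node, feature, third, fourth, dim3, dim4):
    """Row-major offset into a flat array laid out as [node][feature][dim3][dim4]."""
    return ((node * NUM_FEATURES + feature) * dim3 + third) * dim4 + fourth


def read_index(node, feature, bin_index, label):
    # Layout the rest of the code is described as using:
    # (node, feature, binIndex, label).
    return flat_index(node, feature, bin_index, label, NUM_BINS, NUM_CLASSES)


def buggy_write_index(node, feature, feature_value, bin_index):
    # Layout described for updateBinForUnorderedFeature:
    # (node, feature, featureValue, binIndex). The strides are assumed.
    return flat_index(node, feature, feature_value, bin_index,
                      NUM_FEATURE_VALUES, NUM_BINS)


if __name__ == "__main__":
    # An update for (node=0, feature=0, featureValue=2, binIndex=1) ...
    written = buggy_write_index(0, 0, 2, 1)
    # ... is later looked up as (node=0, feature=0, binIndex=1, label=0).
    expected = read_index(0, 0, 1, 0)
    print(written, expected)  # the two offsets differ
```

Because the writer and the readers disagree on what the last two dimensions mean, counts land in slots that later code interprets as a different (bin, label) pair, which is the kind of inconsistency the bug report describes.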



--
This message was sent by Atlassian JIRA
(v6.2#6252)