Posted to issues@spark.apache.org by "Julian King (Jira)" <ji...@apache.org> on 2021/03/02 00:23:00 UTC

[jira] [Commented] (SPARK-3159) Check for reducible DecisionTree

    [ https://issues.apache.org/jira/browse/SPARK-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293259#comment-17293259 ] 

Julian King commented on SPARK-3159:
------------------------------------

I also need the probability estimates for the tree, not the classifier output.

Does the code (after the accepted PR) mean that nodes will always be merged if the classification output is the same? This radically reduces the utility of decision trees for insight generation.

We are encountering a situation where the decision tree refuses to split even a single node in situations where it should, and are wondering whether it relates to this behaviour.

Is there any way to disable this? [~asolimando]

> Check for reducible DecisionTree
> --------------------------------
>
>                 Key: SPARK-3159
>                 URL: https://issues.apache.org/jira/browse/SPARK-3159
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Alessandro Solimando
>            Priority: Minor
>             Fix For: 2.4.0
>
>         Attachments: image-2020-05-24-23-00-38-419.png
>
>
> Improvement: test-time computation
> Currently, pairs of leaf nodes with the same parent can both output the same prediction.  This happens since the splitting criterion (e.g., Gini) is not the same as prediction accuracy/MSE; the splitting criterion can sometimes be improved even when both children would still output the same prediction (e.g., based on the majority label for classification).
> We could check the tree and reduce it if possible after training.
> Note: This happens with scikit-learn as well.
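
The situation in the description can be reproduced with a small sketch. This is illustrative Python only, not Spark's implementation: the `Node` class, `gini_gain`, and `reduce_tree` are hypothetical names, and the class counts are made up. It shows a split that improves Gini even though both children predict the same class, and a bottom-up pass that merges such sibling leaves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `counts` holds per-class training counts."""
    counts: List[int]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    @property
    def prediction(self) -> int:
        # Majority class; ties go to the lowest class index.
        return self.counts.index(max(self.counts))

    @property
    def probabilities(self) -> List[float]:
        total = sum(self.counts)
        return [c / total for c in self.counts]


def gini(counts: List[int]) -> float:
    """Gini impurity of a vector of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_gain(parent: Node) -> float:
    """Impurity decrease obtained by splitting `parent` into its two children."""
    n = sum(parent.counts)
    weighted = sum(
        sum(child.counts) / n * gini(child.counts)
        for child in (parent.left, parent.right)
    )
    return gini(parent.counts) - weighted


def reduce_tree(node: Node) -> Node:
    """Merge sibling leaves that share a prediction, bottom-up, in place."""
    if node.is_leaf:
        return node
    reduce_tree(node.left)
    reduce_tree(node.right)
    if (node.left.is_leaf and node.right.is_leaf
            and node.left.prediction == node.right.prediction):
        # Both children output the same class, so the parent becomes a leaf.
        node.left = node.right = None
    return node


# Made-up counts: both children have class 0 as the majority label.
root = Node([70, 30], left=Node([40, 5]), right=Node([30, 25]))
gain = gini_gain(root)
left_p0 = root.left.probabilities[0]
right_p0 = root.right.probabilities[0]
reduce_tree(root)
```

With these counts the split has a positive Gini gain (about 0.058), yet both children predict class 0, so `reduce_tree` collapses them. The predicted label is unchanged, but the distinct class-0 probabilities of the two leaves (about 0.89 and 0.55) are replaced by the parent's single estimate of 0.70, which is the loss of information the comment above is concerned about.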



