
Posted to issues@spark.apache.org by "Rahul Tanwani (JIRA)" <ji...@apache.org> on 2016/01/12 10:19:39 UTC

[jira] [Created] (SPARK-12773) Impurity and Sample details for each node of a decision tree

Rahul Tanwani created SPARK-12773:
-------------------------------------

             Summary: Impurity and Sample details for each node of a decision tree
                 Key: SPARK-12773
                 URL: https://issues.apache.org/jira/browse/SPARK-12773
             Project: Spark
          Issue Type: Question
          Components: ML, MLlib
    Affects Versions: 1.5.2
            Reporter: Rahul Tanwani


I would like to understand whether each node in the decision tree calculates or stores the number of samples that satisfy its split criteria. Looking at the code, I found some information about impurity statistics but nothing about sample counts. scikit-learn exposes both of these metrics. This information would help in cases where multiple decision rules (multiple leaf nodes) yield the same prediction and we want to compare those decision paths against one another.
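To make the request concrete, here is a minimal sketch in plain Python of the kind of per-node statistics being asked about. It does not use any Spark API. For comparison, scikit-learn exposes these values as the `tree_.n_node_samples` and `tree_.impurity` arrays. Here `Node`, `leaf_paths`, `rules_by_prediction` and the toy tree are all hypothetical, and Gini is assumed as the impurity measure. Each node keeps per-class sample counts, and the sample count, impurity and prediction are derived from them. Leaves that yield the same prediction are then ranked by sample count and impurity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node that keeps per-class training-sample counts."""
    counts: Dict[int, int]             # class label -> number of samples at this node
    feature: Optional[int] = None      # split feature index (None for a leaf)
    threshold: Optional[float] = None  # samples with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    @property
    def n_samples(self) -> int:
        return sum(self.counts.values())

    @property
    def impurity(self) -> float:
        # Gini impurity (assumed measure): 1 - sum over classes of p_k^2
        n = self.n_samples
        return 1.0 - sum((c / n) ** 2 for c in self.counts.values())

    @property
    def prediction(self) -> int:
        # Majority class at this node
        return max(self.counts, key=self.counts.get)


def leaf_paths(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[List[str], Node]]:
    """Yield (conditions, leaf) for every root-to-leaf decision path.

    Assumes every internal node has both a left and a right child.
    """
    if node.is_leaf:
        yield list(path), node
        return
    yield from leaf_paths(node.left, path + (f"x[{node.feature}] <= {node.threshold}",))
    yield from leaf_paths(node.right, path + (f"x[{node.feature}] > {node.threshold}",))


def rules_by_prediction(root: Node) -> Dict[int, List[Tuple[List[str], int, float]]]:
    """Group decision paths by predicted class, largest and purest leaves first."""
    groups: Dict[int, List[Tuple[List[str], int, float]]] = defaultdict(list)
    for conditions, leaf in leaf_paths(root):
        groups[leaf.prediction].append((conditions, leaf.n_samples, leaf.impurity))
    for rules in groups.values():
        # More samples first; among equal counts, lower impurity first
        rules.sort(key=lambda rule: (-rule[1], rule[2]))
    return dict(groups)


# Toy tree with made-up counts: 12 training samples, two classes.
root = Node(
    {0: 7, 1: 5}, feature=0, threshold=2.5,
    left=Node({0: 5, 1: 2}, feature=1, threshold=1.0,
              left=Node({0: 4}), right=Node({0: 1, 1: 2})),
    right=Node({0: 2, 1: 3}, feature=1, threshold=4.0,
               left=Node({0: 2}), right=Node({1: 3})),
)

for label, rules in sorted(rules_by_prediction(root).items()):
    for conditions, n, gini in rules:
        print(label, " AND ".join(conditions), f"samples={n}", f"gini={gini:.3f}")
```

In this sketch, storing class counts rather than only an impurity value is what makes both numbers available: the sample count is their sum, and the impurity is computed from their proportions.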



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
