Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2014/08/28 21:02:08 UTC

[jira] [Commented] (SPARK-3272) Calculate prediction for nodes separately from calculating information gain for splits in decision tree

    [ https://issues.apache.org/jira/browse/SPARK-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114166#comment-14114166 ] 

Joseph K. Bradley commented on SPARK-3272:
------------------------------------------

I think this JIRA may or may not be necessary for implementing [SPARK-2207], depending on how the code is set up.  For [SPARK-2207], I imagined checking the number of instances and the information gain when the Node is constructed in the main loop (in the train() method).  If there are too few instances or too little information gain, then the Node will be set as a leaf.  We could potentially avoid the aggregation for those leaves, but I would consider that a separate issue ([SPARK-3158]).
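
A rough sketch of the check described above, written in Python for brevity. This is not the MLlib code: the Node class, make_node, and the parameter names min_instances_per_node and min_info_gain are hypothetical, and the exact comparison used for "too little" gain is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical tree node for illustration; not MLlib's Node class."""
    node_id: int
    prediction: float
    is_leaf: bool = False


def make_node(node_id, prediction, num_instances, best_gain,
              min_instances_per_node=1, min_info_gain=0.0):
    """Construct a node and mark it as a leaf if it fails either threshold.

    Both thresholds are assumed parameters in the spirit of SPARK-2207.
    """
    node = Node(node_id, prediction)
    too_few_instances = num_instances < min_instances_per_node
    too_little_gain = best_gain < min_info_gain
    if too_few_instances or too_little_gain:
        node.is_leaf = True
    return node
```

With this shape, the leaf decision happens at construction time and needs only the node's instance count and the best gain found for it. Skipping the aggregation for such leaves is not shown here.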

> Calculate prediction for nodes separately from calculating information gain for splits in decision tree
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3272
>                 URL: https://issues.apache.org/jira/browse/SPARK-3272
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.2
>            Reporter: Qiping Li
>             Fix For: 1.1.0
>
>
> In the current implementation, the prediction for a node is calculated along with the information gain stats for each possible split. However, the value to predict for a specific node is fixed regardless of what the splits are.
> To save computation, we can calculate the prediction first and then calculate the information gain stats for each split.
> This is also necessary if we want to support a minimum instances per node parameter ([SPARK-2207|https://issues.apache.org/jira/browse/SPARK-2207]), because when no split satisfies the minimum instances requirement, we don't use the information gain of any split, yet there still needs to be a way to get the prediction value.
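
To illustrate the separation proposed in the description, here is a minimal Python sketch for a classification node. It assumes Gini impurity, represents label statistics as Counter objects, and describes each candidate split by its left child's label counts. The function names are hypothetical and do not correspond to MLlib's API.

```python
from collections import Counter


def gini(counts):
    """Gini impurity of a label-count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def predict(counts):
    """Majority label; depends only on the node's own label counts."""
    return counts.most_common(1)[0][0]


def best_split_gain(parent, left_children, min_instances):
    """Best gain among splits whose children both meet min_instances.

    Returns None when no split qualifies.
    """
    total = sum(parent.values())
    best = None
    for left in left_children:
        right = parent - left  # remaining label counts go to the right child
        n_left, n_right = sum(left.values()), sum(right.values())
        if n_left < min_instances or n_right < min_instances:
            continue
        gain = (gini(parent)
                - (n_left / total) * gini(left)
                - (n_right / total) * gini(right))
        if best is None or gain > best:
            best = gain
    return best


parent = Counter({"a": 3, "b": 1})
candidates = [Counter({"a": 3}), Counter({"b": 1})]
prediction = predict(parent)                       # computed once, before any split
gain = best_split_gain(parent, candidates, 2)      # no split qualifies here
```

In this example every candidate split leaves a child with fewer than two instances, so no gain is available, but the prediction is still defined because it was computed from the node's own statistics.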



--
This message was sent by Atlassian JIRA
(v6.2#6252)
