
Posted to issues@spark.apache.org by "Qiping Li (JIRA)" <ji...@apache.org> on 2014/08/28 04:49:58 UTC

[jira] [Created] (SPARK-3272) Calculate prediction for nodes separately from calculating information gain for splits in decision tree

Qiping Li created SPARK-3272:
--------------------------------

             Summary: Calculate prediction for nodes separately from calculating information gain for splits in decision tree
                 Key: SPARK-3272
                 URL: https://issues.apache.org/jira/browse/SPARK-3272
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.0.2
            Reporter: Qiping Li
             Fix For: 1.1.0


In the current implementation, the prediction for a node is calculated together with the information gain stats for each possible split. However, the value to predict for a specific node is fixed regardless of which split is considered.
To save computation, we can calculate the prediction once up front and then calculate the information gain stats for each split.

This is also necessary if we want to support a minimum instances per node parameter ([SPARK-2207|https://issues.apache.org/jira/browse/SPARK-2207]): when no split satisfies the minimum instances requirement, we don't use the information gain of any split, so there should still be a way to get the prediction value.
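
A minimal sketch of the proposed separation, written in plain Python rather than against the actual MLlib code. All names here (gini, predict, best_split, min_instances_per_node) are hypothetical, and it assumes a classification node with Gini impurity and binary splits. The prediction is computed once from the node's label counts; the loop over splits only computes gain, so a prediction is still available when every split is rejected.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def gini(counts: Counter) -> float:
    """Gini impurity of a label-count histogram."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def predict(counts: Counter) -> int:
    """Majority label of the node; ties go to the smallest label."""
    return min(counts, key=lambda label: (-counts[label], label))


def best_split(
    node_counts: Counter,
    splits: Dict[str, Counter],
    min_instances_per_node: int = 1,
) -> Tuple[int, Optional[Tuple[str, float]]]:
    """Return (prediction, best) where best is (split name, gain) or None.

    `splits` maps a hypothetical split name to the label counts of its
    left child; the right child is the remainder of the node.
    """
    # Step 1: the prediction depends only on the node, so compute it once.
    prediction = predict(node_counts)
    parent_impurity = gini(node_counts)
    total = sum(node_counts.values())

    # Step 2: evaluate information gain per split, skipping invalid splits.
    best: Optional[Tuple[str, float]] = None
    for name, left in splits.items():
        right = node_counts - left
        n_left, n_right = sum(left.values()), sum(right.values())
        if n_left < min_instances_per_node or n_right < min_instances_per_node:
            continue  # split violates the minimum instances requirement
        gain = (
            parent_impurity
            - (n_left / total) * gini(left)
            - (n_right / total) * gini(right)
        )
        if best is None or gain > best[1]:
            best = (name, gain)

    return prediction, best
```

With min_instances_per_node set high enough that no split qualifies, best comes back as None while the prediction is still returned, which is the case described above.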



