Posted to issues@spark.apache.org by "Qiping Li (JIRA)" <ji...@apache.org> on 2014/08/28 04:49:58 UTC
[jira] [Created] (SPARK-3272) Calculate prediction for nodes
separately from calculating information gain for splits in decision tree
Qiping Li created SPARK-3272:
--------------------------------
Summary: Calculate prediction for nodes separately from calculating information gain for splits in decision tree
Key: SPARK-3272
URL: https://issues.apache.org/jira/browse/SPARK-3272
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.0.2
Reporter: Qiping Li
Fix For: 1.1.0
In the current implementation, the prediction for a node is calculated together with the information gain stats for each possible split. However, the value to predict for a given node is already determined by the node itself, no matter which split is chosen.
To save computation, we can calculate the prediction once per node first, and then calculate the information gain stats for each split.
This is also necessary if we want to support a minimum instances per node parameter ([SPARK-2207|https://issues.apache.org/jira/browse/SPARK-2207]): when no split satisfies the minimum instances requirement, we don't use the information gain of any split, so there must still be a way to get the prediction value.
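The idea can be illustrated with a minimal sketch. This is not the MLlib code: the function names (node_prediction, best_split), the use of Gini impurity, the majority-label prediction, and the single-feature threshold splits are all assumptions made for the example. It only shows that the prediction can be computed once from the node's labels and stays available even when every candidate split is rejected by a minimum-instances check.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def node_prediction(labels):
    """Majority label of the node; depends only on the node, not on any split."""
    return Counter(labels).most_common(1)[0][0]


def best_split(points, min_instances_per_node):
    """points is a list of (feature_value, label) pairs for one node.

    Returns (prediction, best), where best is (threshold, gain) or None
    when no candidate split satisfies the minimum-instances requirement.
    """
    labels = [y for _, y in points]
    n = len(labels)

    # Computed once per node, before looking at any split.
    prediction = node_prediction(labels)
    parent_impurity = gini(labels)

    best = None
    for threshold in sorted({x for x, _ in points}):
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        # Reject splits that leave too few instances in either child.
        if min(len(left), len(right)) < min_instances_per_node:
            continue
        gain = (parent_impurity
                - len(left) / n * gini(left)
                - len(right) / n * gini(right))
        if best is None or gain > best[1]:
            best = (threshold, gain)

    return prediction, best


if __name__ == "__main__":
    data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
    print(best_split(data, 1))  # a valid split exists
    print(best_split(data, 3))  # every split is rejected, prediction remains
```

In the second call no split leaves at least 3 instances on both sides, so best is None, yet the node still has a prediction because it was never tied to the per-split gain stats.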
--
This message was sent by Atlassian JIRA
(v6.2#6252)