Posted to issues@spark.apache.org by "yangqiao (JIRA)" <ji...@apache.org> on 2015/06/03 15:28:38 UTC
[jira] [Created] (SPARK-8078) Spark MLlib Decision Trees Improvement
yangqiao created SPARK-8078:
-------------------------------
Summary: Spark MLlib Decision Trees Improvement
Key: SPARK-8078
URL: https://issues.apache.org/jira/browse/SPARK-8078
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0, 1.1.1, 1.1.0, 1.0.2, 1.0.1
Environment: Ubuntu 14.04
Reporter: yangqiao
Priority: Minor
Fix For: 1.3.1
In Spark MLlib, decision trees support Gini impurity, entropy, and variance as impurity measures. With entropy impurity, splits are chosen by information gain, the criterion J. Ross Quinlan proposed in the ID3 algorithm. This could be improved by implementing the C4.5 algorithm, which uses information gain ratio instead of information gain to evaluate splits. With C4.5, the decision tree model can achieve higher prediction accuracy.
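
To illustrate the difference between the two criteria, here is a minimal sketch in plain Python. It is not MLlib code, and the function names (entropy, info_gain, split_info, gain_ratio) are hypothetical, chosen only for this example. It assumes class labels are given as lists and that a candidate split is a list of child label lists. Gain ratio divides information gain by the entropy of the partition sizes, which penalizes splits that produce many small branches.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(parent, children):
    """ID3 criterion: parent entropy minus size-weighted child entropy."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


def split_info(parent, children):
    """Entropy of the partition sizes themselves (ignores class labels)."""
    n = len(parent)
    return -sum((len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch)


def gain_ratio(parent, children):
    """C4.5 criterion: information gain normalized by split information."""
    si = split_info(parent, children)
    if si == 0:
        # All samples fall into one branch; the split carries no information.
        return 0.0
    return info_gain(parent, children) / si


parent = ["a", "a", "b", "b"]
binary_split = [["a", "a"], ["b", "b"]]   # a two-valued feature
id_like_split = [["a"], ["a"], ["b"], ["b"]]  # one branch per sample

for name, split in [("binary", binary_split), ("id-like", id_like_split)]:
    print(name, info_gain(parent, split), gain_ratio(parent, split))
```

Both splits separate the classes perfectly, so information gain cannot tell them apart, while gain ratio prefers the binary split over the one that creates a branch per sample. Note that MLlib decision trees use binary splits, so how much the gain ratio normalization would change split selection there is an open question that this sketch does not answer.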
--
This message was sent by Atlassian JIRA (v6.3.4#6332)