Posted to issues@spark.apache.org by "yangqiao (JIRA)" <ji...@apache.org> on 2015/06/03 15:28:38 UTC

[jira] [Created] (SPARK-8078) Spark MLlib Decision Trees Improvement

yangqiao created SPARK-8078:
-------------------------------

             Summary: Spark MLlib Decision Trees Improvement
                 Key: SPARK-8078
                 URL: https://issues.apache.org/jira/browse/SPARK-8078
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0, 1.1.1, 1.1.0, 1.0.2, 1.0.1
         Environment: ubuntu14.04
            Reporter: yangqiao
            Priority: Minor
             Fix For: 1.3.1


In Spark MLlib, decision trees use Gini impurity, entropy, and variance as impurity measures. The entropy measure is implemented by calculating information gain, which was put forward by J. Ross Quinlan in the ID3 algorithm. This can be improved by implementing the C4.5 algorithm, which uses the information gain ratio instead of information gain as the split criterion. By implementing C4.5, the decision tree model can achieve higher prediction accuracy.
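
To illustrate the difference between the two criteria, here is a minimal sketch in plain Python. It is not MLlib code: the function names (entropy, info_gain, split_info, gain_ratio) are hypothetical and chosen only for this example, and it assumes base-2 logarithms and class labels held in ordinary lists.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def info_gain(parent, children):
    """ID3 criterion: parent entropy minus size-weighted child entropy."""
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted


def split_info(parent, children):
    """Entropy of the partition sizes, used by C4.5 as a normalizer."""
    total = len(parent)
    return -sum(
        (len(ch) / total) * math.log2(len(ch) / total)
        for ch in children
        if ch  # skip empty partitions to avoid log2(0)
    )


def gain_ratio(parent, children):
    """C4.5 criterion: information gain divided by split information."""
    si = split_info(parent, children)
    if si == 0.0:
        # Everything landed in one partition, so the split is useless.
        return 0.0
    return info_gain(parent, children) / si


# Hypothetical data: 4 positive and 4 negative examples.
parent = ["yes"] * 4 + ["no"] * 4

# A two-way split that separates the classes perfectly.
binary_split = [["yes"] * 4, ["no"] * 4]

# An eight-way split that puts every example in its own partition,
# as a high-cardinality feature such as a row ID would.
id_like_split = [[label] for label in parent]

print(info_gain(parent, binary_split), gain_ratio(parent, binary_split))
print(info_gain(parent, id_like_split), gain_ratio(parent, id_like_split))
```

Both splits have the same information gain, so information gain alone cannot tell them apart. The gain ratio is lower for the eight-way split because its split information is larger, which is how C4.5 penalizes features that fragment the data into many small partitions.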



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
