
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/06/03 15:38:38 UTC

[jira] [Assigned] (SPARK-8078) Spark MLlib Decision Trees Improvement

     [ https://issues.apache.org/jira/browse/SPARK-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-8078:
-----------------------------------

    Assignee:     (was: Apache Spark)

> Spark MLlib Decision Trees Improvement
> --------------------------------------
>
>                 Key: SPARK-8078
>                 URL: https://issues.apache.org/jira/browse/SPARK-8078
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.1, 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.2.2, 1.3.0, 1.3.1
>         Environment: ubuntu14.04
>            Reporter: yangqiao
>            Priority: Minor
>              Labels: performance
>             Fix For: 1.3.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In Spark MLlib, decision trees use Gini impurity, entropy, and variance as impurity measures. The entropy impurity is implemented by calculating information gain, which was introduced by J. Ross Quinlan in the ID3 algorithm. This can be improved by implementing the C4.5 algorithm, which uses the information gain ratio instead of information gain to evaluate candidate splits. By implementing C4.5, the decision tree model can achieve higher prediction accuracy.
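
For readers unfamiliar with the two criteria, the difference can be sketched in a few lines of Python. This is only an illustration of the textbook ID3 and C4.5 definitions, not MLlib code: the function names (entropy, info_gain, split_info, gain_ratio) and the toy labels are hypothetical and do not correspond to any Spark API.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(parent, partitions):
    """ID3 criterion: parent entropy minus the weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - weighted


def split_info(parent, partitions):
    """Entropy of the partition sizes themselves; grows with the number of branches."""
    n = len(parent)
    return -sum((len(p) / n) * math.log2(len(p) / n) for p in partitions if p)


def gain_ratio(parent, partitions):
    """C4.5 criterion: information gain normalized by split information."""
    si = split_info(parent, partitions)
    if si == 0.0:
        # A split that puts every sample in one branch carries no information.
        return 0.0
    return info_gain(parent, partitions) / si


# Hypothetical toy data: four samples, two classes.
parent = ["yes", "yes", "no", "no"]
two_way = [["yes", "yes"], ["no", "no"]]
four_way = [["yes"], ["yes"], ["no"], ["no"]]

# Both splits separate the classes perfectly, so their information gain is equal.
# The gain ratio is lower for the four-way split because it uses more branches.
print(info_gain(parent, two_way), gain_ratio(parent, two_way))
print(info_gain(parent, four_way), gain_ratio(parent, four_way))
```

In this toy case both splits have the same information gain, while the gain ratio prefers the split with fewer branches. That normalization is the change the issue proposes. Whether it improves accuracy on real data is the reporter's claim and is not demonstrated by this sketch.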



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
