Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/04/12 07:13:15 UTC

[jira] [Assigned] (SPARK-1303) Added discretization capability to MLlib.

     [ https://issues.apache.org/jira/browse/SPARK-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-1303:
-----------------------------------

    Assignee: Apache Spark

> Added discretization capability to MLlib.
> -----------------------------------------
>
>                 Key: SPARK-1303
>                 URL: https://issues.apache.org/jira/browse/SPARK-1303
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: LIDIAgroup
>            Assignee: Apache Spark
>
> Some time ago, we discussed with Ameet Talwalkar the possibility of including both Feature Selection and Discretization algorithms in MLlib.
> In this patch we've implemented Entropy Minimization Discretization following the algorithm described in the paper "Multi-interval discretization of continuous-valued attributes for classification learning" by Fayyad and Irani (1993). This is one of the most widely used discretizers and is already included in most libraries, such as Weka. It can serve as a basis for feature selection (FS) algorithms and for the NaiveBayes classifier already included in MLlib.
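
For readers unfamiliar with the Fayyad and Irani method, the following is a minimal single-machine sketch in Python of entropy-minimization discretization with the MDL stopping criterion from the paper. It is not the code from the patch, which targets MLlib and would need to work on distributed data. All function names here (entropy, best_cut, mdlp_accepts, discretize) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon class entropy (in bits) of a non-empty sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Find the boundary minimizing the weighted class entropy.

    `values` must be sorted ascending, with `labels` aligned to it.
    Returns (cut_point, split_index), or None if all values are equal.
    """
    n = len(values)
    best, best_ent = None, float("inf")
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # a cut is only possible between distinct values
        left, right = labels[:i], labels[i:]
        w_ent = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if w_ent < best_ent:
            best_ent = w_ent
            best = ((values[i - 1] + values[i]) / 2, i)
    return best


def mdlp_accepts(labels, left, right):
    """MDL criterion: accept if gain > (log2(N - 1) + delta) / N."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(labels), entropy(left), entropy(right)
    gain = ent - (len(left) * ent1 + len(right) * ent2) / n
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (math.log2(n - 1) + delta) / n


def discretize(values, labels):
    """Return the sorted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    vs = [v for v, _ in pairs]
    ls = [l for _, l in pairs]
    cuts = []

    def split(lo, hi):
        found = best_cut(vs[lo:hi], ls[lo:hi])
        if found is None:
            return
        cut, i = found
        if not mdlp_accepts(ls[lo:hi], ls[lo:lo + i], ls[lo + i:hi]):
            return  # the split does not pay for its description length
        split(lo, lo + i)      # recurse into the left interval
        cuts.append(cut)
        split(lo + i, hi)      # recurse into the right interval

    split(0, len(vs))
    return cuts


if __name__ == "__main__":
    xs = list(range(1, 21))
    ys = ["a"] * 10 + ["b"] * 10
    print(discretize(xs, ys))
```

The recursion is applied independently to each continuous attribute. The resulting cut points define the intervals that a downstream consumer such as NaiveBayes or a feature selection method would treat as categorical values.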



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
