Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2015/04/13 07:42:12 UTC

[jira] [Resolved] (SPARK-5886) Add LabelIndexer

     [ https://issues.apache.org/jira/browse/SPARK-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-5886.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.0

Issue resolved by pull request 4735
[https://github.com/apache/spark/pull/4735]

> Add LabelIndexer
> ----------------
>
>                 Key: SPARK-5886
>                 URL: https://issues.apache.org/jira/browse/SPARK-5886
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>             Fix For: 1.4.0
>
>
> `LabelIndexer` takes a column of labels (raw categories) and outputs an integer column with labels indexed by their frequency.
> {code}
> val li = new LabelIndexer()
>   .setInputCol("country")
>   .setOutputCol("countryIndex")
> {code}
> In the output column, we should store the label-to-index map as an ML attribute. Indices should be assigned in descending order of frequency, so that the most frequent label gets index 0, to enhance sparsity.
> We can discuss whether this should index multiple columns at the same time.
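
The frequency-ordered indexing described above can be sketched in plain Scala, without any Spark dependency. This is only an illustration of the idea, not the code from the pull request: the object `LabelIndexSketch` and its methods `fitLabelIndex` and `transform` are hypothetical names, and the alphabetical tie-break for equally frequent labels is an assumption the issue does not specify.

```scala
// Hypothetical sketch, not the Spark API: index labels by descending frequency.
object LabelIndexSketch {

  // Build the label-to-index map: the most frequent label gets index 0.
  def fitLabelIndex(labels: Seq[String]): Map[String, Int] = {
    // Count how often each label occurs.
    val counts = labels.groupBy(identity).map { case (label, xs) => (label, xs.size) }
    counts.toSeq
      // Sort by descending count; ties are broken alphabetically (an assumption).
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap
  }

  // Replace each label with its index; throws NoSuchElementException on unseen labels.
  def transform(labels: Seq[String], index: Map[String, Int]): Seq[Int] =
    labels.map(index)
}
```

Because index 0 goes to the most common label, a column dominated by one category encodes mostly to zeros, which is the sparsity benefit the description refers to.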



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
