Posted to issues@spark.apache.org by "Yanbo Liang (JIRA)" <ji...@apache.org> on 2016/06/15 00:33:30 UTC

[jira] [Updated] (SPARK-15957) RFormula supports forcing to index label

     [ https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-15957:
--------------------------------
    Description: 
RFormula indexes the label only when it is of string type. If the label is numeric and we use RFormula with a classification model, we cannot extract the label attributes from the label column metadata. The label attributes are useful, so for classification we can force the label to be indexed whether it is numeric or string. SparkR wrappers can then extract the label attributes from the column metadata. This feature can help us fix bugs similar to SPARK-15153.
For regression, the label will still keep its numeric type.
We should add a param to RFormula that controls whether the label is force-indexed.
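
To make the intended behavior concrete, here is a minimal pure-Python sketch of the indexing rule described above. It does not use Spark. The function name index_label and the flag name force_index_label are hypothetical (the description does not name the param), and ordering labels by descending frequency with ties broken by string order is a simplifying assumption, not a statement about RFormula's internals.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

Label = Union[str, int, float]


def index_label(
    values: Sequence[Label], force_index_label: bool = False
) -> Tuple[List[float], Optional[List[str]]]:
    """Return (label column, label attributes or None).

    Hypothetical helper: mimics the rule in the description, not Spark's code.
    """
    is_string = all(isinstance(v, str) for v in values)
    if not (is_string or force_index_label):
        # Numeric label and indexing not forced (the regression case):
        # keep the numeric values and attach no label attributes.
        return [float(v) for v in values], None

    # String label, or indexing forced (the classification case):
    # assign indices by descending frequency. Tie-breaking by string
    # order is an assumption made for this sketch.
    counts = Counter(str(v) for v in values)
    labels = sorted(counts, key=lambda s: (-counts[s], s))
    position = {s: float(i) for i, s in enumerate(labels)}
    return [position[str(v)] for v in values], labels
```

With a numeric label such as [3, 7, 3, 3], the default call returns the values unchanged and no attributes, while passing force_index_label=True returns the indexed column together with the list of original labels, which is the information a wrapper would need in order to map predictions back to labels.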

  was: Add a param so that users can force the label to be indexed whether it is numeric or string. For classification algorithms, we force the label to be indexed by setting the param to true.


> RFormula supports forcing to index label
> ----------------------------------------
>
>                 Key: SPARK-15957
>                 URL: https://issues.apache.org/jira/browse/SPARK-15957
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>
> RFormula indexes the label only when it is of string type. If the label is numeric and we use RFormula with a classification model, we cannot extract the label attributes from the label column metadata. The label attributes are useful, so for classification we can force the label to be indexed whether it is numeric or string. SparkR wrappers can then extract the label attributes from the column metadata. This feature can help us fix bugs similar to SPARK-15153.
> For regression, the label will still keep its numeric type.
> We should add a param to RFormula that controls whether the label is force-indexed.


