Posted to issues@systemml.apache.org by "Jeremy (JIRA)" <ji...@apache.org> on 2016/09/24 14:56:20 UTC
[jira] [Updated] (SYSTEMML-700) Inflexible category labels for Multinomial Logistic Regression

     [ https://issues.apache.org/jira/browse/SYSTEMML-700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy updated SYSTEMML-700:
----------------------------
    Description: 
The Logistic Regression algorithm requires that category labels be labeled as 0 up to the number of classes-1. It should be able to handle any set of category labels provided by the user. B_out should have the appropriate size regardless of the values of the labels given, and the algorithm should also preserve the original labeling for the user.

Added detail:

The solution I'm currently using is to transform the labels from whatever values they are to 0, 1, 2, ... beforehand, and then transform them back to their original labels after the algorithm runs.
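
A minimal sketch of that workaround, written in Python with NumPy rather than DML purely for illustration. The helper names encode_labels and decode_labels are hypothetical and are not part of SystemML. It assumes the labels fit in a one-dimensional array and that the algorithm wants codes 0 to K-1 as described above; add an offset if a different starting code is required.

```python
import numpy as np

def encode_labels(y):
    """Map arbitrary labels to contiguous integer codes 0..K-1.

    Returns (classes, codes): classes holds the sorted distinct original
    labels, and codes[i] is the index of y[i] within classes.
    """
    classes, codes = np.unique(np.asarray(y), return_inverse=True)
    return classes, codes.reshape(-1)

def decode_labels(classes, codes):
    """Map integer codes back to the original labels."""
    return classes[np.asarray(codes)]

# Labels that neither start at 0 nor at 1.
y = [4, 5, 6, 4, 6]
classes, codes = encode_labels(y)

# The codes would be passed to the algorithm in place of y; predicted
# codes are then mapped back to the user's labels.
restored = decode_labels(classes, codes)

# The same helpers work for labels containing negatives or strings.
neg_classes, neg_codes = encode_labels([-1, 0, 1, -1])
str_classes, str_codes = encode_labels(["cat", "dog", "cat"])
```

Because the mapping is built from the sorted distinct values, the number of classes is len(classes) regardless of the label values, which is the quantity the size of B_out should depend on.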

Currently the algorithm doesn't handle class values that don't start at 0 or 1, and doesn't handle non-contiguous integers, both of which can come up. For example, the result for class labels 4,5,6 will return 5 sets of coefficients (correct number should be 2), and class labels -1, 0, 1 returns just one set of coefficients (correct number should be 2).

Handling frames with strings would be a really great user experience - that could look like R's coercion internally. Both glmnet and scikit-learn handle string label arguments, but both APIs are weakly typed as well.

  was:The Logistic Regression algorithm requires that category labels be labeled as 0 up to the number of classes-1. It should be able to handle any set of category labels provided by the user. B_out should have the appropriate size regardless of the values of the labels given, and the algorithm should also preserve the original labeling for the user.


> Inflexible category labels for Multinomial Logistic Regression
> --------------------------------------------------------------
>
>                 Key: SYSTEMML-700
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-700
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms
>            Reporter: Jeremy
>            Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The Logistic Regression algorithm requires that category labels be labeled as 0 up to the number of classes-1. It should be able to handle any set of category labels provided by the user. B_out should have the appropriate size regardless of the values of the labels given, and the algorithm should also preserve the original labeling for the user.
> Added detail:
> The solution I'm currently using is to transform the labels from whatever values they are to 0, 1, 2, ... beforehand, and then transform them back to their original labels after the algorithm runs.
> Currently the algorithm doesn't handle class values that don't start at 0 or 1, and doesn't handle non-contiguous integers, both of which can come up. For example, the result for class labels 4,5,6 will return 5 sets of coefficients (correct number should be 2), and class labels -1, 0, 1 returns just one set of coefficients (correct number should be 2).
> Handling frames with strings would be a really great user experience - that could look like R's coercion internally. Both glmnet and scikit-learn handle string label arguments, but both APIs are weakly typed as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)