Posted to issues@spark.apache.org by "Seth Hendrickson (JIRA)" <ji...@apache.org> on 2016/07/01 14:12:11 UTC

[jira] [Commented] (SPARK-16235) "evaluateEachIteration" is returning wrong results when calculated for classification model.

    [ https://issues.apache.org/jira/browse/SPARK-16235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359014#comment-15359014 ] 

Seth Hendrickson commented on SPARK-16235:
------------------------------------------

To be clear, we are talking about MLlib here, since this method is private[spark] in the ML package? Is the only problem here that the scale of the MSE is "wrong"? In other words, the metric can still be used, but the per-point error will range over [0, 4] instead of [0, 1]?

If so, I would not classify this as a "major bug" but rather as a minor annoyance. I suspect the original intent was to use this method with the same loss function the algorithm was trained with. Since GBT classification uses [-1, 1] labels, I don't think it's completely unreasonable to expect the user to be aware of this (we implicitly expect them to be aware of it when using the form of logloss that is used for GBTs). I guess I'm not sure this is a big problem and, if it isn't, I would push back against making a big change to accommodate it. Thoughts?
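
For concreteness, here is a minimal sketch (plain Scala, with made-up values) of why the per-point squared error lands in [0, 4] after the remapping:

{code}
// Minimal sketch: squared error on the original [0, 1] scale vs. the
// remapped [-1, 1] scale. The values below are hypothetical.
val label = 1.0   // true label in [0, 1]
val pred  = 0.75  // predicted probability in [0, 1]

val errOriginal  = (label - pred) * (label - pred)    // 0.0625
val diffRemapped = (2 * label - 1) - (2 * pred - 1)   // = 2 * (label - pred)
val errRemapped  = diffRemapped * diffRemapped        // 0.25

// (2y - 1) - (2p - 1) = 2(y - p), so each squared error is exactly 4x
// larger, which is where the [0, 4] range comes from.
assert(errRemapped == 4 * errOriginal)
{code}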

> "evaluateEachIteration" is returning wrong results when calculated for classification model.
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16235
>                 URL: https://issues.apache.org/jira/browse/SPARK-16235
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1, 1.6.2, 2.0.0
>            Reporter: Mahmoud Rawas
>
> Basically, within the mentioned function there is code that maps the actual label, which is supposed to be in the range \[0, 1], into the range \[-1, 1], in order to make it compatible with the predicted value produced by a classification model.
> {code}
> val remappedData = algo match {
>   case Classification => data.map(x => new LabeledPoint((x.label * 2) - 1, x.features))
>   case _ => data
> }
> {code}
> The problem with this approach is that it calculates an incorrect error: for example, the MSE will be 4 times larger than the actual expected MSE.
> Instead, we should map the predicted value to a probability value in \[0, 1].
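
For reference, here is a hedged sketch of the direction the reporter suggests: map the raw per-iteration prediction back to a probability before scoring. It assumes the per-iteration prediction is the GBT margin F(x) and that the model was trained with the logloss MLlib uses for GBTs, 2 * log(1 + exp(-2 * y * F)), under which P(y = 1 | x) = 1 / (1 + exp(-2 * F(x))); the helper names are hypothetical, not Spark API.

{code}
// Hypothetical helpers, not Spark API. Map the margin to a probability,
// then score against the original [0, 1] label so the per-point squared
// error stays in [0, 1].
def marginToProbability(margin: Double): Double =
  1.0 / (1.0 + math.exp(-2.0 * margin))  // assumes MLlib's GBT logloss

def squaredError(label: Double, margin: Double): Double = {
  val p = marginToProbability(margin)
  (label - p) * (label - p)
}
{code}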


