Posted to user@spark.apache.org by Hao Ren <in...@gmail.com> on 2014/12/22 17:02:02 UTC

MLlib, classification label problem

Hi,

When going through the MLlib doc for classification
(http://spark.apache.org/docs/latest/mllib-linear-methods.html), I found that
the loss functions are written in terms of labels {1, -1}.

But in MLlib itself, the loss functions are implemented for labels {1, 0}.
There is also a data-validation check before fitting: if a label is anything
other than 0 or 1, an exception is thrown.
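For what it's worth, here is the kind of remapping I do before fitting (a
plain-Python sketch with made-up example data, not MLlib code):

```python
# Hypothetical sketch, not MLlib API: remap {-1, +1} labels to the
# {0, 1} encoding that the validation step accepts.
labels = [1, -1, -1, 1]                  # labels in the {-1, +1} convention
mapped = [(y + 1) // 2 for y in labels]  # -1 -> 0, +1 -> 1
```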

I don't understand the intention here. Could someone explain this?

Hao.




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-classification-label-problem-tp20813.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: MLlib, classification label problem

Posted by Sean Owen <so...@cloudera.com>.
Yeah, it's mentioned in the doc:

"Note that, in the mathematical formulation in this guide, a training label
y is denoted as either +1 (positive) or −1 (negative), which is
convenient for the formulation. However, the negative label is
represented by 0 in MLlib instead of −1, to be consistent with
multiclass labeling."

Both conventions are valid and equally correct, although they lead to
different expressions for the gradients and loss functions. I also find
it a little confusing that the docs explain one form while the code
implements the other (except for some examples, which actually
reimplement the loss with the -1/+1 convention).

I personally am also more used to the forms corresponding to 0 for the
negative class, but I'm sure some will say they're more accustomed to
the other convention.
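To make the equivalence concrete, here is a small sketch (plain Python, not
MLlib code) checking numerically that the docs' +1/-1 logistic loss and the
0/1 form agree once the labels are mapped:

```python
import math

def loss_pm1(margin, y_pm):
    # Docs' formulation: log(1 + exp(-y * margin)), label y in {-1, +1}
    return math.log1p(math.exp(-y_pm * margin))

def loss_01(margin, y01):
    # 0/1 formulation: log(1 + exp(margin)) - y * margin, label y in {0, 1}
    return math.log1p(math.exp(margin)) - y01 * margin

# The two losses coincide for every margin once labels are mapped
# via y01 -> 2 * y01 - 1.
for margin in (-2.0, -0.5, 0.0, 1.5):
    for y01 in (0, 1):
        y_pm = 2 * y01 - 1  # map {0, 1} -> {-1, +1}
        assert abs(loss_pm1(margin, y_pm) - loss_01(margin, y01)) < 1e-9
```

So the choice is purely a matter of notation; the fitted model is the same
either way.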

On Mon, Dec 22, 2014 at 4:02 PM, Hao Ren <in...@gmail.com> wrote:
> Hi,
>
> When going through the MLlib doc for classification
> (http://spark.apache.org/docs/latest/mllib-linear-methods.html), I found that
> the loss functions are written in terms of labels {1, -1}.
>
> But in MLlib itself, the loss functions are implemented for labels {1, 0}.
> There is also a data-validation check before fitting: if a label is anything
> other than 0 or 1, an exception is thrown.
>
> I don't understand the intention here. Could someone explain this?
>
> Hao.
