Posted to user@spark.apache.org by Cesar <ce...@gmail.com> on 2016/10/10 14:15:10 UTC

Logistic Regression Standardization in ML

I have a question regarding how the default standardization in the ML
version of Logistic Regression (Spark 1.6) works.

Specifically, it concerns the following comment in the Spark code:

/**
 * Whether to standardize the training features before fitting the model.
 * The coefficients of models will be always returned on the original scale,
 * so it will be transparent for users. Note that with/without standardization,
 * the models should be always converged to the same solution when no regularization
 * is applied. In R's GLMNET package, the default behavior is true as well.
 * Default is true.
 *
 * @group setParam
 */
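
(For context, this comment documents the flag one would toggle roughly like
so; a minimal sketch, not the full API:)

import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression().setStandardization(false) // default is true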


Specifically, I am having trouble understanding why the solution should
converge to the same weight values with/without standardization?



Thanks!
-- 
Cesar Flores

Re: Logistic Regression Standardization in ML

Posted by Yanbo Liang <yb...@gmail.com>.
AFAIK, we can guarantee that, with or without standardization, the models
always converge to the same solution if there is no regularization. You can
refer to the test cases at:

https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala#L551


https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala#L588
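
Roughly, the check those tests perform looks like the sketch below (toy,
hand-made data; written against the Spark 2.x ml API, since the linked suite
is on master, so imports differ slightly in 1.6):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("std-check").getOrCreate()
import spark.implicits._

// Toy data with overlapping classes, so the unregularized optimum is finite.
val df = Seq(
  (0.0, Vectors.dense(1.0, 50.0)),
  (1.0, Vectors.dense(1.1, 49.0)),
  (1.0, Vectors.dense(2.0, 30.0)),
  (0.0, Vectors.dense(2.1, 29.0)),
  (1.0, Vectors.dense(3.0, 20.0)),
  (0.0, Vectors.dense(1.5, 45.0))
).toDF("label", "features")

// No regularization; the only difference between the two fits is the flag.
val lrStd = new LogisticRegression().setRegParam(0.0).setTol(1e-8).setStandardization(true)
val lrRaw = new LogisticRegression().setRegParam(0.0).setTol(1e-8).setStandardization(false)

// Coefficients are reported on the original scale in both cases,
// so these should agree up to the solver's tolerance.
println(lrStd.fit(df).coefficients)
println(lrRaw.fit(df).coefficients)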

Thanks
Yanbo


Re: Logistic Regression Standardization in ML

Posted by Sean Owen <so...@cloudera.com>.
(BTW I think it means "when no standardization is applied", which is how
you interpreted it, yes.) I think it just means that if feature i is
divided by s_i, then its coefficients in the resulting model will end up
larger by a factor of s_i. They have to be divided by s_i to put them back
on the same scale as the unnormalized inputs. I don't think that in general
it will result in exactly the same model, because part of the point of
standardizing is to improve convergence. You could propose a rewording of
the two occurrences of this paragraph if you like.
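
Concretely, the algebra behind that (using s_i for the scaling factor, as
above): if the optimizer sees x'_i = x_i / s_i, then coefficients w' over the
scaled features and w over the original features describe the same linear
predictor whenever w'_i = s_i * w_i, since

w'_i * x'_i = (s_i * w_i) * (x_i / s_i) = w_i * x_i

With no penalty term, the loss depends on the data only through this linear
predictor, so the two problems share the same optimum, just expressed in
different coordinates, and dividing w'_i by s_i recovers it on the original
scale. An L2 penalty breaks the equivalence, because
lambda * sum_i (w'_i)^2 = lambda * sum_i (s_i * w_i)^2 penalizes a different
quantity than lambda * sum_i (w_i)^2, which is why the doc comment restricts
the claim to the unregularized case. (Numerically, as noted above, the two
fits may still differ within the solver's tolerance.)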
