Posted to issues@spark.apache.org by "DB Tsai (JIRA)" <ji...@apache.org> on 2014/08/12 03:08:14 UTC

[jira] [Created] (SPARK-2979) Improve the convergence rate by minimizing the condition number in LOR with LBFGS

DB Tsai created SPARK-2979:
------------------------------

             Summary: Improve the convergence rate by minimizing the condition number in LOR with LBFGS
                 Key: SPARK-2979
                 URL: https://issues.apache.org/jira/browse/SPARK-2979
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: DB Tsai


Scaling to minimize the condition number:
    
During the optimization process, the convergence rate depends on the condition number of the training dataset. Scaling the variables often reduces this condition number, thus improving the convergence rate dramatically. Without this reduction, training datasets that mix columns with very different scales may fail to converge at all.
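For context, a standard optimization fact (not stated in this ticket) links the condition number to the convergence rate. For gradient descent with the optimal step size on a strongly convex quadratic objective with Hessian H,

\[
  \kappa(H) = \frac{\lambda_{\max}(H)}{\lambda_{\min}(H)}, \qquad
  \|w_{k+1} - w^{*}\| \le \frac{\kappa(H) - 1}{\kappa(H) + 1}\,\|w_{k} - w^{*}\|
\]

so a large kappa (e.g. from columns on wildly different scales) makes each iteration shrink the error only slightly, while column-wise scaling pushes kappa toward 1. L-BFGS is less sensitive to conditioning than plain gradient descent, but it still benefits from the rescaling.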
     
The GLMNET and LIBSVM packages perform this scaling to reduce the condition number, and they return the weights in the original scale.

See page 9 in http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
     
Here, if useFeatureScaling is enabled, we will standardize the training features by dividing each column by its standard deviation (without subtracting the mean), and train the model in the scaled space. Then we transform the coefficients from the scaled space back to the original scale, as GLMNET and LIBSVM do.
   
Currently, this is only enabled in LogisticRegressionWithLBFGS.
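
A minimal standalone sketch of the transformation described above (plain Scala with no Spark dependency; the object and helper names are made up for illustration and are not the actual MLlib code):

object FeatureScalingSketch {

  // Column-wise (population) standard deviations of the training matrix.
  def columnStd(data: Array[Array[Double]]): Array[Double] = {
    val n = data.length.toDouble
    val dims = data.head.length
    val means = Array.tabulate(dims)(j => data.map(_(j)).sum / n)
    Array.tabulate(dims) { j =>
      math.sqrt(data.map(row => math.pow(row(j) - means(j), 2)).sum / n)
    }
  }

  // x'(i)(j) = x(i)(j) / std(j); the mean is NOT subtracted, which keeps
  // sparse feature vectors sparse.
  def scaleFeatures(data: Array[Array[Double]], std: Array[Double]): Array[Array[Double]] =
    data.map(_.zip(std).map { case (x, s) => if (s == 0.0) x else x / s })

  // Because w' . (x / std) == (w' / std) . x, dividing the weights learned
  // in the scaled space element-wise by std recovers the original-scale
  // weights; the intercept is unchanged since no mean was subtracted.
  def unscaleWeights(scaledWeights: Array[Double], std: Array[Double]): Array[Double] =
    scaledWeights.zip(std).map { case (w, s) => if (s == 0.0) w else w / s }
}

Training then runs on scaleFeatures(data, std), and unscaleWeights maps the learned coefficients back so callers see weights in the original feature space.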



