Posted to issues@spark.apache.org by "Seth Hendrickson (JIRA)" <ji...@apache.org> on 2016/08/23 22:24:20 UTC

[jira] [Commented] (SPARK-17201) Investigate numerical instability for MLOR without regularization

    [ https://issues.apache.org/jira/browse/SPARK-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433754#comment-15433754 ] 

Seth Hendrickson commented on SPARK-17201:
------------------------------------------

Restating some of what was said on github:

_The concern is that for softmax regression without regularization, the Hessian becomes singular and Newton methods can run into problems. Excerpt from this [link|http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression]: "Thus, the minimizer of J(θ) is not unique. (Interestingly, J(θ) is still convex, and thus gradient descent will not run into local optima problems. But the Hessian is singular/non-invertible, which causes a straightforward implementation of Newton's method to run into numerical problems.)"_

I looked into this. It is true that for softmax regression the Hessian is symmetric positive _semidefinite_, not symmetric positive definite; there is a good enough proof of this [here|http://qwone.com/~jason/writing/convexLR.pdf]. (Intuitively, the singularity comes from the over-parameterization of softmax: adding the same vector to every class's coefficient vector leaves the predicted probabilities unchanged, so that direction has zero curvature.) Still, note the wording in the resource quoted above: "... which causes a *straightforward* implementation of Newton's method to run into numerical problems." The lack of positive definiteness can indeed be a problem for *naive* Newton methods, but LBFGS is not a straightforward implementation: it does not use the Hessian directly, only an approximation to it. In fact, there is an abundance of resources showing that as long as the initial Hessian approximation is symmetric positive definite, the subsequent recursive updates are symmetric positive definite as well. From one resource:

"H(-1)_(n + 1) is positive definite (psd) when H^(-1)_n is. Assuming our initial guess of H0 is psd, it follows by induction each inverse Hessian estimate is as well. Since we can choose any H^(-1)_0 we want, including the identity matrix, this is easy to ensure."

I'd appreciate other opinions on this, to make sure I am understanding things correctly. It seems like LBFGS will work fine even without regularization; a minimal sanity check outside of Spark is sketched below. cc [~dbtsai] [~WeichenXu123]
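Here is that sketch, using Breeze's LBFGS directly (the same optimizer Spark ML delegates to). The toy data, the flattened coefficient layout, and the objective code are made up for illustration; it just minimizes an unregularized softmax negative log-likelihood whose true Hessian is singular because of the redundant per-class parameterization:

{code:scala}
import breeze.linalg.{DenseMatrix, DenseVector, max, sum}
import breeze.numerics.exp
import breeze.optimize.{DiffFunction, LBFGS}

object UnregularizedSoftmaxDemo {
  def main(args: Array[String]): Unit = {
    // Toy 3-class data (made up). The point (0.5, 0.5) appears with two
    // different labels, so the data is not separable and the minimum is finite.
    val X = DenseMatrix(
      (1.0, 0.1), (0.9, 0.0), (0.5, 0.5),       // label 0
      (0.0, 1.0), (0.1, 1.1), (0.5, 0.5),       // label 1
      (-1.0, -1.0), (-0.9, -1.1), (-1.1, -0.9)) // label 2
    val y = Array(0, 0, 0, 1, 1, 1, 2, 2, 2)
    val numClasses = 3
    val numFeatures = X.cols

    // Unregularized softmax negative log-likelihood and its gradient.
    // Class k's coefficients live in theta(k*numFeatures until (k+1)*numFeatures),
    // so the parameterization is redundant and the true Hessian is singular.
    val nll = new DiffFunction[DenseVector[Double]] {
      def calculate(theta: DenseVector[Double]): (Double, DenseVector[Double]) = {
        val grad = DenseVector.zeros[Double](theta.length)
        var loss = 0.0
        for (i <- 0 until X.rows) {
          val xi = X(i, ::).t
          val scores = DenseVector.tabulate(numClasses) { k =>
            theta(k * numFeatures until (k + 1) * numFeatures) dot xi
          }
          // numerically stabilized softmax probabilities
          val probs = exp(scores - max(scores))
          probs :/= sum(probs)
          loss -= math.log(probs(y(i)))
          for (k <- 0 until numClasses) {
            val coeff = probs(k) - (if (k == y(i)) 1.0 else 0.0)
            grad(k * numFeatures until (k + 1) * numFeatures) += xi * coeff
          }
        }
        (loss, grad)
      }
    }

    // LBFGS only ever works with its own positive definite inverse-Hessian
    // approximation, so the singular true Hessian never has to be inverted.
    val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 10, tolerance = 1e-9)
    val theta = lbfgs.minimize(nll, DenseVector.zeros[Double](numClasses * numFeatures))

    println(s"final loss = ${nll.valueAt(theta)}")
    println(s"coefficients = $theta")
  }
}
{code}

One more observation: the gradient of this objective is always orthogonal to the "shift every class by a common vector" direction, so starting from zero keeps the iterates in the span of the observed gradients, i.e. in the identifiable subspace, and the redundancy never shows up as an instability.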

> Investigate numerical instability for MLOR without regularization
> -----------------------------------------------------------------
>
>                 Key: SPARK-17201
>                 URL: https://issues.apache.org/jira/browse/SPARK-17201
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> As mentioned [here|http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression], when no regularization is applied in Softmax regression, second order Newton solvers may run into numerical instability problems. We should investigate this in practice and find a solution, possibly by implementing pivoting when no regularization is applied.


