Posted to issues@spark.apache.org by "Imran Younus (JIRA)" <ji...@apache.org> on 2016/03/09 20:08:40 UTC

[jira] [Commented] (SPARK-13777) Weighted Least Squares fails when there are features with identical values.

    [ https://issues.apache.org/jira/browse/SPARK-13777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187695#comment-15187695 ] 

Imran Younus commented on SPARK-13777:
--------------------------------------

My solution to this problem is to remove the rows and columns of the (A^T A) matrix that correspond to constant features. I've tested this method and it works. I'll submit a pull request for it, along with tests, soon.
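
A minimal sketch of this idea follows, in plain Python rather than Spark's Scala, and not the code from the pull request. It assumes the normal equations are built from mean-centered features, so a constant feature shows up as a zero diagonal entry in (A^T A). The helper names (centered_normal_equations, cholesky_solve, solve_dropping_constant) and the tolerance are hypothetical, chosen only for illustration.

```python
import math


def centered_normal_equations(rows, labels):
    """Build (A^T A, A^T b) from mean-centered features and labels."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    label_mean = sum(labels) / n
    a = [[r[j] - means[j] for j in range(d)] for r in rows]
    b = [y - label_mean for y in labels]
    ata = [[sum(row[i] * row[j] for row in a) for j in range(d)] for i in range(d)]
    atb = [sum(a[k][i] * b[k] for k in range(n)) for i in range(d)]
    return ata, atb


def cholesky_solve(a, b, tol=1e-12):
    """Solve a x = b for a symmetric positive definite a via a = L L^T."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    # Back substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def solve_dropping_constant(ata, atb, tol=1e-12):
    """Drop rows/columns of constant features, solve, then zero-fill them."""
    # Assumption: with centered data, a constant feature has ata[j][j] == 0.
    keep = [j for j in range(len(ata)) if ata[j][j] > tol]
    reduced_ata = [[ata[i][j] for j in keep] for i in keep]
    reduced_atb = [atb[i] for i in keep]
    reduced_coef = cholesky_solve(reduced_ata, reduced_atb)
    coef = [0.0] * len(ata)  # constant features keep a zero coefficient
    for position, j in enumerate(keep):
        coef[j] = reduced_coef[position]
    return coef


# Second feature is constant; labels follow y = 2*x1 + 3*x3 + 1.
rows = [[1.0, 5.0, 1.0], [2.0, 5.0, 0.0], [3.0, 5.0, 1.0], [4.0, 5.0, 0.0]]
labels = [6.0, 5.0, 10.0, 9.0]
ata, atb = centered_normal_equations(rows, labels)
print(solve_dropping_constant(ata, atb))
```

In this toy data set the reduced system recovers coefficients of about 2 and 3 for the two varying features, and the constant feature is reported with a coefficient of exactly zero.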

> Weighted Least Squares fails when there are features with identical values.
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-13777
>                 URL: https://issues.apache.org/jira/browse/SPARK-13777
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Imran Younus
>            Priority: Minor
>
> The "normal" solver in LinearRegression uses Cholesky decomposition to compute the coefficients. If the data contains features with identical values (zero variance), the (A^T A) matrix is no longer positive definite and the Cholesky decomposition fails.
> In the same case, the "l-bfgs" solver sets the coefficients of these constant features to zero and produces valid coefficients for the remaining features. This behaviour is consistent with glmnet in R. The "normal" solver should do the same.
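
The failure described in the issue can be reproduced outside Spark with a small, self-contained sketch. It is plain Python, not Spark's implementation, and it assumes the Gram matrix is computed from mean-centered features; the function names and the pivot tolerance are hypothetical. A zero-variance feature then contributes an all-zero row and column, so the matrix is only positive semi-definite and a Cholesky pivot becomes zero.

```python
import math


def centered_gram(rows):
    """Return A^T A, where A holds the mean-centered feature columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) for j in range(d)] for i in range(d)]


def cholesky(a, tol=1e-12):
    """Return lower-triangular L with a = L L^T, or raise if a pivot is not positive."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError(f"pivot {i} is not positive: {s}")
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return lower


# The second feature is constant, so its centered column is all zeros.
rows = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
gram = centered_gram(rows)
try:
    cholesky(gram)
    failed = False
except ValueError as err:
    failed = True
    print("Cholesky failed:", err)
```

Without centering, a constant feature is instead collinear with the intercept column, which also makes the system singular.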


