Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/04/04 01:39:52 UTC

[jira] [Commented] (SPARK-6705) MLLIB ML Pipeline's Logistic Regression has no intercept term

    [ https://issues.apache.org/jira/browse/SPARK-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395274#comment-14395274 ] 

Apache Spark commented on SPARK-6705:
-------------------------------------

User 'oefirouz' has created a pull request for this issue:
https://github.com/apache/spark/pull/5301
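
A minimal sketch, in Scala, of what the exposed parameter might look like on the Pipeline estimator. The setFitIntercept name and the idea of forwarding it to LogisticRegressionWithLBFGS.setIntercept are assumptions for illustration, not details taken from the pull request:

    import org.apache.spark.ml.classification.LogisticRegression

    // Hypothetical usage once the parameter is exposed: the estimator
    // fits the intercept itself instead of relying on a constant
    // feature column.
    val lr = new LogisticRegression()
      .setMaxIter(100)
      .setRegParam(0.01)
      .setFitIntercept(true) // assumed parameter name

    // Internally, fit() would presumably forward the flag to the
    // underlying MLlib implementation, e.g.
    //   new LogisticRegressionWithLBFGS().setIntercept(fitIntercept)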

> MLLIB ML Pipeline's Logistic Regression has no intercept term
> -------------------------------------------------------------
>
>                 Key: SPARK-6705
>                 URL: https://issues.apache.org/jira/browse/SPARK-6705
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Omede Firouz
>
> Currently, the ML Pipeline's LogisticRegression.scala does not allow setting whether or not to fit an intercept term, so the pipeline defers to LogisticRegressionWithLBFGS, which does not fit one. This makes sense from a performance point of view, since adding an intercept term requires extra memory allocation.
> However, it is undesirable statistically: the usual statistical default is to include an intercept term, and one needs a very strong reason to omit it.
> Explicitly modeling the intercept by appending a column of all 1s does not work either, because LogisticRegressionWithLBFGS forces column normalization; a column of all 1s has variance 0, and the resulting division by zero zeroes the column out.
> We should open up the ML Pipeline API to explicitly allow controlling whether or not to fit an intercept.
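
To make the failure mode above concrete, here is the arithmetic the standardization step performs on a constant column (a plain Scala sketch; the values are illustrative):

    // Standardization rescales each feature by its standard deviation.
    // Every entry of a constant column equals the column mean, so:
    val ones = Array(1.0, 1.0, 1.0, 1.0)
    val mean = ones.sum / ones.length                                     // 1.0
    val variance = ones.map(x => math.pow(x - mean, 2)).sum / ones.length // 0.0
    val stddev = math.sqrt(variance)                                      // 0.0
    // The scaled value would be 1.0 / 0.0; the scaler maps the
    // zero-variance column to 0.0, destroying the hand-built intercept.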



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org