Posted to issues@spark.apache.org by "Yanbo Liang (JIRA)" <ji...@apache.org> on 2015/10/30 18:18:28 UTC
[jira] [Comment Edited] (SPARK-9836) Provide R-like summary
statistics for ordinary least squares via normal equation solver
[ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982905#comment-14982905 ]
Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:18 PM:
--------------------------------------------------------------
[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate, Std. Error, t value, Pr(>|t|)" are statistics for OLS/WLS, so I will add these statistics in this task.
As for the remaining part:
{quote}
Null deviance: 102.168 on 149 degrees of freedom
Residual deviance: 28.004 on 146 degrees of freedom
AIC: 183.94
Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I see that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be handled in SPARK-9837. Please correct me if I have misunderstood. :)
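To make the OLS part concrete, here is a minimal sketch of how the coefficient estimates, standard errors, t values, and residuals can all be derived from the normal equations. It is plain Python written for illustration only, not Spark's implementation: the function names (`solve_normal_equations`, `ols_summary`) and the toy data are assumptions, and Pr(>|t|) is left out because it needs a Student t CDF that the Python standard library does not provide. For the gaussian family the deviance residuals are simply the raw residuals.

```python
import math

def solve_normal_equations(X, y):
    """Return (beta, inverse of X^T X) for OLS via the normal equations."""
    n, p = len(X), len(X[0])
    # Gram matrix A = X^T X and right-hand side b = X^T y.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    # Gauss-Jordan elimination on the augmented matrix [A | I] yields A^{-1}.
    M = [A[i][:] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    A_inv = [row[p:] for row in M]
    beta = [sum(A_inv[i][j] * b[j] for j in range(p)) for i in range(p)]
    return beta, A_inv

def ols_summary(X, y):
    """Coefficient table inputs for OLS; X must already contain an intercept column."""
    n, p = len(X), len(X[0])
    beta, A_inv = solve_normal_equations(X, y)
    residuals = [y[k] - sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    rss = sum(r * r for r in residuals)
    dispersion = rss / (n - p)  # residual variance estimate
    # Var(beta_j) = dispersion * [(X^T X)^{-1}]_{jj}
    std_errors = [math.sqrt(dispersion * A_inv[j][j]) for j in range(p)]
    t_values = [beta[j] / std_errors[j] for j in range(p)]
    return {
        "coefficients": beta,
        "std_errors": std_errors,
        "t_values": t_values,
        "residuals": residuals,  # equal to deviance residuals for gaussian
        "dispersion": dispersion,
    }

# Toy data (made up for illustration): intercept column plus one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 8.0]
summary = ols_summary(X, y)
```

Everything here is computed from X^T X, X^T y, and one extra pass for the residuals, which is why these statistics fit naturally with a normal equation solver.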
> Provide R-like summary statistics for ordinary least squares via normal equation solver
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-9836
> URL: https://issues.apache.org/jira/browse/SPARK-9836
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Xiangrui Meng
> Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of those via the normal equation solver (SPARK-9834). If some statistics require additional passes over the dataset, we can expose an option to let users select the desired statistics before model fitting.
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
>
> Deviance Residuals:
>      Min        1Q    Median        3Q       Max
> -1.30711  -0.25713  -0.05325   0.19542   1.41253
>
> Coefficients:
>                   Estimate Std. Error t value Pr(>|t|)
> (Intercept)         2.2514     0.3698   6.089 9.57e-09 ***
> Sepal.Width         0.8036     0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587     0.1121  13.012  < 2e-16 ***
> Speciesvirginica    1.9468     0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for gaussian family taken to be 0.1918059)
>
>     Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
>
> Number of Fisher Scoring iterations: 2
> {code}
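As a consistency check on the figures quoted above, the following sketch recomputes the dispersion parameter and the AIC from the residual deviance alone. It assumes the gaussian family with identity link shown in the output, where the residual deviance equals the residual sum of squares, and it uses the usual gaussian AIC formula with one extra parameter counted for the variance. The variable names are illustrative, and the inputs are the rounded values printed by R, so the results agree only up to rounding.

```python
import math

# Values copied from the summary(model) output above.
residual_deviance = 28.004   # equals the residual sum of squares for gaussian
null_df = 149                # n - 1 for the intercept-only model
residual_df = 146            # n - number of coefficients

n = null_df + 1                       # 150 observations in iris
num_coefficients = n - residual_df    # 4 fitted coefficients

# Dispersion estimate: residual sum of squares over residual degrees of freedom.
dispersion = residual_deviance / residual_df

# Gaussian AIC: -2 * log-likelihood at the MLE of the variance (RSS / n),
# plus 2 * (number of coefficients + 1 for the variance parameter).
aic = (n * (math.log(2 * math.pi * residual_deviance / n) + 1)
       + 2 * (num_coefficients + 1))

print(f"dispersion = {dispersion:.7f}")
print(f"AIC        = {aic:.2f}")
```

In this special case the quoted dispersion and AIC follow directly from the residual sum of squares; the iteration count and the deviances for other families still depend on the IRLS fit.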