Posted to issues@spark.apache.org by "Yanbo Liang (JIRA)" <ji...@apache.org> on 2015/10/30 18:18:28 UTC

[jira] [Comment Edited] (SPARK-9836) Provide R-like summary statistics for ordinary least squares via normal equation solver

    [ https://issues.apache.org/jira/browse/SPARK-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982905#comment-14982905 ] 

Yanbo Liang edited comment on SPARK-9836 at 10/30/15 5:18 PM:
--------------------------------------------------------------

[~mengxr] After a survey, I found that "Deviance Residuals" and "Coefficients: Estimate Std. Error t value Pr(>|t|)" are statistics for OLS/WLS, so I will add these statistics in this task.
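In the unweighted Gaussian case, both blocks have closed forms in terms of the normal equations. The estimates are beta = (X^T X)^-1 X^T y. The standard errors are the square roots of the diagonal of sigma^2 (X^T X)^-1, with sigma^2 = RSS / (n - p). Then t = beta / SE, and the deviance residuals are simply y - mu. Below is a minimal pure-Python sketch under these assumptions: unit weights, an intercept, and a full-rank design. `ols_summary`, `invert`, and `quantile7` are hypothetical helpers for illustration, not Spark's API.

```python
import math


def invert(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def quantile7(s, q):
    """R's default (type 7) quantile of an already sorted list."""
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def ols_summary(features, y):
    """Coefficient table and deviance-residual summary for unweighted OLS with an intercept."""
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    n, p = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    # With unit weights, Gaussian deviance residuals are the raw residuals y - mu.
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    df = n - p
    sigma2 = sum(e * e for e in resid) / df  # dispersion estimate RSS / (n - p)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    s = sorted(resid)
    return {
        "coefficients": beta,
        "std_errors": se,
        "t_values": [b / e for b, e in zip(beta, se)],
        "df": df,
        # Min, 1Q, Median, 3Q, Max, as in the "Deviance Residuals" line
        "residual_summary": [quantile7(s, q) for q in (0.0, 0.25, 0.5, 0.75, 1.0)],
    }
```

For example, `ols_summary([[0], [1], [2], [3]], [1, 3, 4, 8])` fits y = 0.7 + 2.2x with 2 residual degrees of freedom. The residual quantiles need the fitted coefficients first, so on a distributed dataset that part costs a second pass over the data.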
As for the remaining part:
{quote}
    Null deviance: 102.168  on 149  degrees of freedom
Residual deviance:  28.004  on 146  degrees of freedom
AIC: 183.94

Number of Fisher Scoring iterations: 2
{quote}
Some of these statistics depend on IRLS (SPARK-9835). I see that you have opened SPARK-9837 to track summary statistics for GLMs via IRLS, so these statistics will be handled in SPARK-9837. Please correct me if I have misunderstood. :)
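For reference, the Gaussian family with identity link is the OLS case, and there the deviance, dispersion, and AIC lines reduce to closed forms. They can be computed from aggregates that a normal equation solver collects in a single pass (X^T X, X^T y, y^T y, sum of y, n). The Fisher scoring iteration count, by contrast, is a property of the iterative solver. Below is a minimal sketch under these assumptions: unit weights, an intercept, and a full-rank design. `NormalEqAgg`, `solve`, and `gaussian_summary` are hypothetical names, not Spark's API. The AIC expression follows R's unit-weight Gaussian convention.

```python
import math


class NormalEqAgg:
    """Hypothetical one-pass aggregate for OLS with an intercept (not a Spark API).

    Rows are folded in with add(); partial aggregates (e.g. per partition) combine with merge().
    """

    def __init__(self, num_features):
        p = num_features + 1  # +1 for the intercept
        self.n, self.sum_y, self.yty = 0, 0.0, 0.0
        self.xtx = [[0.0] * p for _ in range(p)]
        self.xty = [0.0] * p

    def add(self, features, y):
        x = [1.0] + list(features)
        self.n += 1
        self.sum_y += y
        self.yty += y * y
        for i, xi in enumerate(x):
            self.xty[i] += xi * y
            for j, xj in enumerate(x):
                self.xtx[i][j] += xi * xj
        return self

    def merge(self, other):
        self.n += other.n
        self.sum_y += other.sum_y
        self.yty += other.yty
        p = len(self.xty)
        for i in range(p):
            self.xty[i] += other.xty[i]
            for j in range(p):
                self.xtx[i][j] += other.xtx[i][j]
        return self


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def gaussian_summary(agg):
    """Deviance lines of an R-style summary for family = gaussian, identity link, unit weights."""
    n, p = agg.n, len(agg.xty)
    beta = solve(agg.xtx, agg.xty)
    # At the solution X^T X beta = X^T y, hence RSS = y^T y - beta^T X^T y.
    rss = agg.yty - sum(b * v for b, v in zip(beta, agg.xty))
    tss = agg.yty - agg.sum_y ** 2 / agg.n  # deviance of the intercept-only model
    return {
        "coefficients": beta,
        "null_deviance": tss, "null_df": n - 1,
        "residual_deviance": rss, "residual_df": n - p,
        "dispersion": rss / (n - p),
        # The +2 counts the estimated dispersion; 2 * p counts the coefficients.
        "aic": n * (math.log(2 * math.pi * rss / n) + 1) + 2 + 2 * p,
    }
```

Plugging the iris values from the issue (n = 150, p = 4, RSS = 28.004) into the AIC expression gives about 183.94, and RSS / 146 gives about 0.1918, both in line with R's output. Deriving RSS from the aggregates avoids a second pass, but the subtraction can lose precision when RSS is much smaller than y^T y.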


> Provide R-like summary statistics for ordinary least squares via normal equation solver
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-9836
>                 URL: https://issues.apache.org/jira/browse/SPARK-9836
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
>
> In R, model fitting comes with summary statistics. We can provide most of those via the normal equation solver (SPARK-9834). If some statistics require additional passes over the dataset, we can expose an option to let users select the desired statistics before model fitting.
> {code}
> > summary(model)
> Call:
> glm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
> Deviance Residuals: 
>      Min        1Q    Median        3Q       Max  
> -1.30711  -0.25713  -0.05325   0.19542   1.41253  
> Coefficients:
>                   Estimate Std. Error t value Pr(>|t|)    
> (Intercept)         2.2514     0.3698   6.089 9.57e-09 ***
> Sepal.Width         0.8036     0.1063   7.557 4.19e-12 ***
> Speciesversicolor   1.4587     0.1121  13.012  < 2e-16 ***
> Speciesvirginica    1.9468     0.1000  19.465  < 2e-16 ***
> ---
> Signif. codes:  
> 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> (Dispersion parameter for gaussian family taken to be 0.1918059)
>     Null deviance: 102.168  on 149  degrees of freedom
> Residual deviance:  28.004  on 146  degrees of freedom
> AIC: 183.94
> Number of Fisher Scoring iterations: 2
> {code}
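The "Pr(>|t|)" column and the "Signif. codes" legend in the output above come from a two-sided Student's t test on the residual degrees of freedom (146 here). Below is a minimal pure-Python sketch of both. It uses the identity Pr(>|T| > |t|) = I_x(df/2, 1/2) with x = df / (df + t^2), where I is the regularized incomplete beta function, evaluated by the standard continued fraction (modified Lentz method). The helper names are hypothetical. The symbol thresholds are taken from the legend, and treating each cutpoint as inclusive (p <= cutpoint) is an assumption about R's boundary handling.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    # Evaluate the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_pvalue_two_sided(t, df):
    """Pr(>|t|) for Student's t distribution with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def signif_code(p):
    """Symbol from the 'Signif. codes' legend (inclusive cutpoints assumed)."""
    for cut, sym in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p <= cut:
            return sym
    return " "
```

With the (Intercept) row, `t_pvalue_two_sided(6.089, 146)` should come out close to the 9.57e-09 that R prints. Small differences are expected because the printed t value is rounded.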



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
