Posted to issues@spark.apache.org by "DB Tsai (JIRA)" <ji...@apache.org> on 2015/05/23 00:12:17 UTC

[jira] [Commented] (SPARK-7674) R-like stats for ML models

    [ https://issues.apache.org/jira/browse/SPARK-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556913#comment-14556913 ] 

DB Tsai commented on SPARK-7674:
--------------------------------

I implemented these stats for ML models when I was at Alpine, in particular p-values, t-values, variance, null and residual deviance, and QQ plots for elastic net models. These stats are very useful for customers with a statistical background. Please keep me in the loop on the API design, since these stats are tricky to implement so that they match R. At the time, I reverse-engineered the R implementation to get the same stats. Once the APIs are finalized, I can implement them quickly given my experience.
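
For readers unfamiliar with the stats listed above, here is a minimal sketch of a few of them for the simplest case: ordinary least squares with one predictor. It is not the Alpine or Spark implementation; the function name simple_ols_summary and the returned keys are hypothetical. It assumes a Gaussian model, where the residual deviance equals the residual sum of squares and the null deviance equals the total sum of squares around the mean. P-values are left out because they need a Student-t CDF, which the Python standard library does not provide.

```python
import math


def simple_ols_summary(x, y):
    """Fit y = intercept + slope * x by least squares and return summary stats.

    Hypothetical helper for illustration only; assumes a Gaussian model
    and at least three observations with non-constant x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Gaussian family: residual deviance is the residual sum of squares,
    # null deviance is the sum of squares around the intercept-only fit.
    residual_deviance = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    null_deviance = sum((yi - mean_y) ** 2 for yi in y)

    # Unbiased residual variance uses n - 2 degrees of freedom
    # (two estimated coefficients).
    df_residual = n - 2
    sigma2 = residual_deviance / df_residual

    # Standard errors of the coefficients, then t-values.
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))

    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "residual_variance": sigma2,
        "null_deviance": null_deviance,
        "residual_deviance": residual_deviance,
        "df_residual": df_residual,
    }


if __name__ == "__main__":
    stats = simple_ols_summary([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    for name, value in stats.items():
        print(name, value)
```

Matching R becomes harder once regularization, weights, or non-Gaussian families are involved, which this sketch does not attempt.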

> R-like stats for ML models
> --------------------------
>
>                 Key: SPARK-7674
>                 URL: https://issues.apache.org/jira/browse/SPARK-7674
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
>
> This is an umbrella JIRA for supporting ML model summaries and statistics, following the example of R's summary() and plot() functions.
> [Design doc|https://docs.google.com/document/d/1oswC_Neqlqn5ElPwodlDY4IkSaHAi0Bx6Guo_LvhHK8/edit?usp=sharing]
> From the design doc:
> {quote}
> R and its well-established packages provide extensive functionality for inspecting a model and its results.  This inspection is critical to interpreting, debugging and improving models.
> R is arguably a gold standard for a statistics/ML library, so this doc largely attempts to imitate it.  The challenge we face is supporting similar functionality, but on big (distributed) data.  Data size makes both efficient computation and meaningful displays/summaries difficult.
> R model and result summaries generally take 2 forms:
> * summary(model): Display text with information about the model and results on data
> * plot(model): Display plots about the model and results
> We aim to provide both of these types of information.  Visualization for the plottable results will not be supported in MLlib itself, but we can provide results in a form which can be plotted easily with other tools.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org