Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2015/07/15 00:59:05 UTC

[jira] [Commented] (SPARK-7674) R-like stats for ML models

    [ https://issues.apache.org/jira/browse/SPARK-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627236#comment-14627236 ] 

Joseph K. Bradley commented on SPARK-7674:
------------------------------------------

The initial PR for linear regression has been merged.  So now, if anyone wants to work on adding stats for other models, please do!  We can follow the example set by linear regression.  If you need to create a new JIRA, please link it here (or comment with the link) to coordinate.  Thanks!

> R-like stats for ML models
> --------------------------
>
>                 Key: SPARK-7674
>                 URL: https://issues.apache.org/jira/browse/SPARK-7674
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
>
> This is an umbrella JIRA for supporting ML model summaries and statistics, following the example of R's summary() and plot() functions.
> [Design doc|https://docs.google.com/document/d/1oswC_Neqlqn5ElPwodlDY4IkSaHAi0Bx6Guo_LvhHK8/edit?usp=sharing]
> From the design doc:
> {quote}
> R and its well-established packages provide extensive functionality for inspecting a model and its results.  This inspection is critical to interpreting, debugging and improving models.
> R is arguably a gold standard for a statistics/ML library, so this doc largely attempts to imitate it.  The challenge we face is supporting similar functionality, but on big (distributed) data.  Data size makes both efficient computation and meaningful displays/summaries difficult.
> R model and result summaries generally take 2 forms:
> * summary(model): Display text with information about the model and results on data
> * plot(model): Display plots about the model and results
> We aim to provide both of these types of information.  Visualization for the plottable results will not be supported in MLlib itself, but we can provide results in a form which can be plotted easily with other tools.
> {quote}
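
To make the two forms above concrete, here is a minimal sketch of what a summary(model)-style result could look like. It is written in plain Python rather than against any Spark API, and the names RegressionSummary, summarize and describe are hypothetical, chosen only for illustration. It assumes the model's output is available as (prediction, label) pairs. The text line corresponds to summary(model), while the residuals are returned as plain values that another tool could plot.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RegressionSummary:
    """Hypothetical summary object for illustration; not an MLlib class."""
    residuals: List[float]  # label - prediction, in a form other tools can plot
    rmse: float             # root mean squared error
    r2: float               # coefficient of determination

    def describe(self) -> str:
        # Text output, loosely analogous to R's summary(model).
        return f"n={len(self.residuals)} RMSE={self.rmse:.4f} R2={self.r2:.4f}"


def summarize(pairs: Sequence[Tuple[float, float]]) -> RegressionSummary:
    """Build a summary from (prediction, label) pairs."""
    if not pairs:
        raise ValueError("need at least one (prediction, label) pair")
    n = len(pairs)
    residuals = [label - pred for pred, label in pairs]
    mean_label = sum(label for _, label in pairs) / n
    ss_res = sum(r * r for r in residuals)                          # residual sum of squares
    ss_tot = sum((label - mean_label) ** 2 for _, label in pairs)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    # R2 is undefined when all labels are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return RegressionSummary(residuals, rmse, r2)
```

On distributed data the residuals would normally stay distributed or be sampled before plotting; the in-memory list here only keeps the sketch short.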



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
