Posted to issues@spark.apache.org by "yuhao yang (JIRA)" <ji...@apache.org> on 2017/06/30 23:23:00 UTC

[jira] [Commented] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion

    [ https://issues.apache.org/jira/browse/SPARK-19053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070849#comment-16070849 ] 

yuhao yang commented on SPARK-19053:
------------------------------------

I'm not sure whether this is still wanted. cc [~josephkb]
I'd also like to understand whether this JIRA is about a performance improvement or an API refinement. As I understand it, the Evaluator classes in ml basically invoke the mllib implementation and compute the metrics in one pass.
Would this change the return type of the Evaluator.evaluate() method? Currently it is Double.
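
To make the question concrete, here is a minimal sketch in plain Python (not Spark code). The names regression_metrics and evaluate are hypothetical and only illustrate the two API shapes being discussed: a single pass that accumulates enough statistics for several metrics, and an Evaluator-style wrapper that still returns a single number.

```python
import math
from typing import Dict, Iterable, Tuple


def regression_metrics(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Compute several regression metrics in one pass over (prediction, label) pairs.

    Hypothetical helper for illustration; it is not part of any Spark API.
    """
    n = 0
    sum_sq_err = 0.0    # running sum of squared errors
    sum_abs_err = 0.0   # running sum of absolute errors
    sum_label = 0.0     # running sum of labels (needed for r2)
    sum_label_sq = 0.0  # running sum of squared labels (needed for r2)
    for prediction, label in pairs:
        err = label - prediction
        n += 1
        sum_sq_err += err * err
        sum_abs_err += abs(err)
        sum_label += label
        sum_label_sq += label * label
    if n == 0:
        raise ValueError("no rows to evaluate")
    mse = sum_sq_err / n
    # Total sum of squares around the label mean, from the running sums.
    ss_tot = sum_label_sq - sum_label * sum_label / n
    r2 = 1.0 - sum_sq_err / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": sum_abs_err / n, "r2": r2}


def evaluate(pairs: Iterable[Tuple[float, float]], metric_name: str = "rmse") -> float:
    """Evaluator-style view: all metrics are computed, but only one is returned."""
    return regression_metrics(pairs)[metric_name]
```

With the second shape, a caller who wants both rmse and mae has to call evaluate twice and scan the data twice, even though one pass already produced both values. Returning the whole dictionary avoids that, but it is a different return type from a single Double.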


> Supporting multiple evaluation metrics in DataFrame-based API: discussion
> -------------------------------------------------------------------------
>
>                 Key: SPARK-19053
>                 URL: https://issues.apache.org/jira/browse/SPARK-19053
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is to discuss supporting the computation of multiple evaluation metrics efficiently in the DataFrame-based API for MLlib.
> In the RDD-based API, RegressionMetrics and other *Metrics classes support efficient computation of multiple metrics.
> In the DataFrame-based API, there are a few options:
> * model/result summaries (e.g., LogisticRegressionSummary): These currently provide the desired functionality, but they require a model and do not let users compute metrics manually from DataFrames of predictions and true labels.
> * Evaluator classes (e.g., RegressionEvaluator): These only support computing a single metric in one pass over the data, but they do not require a model.
> * new class analogous to Metrics: We could introduce a class analogous to Metrics.  Model/result summaries could use this internally as a replacement for spark.mllib Metrics classes, or they could (maybe) inherit from these classes.
> Thoughts?



