Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2017/01/04 08:18:58 UTC

[jira] [Commented] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion

    [ https://issues.apache.org/jira/browse/SPARK-19053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15797559#comment-15797559 ] 

zhengruifeng commented on SPARK-19053:
--------------------------------------

I prefer Evaluator to Summary, since in many cases we do not have a model.
And I think it may be doable to support computing multiple metrics in one pass if we refactor the evaluators:
{code}
// Proposed: add metrics one at a time via a builder-style API
// (addMetric/getMetric are proposed methods, not part of the current API)
val evaluator = new RegressionEvaluator().addMetric("r2").addMetric("rmse")
val result = evaluator.evaluate(dataset)
result.getMetric("r2")
result.getMetric("rmse")

// Proposed alternative: set all desired metrics up front
val evaluator2 = new RegressionEvaluator().setMetrics(Seq("r2", "rmse"))
val result2 = evaluator2.evaluate(dataset)
result2.getMetric("r2")
result2.getMetric("rmse")
{code}
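For comparison, the existing RDD-based API already exposes several metrics computed from one set of (prediction, label) pairs. A minimal sketch with org.apache.spark.mllib.evaluation.RegressionMetrics, assuming an active SparkSession named {{spark}} and a DataFrame {{predictions}} with "prediction" and "label" columns:

{code}
import org.apache.spark.mllib.evaluation.RegressionMetrics

// Convert the DataFrame of predictions to an RDD[(prediction, label)]
val predAndLabels = predictions
  .select("prediction", "label")
  .rdd
  .map(row => (row.getDouble(0), row.getDouble(1)))

// RegressionMetrics caches summary statistics internally, so asking
// for several metrics does not rescan the data each time
val metrics = new RegressionMetrics(predAndLabels)
println(s"r2   = ${metrics.r2}")
println(s"rmse = ${metrics.rootMeanSquaredError}")
println(s"mae  = ${metrics.meanAbsoluteError}")
{code}

A refactored RegressionEvaluator could wrap something like this internally, which is what makes the one-pass multi-metric proposal above plausible.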

> Supporting multiple evaluation metrics in DataFrame-based API: discussion
> -------------------------------------------------------------------------
>
>                 Key: SPARK-19053
>                 URL: https://issues.apache.org/jira/browse/SPARK-19053
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is to discuss supporting the computation of multiple evaluation metrics efficiently in the DataFrame-based API for MLlib.
> In the RDD-based API, RegressionMetrics and other *Metrics classes support efficient computation of multiple metrics.
> In the DataFrame-based API, there are a few options:
> * model/result summaries (e.g., LogisticRegressionSummary): These currently provide the desired functionality, but they require a model and do not let users compute metrics manually from DataFrames of predictions and true labels.
> * Evaluator classes (e.g., RegressionEvaluator): These only support computing a single metric in one pass over the data, but they do not require a model.
> * new class analogous to Metrics: We could introduce a class analogous to Metrics.  Model/result summaries could use this internally as a replacement for spark.mllib Metrics classes, or they could (maybe) inherit from these classes.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org