Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/02/09 01:37:00 UTC

[jira] [Assigned] (SPARK-26852) CrossValidator: support transforming metrics to absolute values prior to min/max test

     [ https://issues.apache.org/jira/browse/SPARK-26852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26852:
------------------------------------

    Assignee:     (was: Apache Spark)

> CrossValidator: support transforming metrics to absolute values prior to min/max test
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-26852
>                 URL: https://issues.apache.org/jira/browse/SPARK-26852
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Ben Weber
>            Priority: Minor
>              Labels: starter
>
> When writing a custom Evaluator with PySpark, it's often useful to support negative values in the evaluate function, for example the relative difference between predicted and actual values. In that case, the goal is to select the value closest to 0 rather than the smallest or largest value. We should add a flag that lets users specify this scenario.
> For example, CrossValidator may be used with a parameter grid that results in the following metric values for different folds:
>  * [ 0.5, 0.5, 0.5, 0, 0 ]
>  * [ 0.5, -0.5, 0.5, 0, 0  ] 
>  * [ -0.5, -0.5, -0.5, 0, 0 ]
> This results in the following values for avgMetrics: [ 1.5, 0.5, -1.5 ]. There is currently no way to tell the cross validator to select the second model, whose average metric is closest to zero.
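> Concretely, here is a minimal sketch (plain Python, not the actual Spark internals) of how the best model is chosen today versus with the proposed absolute transform; the values and variable names are illustrative:
> {code:python}
> # avgMetrics for the three parameter combinations above
> avg_metrics = [1.5, 0.5, -1.5]
>
> # Today: with isLargerBetter() == False, CrossValidator takes the minimum
> best_today = min(range(len(avg_metrics)), key=lambda i: avg_metrics[i])      # index 2 (-1.5)
>
> # Proposed: take absolute values first, then apply the same min/max test
> transformed = [abs(m) for m in avg_metrics]
> best_proposed = min(range(len(transformed)), key=lambda i: transformed[i])   # index 1 (0.5)
> {code}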
> Here's an example Evaluator where this functionality is useful:
> {code:python}
> from pyspark.ml.evaluation import Evaluator
> from pyspark.sql import functions as F
>
> class SumEvaluator(Evaluator):
>     def __init__(self, predictionCol="prediction", labelCol="label"):
>         super().__init__()
>         self.predictionCol = predictionCol
>         self.labelCol = labelCol
>
>     def _evaluate(self, dataset):
>         # Relative difference between the summed predictions and the summed labels;
>         # this can be negative, and the best model is the one closest to zero.
>         actual = dataset.select(F.sum(self.labelCol)).collect()[0][0]
>         prediction = dataset.select(F.sum(self.predictionCol)).collect()[0][0]
>         return (prediction - actual) / actual
>
>     def isLargerBetter(self):
>         return False
>
>     def applyAbsoluteTransform(self):
>         # Proposed new hook: take the absolute value of each metric before the min/max test.
>         return True
> {code}
> This is a custom evaluator that computes the relative difference between the summed predicted and actual values in a regression problem. I am proposing a new function on the Evaluator that specifies whether an absolute transformation should be applied to the cross-validated metrics before the best model is selected.
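> For context, a rough sketch of how the selection step inside CrossValidator could honor such a flag; applyAbsoluteTransform is the proposed (hypothetical) method, and this is not the existing tuning.py code:
> {code:python}
> def _select_best_index(avg_metrics, evaluator):
>     # Proposed: fold negative metrics onto the positive axis before comparing,
>     # so that values closest to zero win regardless of sign.
>     has_abs = getattr(evaluator, "applyAbsoluteTransform", None)
>     metrics = [abs(m) for m in avg_metrics] if has_abs and has_abs() else list(avg_metrics)
>     if evaluator.isLargerBetter():
>         return max(range(len(metrics)), key=lambda i: metrics[i])
>     return min(range(len(metrics)), key=lambda i: metrics[i])
> {code}
> With the SumEvaluator above, this would pick the parameter combination whose relative error is closest to zero.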
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org