Posted to issues@spark.apache.org by "Sean R. Owen (Jira)" <ji...@apache.org> on 2019/11/10 19:21:00 UTC
[jira] [Resolved] (SPARK-29812) Missing persist on
predictionAndLabels in MulticlassClassificationEvaluator
[ https://issues.apache.org/jira/browse/SPARK-29812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen resolved SPARK-29812.
----------------------------------
Resolution: Duplicate
> Missing persist on predictionAndLabels in MulticlassClassificationEvaluator
> ---------------------------------------------------------------------------
>
> Key: SPARK-29812
> URL: https://issues.apache.org/jira/browse/SPARK-29812
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Affects Versions: 2.4.3
> Reporter: Dong Wang
> Priority: Major
>
> The RDD predictionAndLabels in ml.evaluation.MulticlassClassificationEvaluator.evaluate() needs to be persisted. When MulticlassMetrics uses predictionAndLabels to initialize its fields, at least five actions are executed on predictionAndLabels, and without persistence each action recomputes the RDD from the dataset.
> {code:scala}
> override def evaluate(dataset: Dataset[_]): Double = {
>   val schema = dataset.schema
>   SchemaUtils.checkColumnType(schema, $(predictionCol), DoubleType)
>   SchemaUtils.checkNumericType(schema, $(labelCol))
>   // Needs to be persisted
>   val predictionAndLabels =
>     dataset.select(col($(predictionCol)), col($(labelCol)).cast(DoubleType)).rdd.map {
>       case Row(prediction: Double, label: Double) => (prediction, label)
>     }
>   // The initialization uses predictionAndLabels multiple times, in different actions.
>   val metrics = new MulticlassMetrics(predictionAndLabels)
>   val metric = $(metricName) match {
>     case "f1" => metrics.weightedFMeasure
>     case "weightedPrecision" => metrics.weightedPrecision
>     case "weightedRecall" => metrics.weightedRecall
>     case "accuracy" => metrics.accuracy
>   }
>   metric
> }
> {code}
> This issue was reported by our tool CacheCheck, which dynamically detects misuses of the persist()/unpersist() API.
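
For illustration only, the persist/unpersist pattern the report asks for might look like the sketch below. It is not the change made in Spark (this issue was resolved as a duplicate), and the object name PersistBeforeMetrics, the helper metricsWithPersist, and the toy data are hypothetical. The sketch assumes a local Spark installation with spark-mllib on the classpath.

```scala
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistBeforeMetrics {

  // Hypothetical helper: returns (accuracy, weighted F1) for (prediction, label) pairs.
  def metricsWithPersist(predictionAndLabels: RDD[(Double, Double)]): (Double, Double) = {
    // Only persist if the caller has not already done so; changing the storage
    // level of an RDD that already has one is rejected by Spark.
    val persistedHere = predictionAndLabels.getStorageLevel == StorageLevel.NONE
    if (persistedHere) predictionAndLabels.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // MulticlassMetrics runs several actions on the RDD while computing its
      // fields; with the RDD persisted, they can reuse the cached partitions.
      val metrics = new MulticlassMetrics(predictionAndLabels)
      (metrics.accuracy, metrics.weightedFMeasure)
    } finally {
      // Release the cache only if this method created it.
      if (persistedHere) predictionAndLabels.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PersistBeforeMetrics")
      .getOrCreate()
    try {
      // Toy data (an assumption for this sketch): three of four predictions are correct.
      val pairs = spark.sparkContext.parallelize(
        Seq((0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (2.0, 2.0)))
      val (accuracy, f1) = metricsWithPersist(pairs)
      println(s"accuracy = $accuracy, weighted F1 = $f1")
    } finally {
      spark.stop()
    }
  }
}
```

Checking getStorageLevel first keeps the helper from overriding a caller's caching decision, and the try/finally ensures the cache is released even if a metric computation fails.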