Posted to issues@spark.apache.org by "CacheCheck (Jira)" <ji...@apache.org> on 2020/03/22 16:28:00 UTC

[jira] [Commented] (SPARK-31217) Unnecessary persist on cumulativeCounts in BinaryClassificationMetrics

    [ https://issues.apache.org/jira/browse/SPARK-31217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064333#comment-17064333 ] 

CacheCheck commented on SPARK-31217:
------------------------------------

In addition, I think we should also add persist() APIs to the other metrics classes, e.g., for _summary_ in RegressionMetrics.
In the other three metrics classes, i.e., MulticlassMetrics, MultilabelMetrics, and RankingMetrics, _predictionAndLabels_ is used by multiple actions during object initialization, so it is better to check whether it is already cached. If it is not, we should cache it in these classes.
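
To illustrate the "check whether it is cached, cache it if not" idea, here is a rough caller-side sketch. It is not the actual MLlib implementation or a proposed patch: the helper name withCached and the MEMORY_AND_DISK storage level are my own assumptions, and it only relies on the public RDD.getStorageLevel, persist() and unpersist() calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object CacheIfNeededSketch {
  // Hypothetical helper (not part of Spark): persist `rdd` only when the caller
  // has not already persisted it, run `body`, then release only what we cached.
  def withCached[T, R](rdd: RDD[T])(body: RDD[T] => R): R = {
    val persistedHere = rdd.getStorageLevel == StorageLevel.NONE
    // Storage level is an assumption; pick whatever suits the workload.
    if (persistedHere) rdd.persist(StorageLevel.MEMORY_AND_DISK)
    try body(rdd)
    finally if (persistedHere) rdd.unpersist()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-sketch")
    val sc = new SparkContext(conf)
    try {
      // Toy (prediction, label) pairs, made up for this example.
      val predictionAndLabels: RDD[(Double, Double)] = sc.parallelize(Seq(
        (0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (2.0, 2.0)))

      // Both metrics trigger actions on the input, so they run inside the
      // block where the RDD is guaranteed to be persisted.
      val (accuracy, confusion) = withCached(predictionAndLabels) { cached =>
        val metrics = new MulticlassMetrics(cached)
        (metrics.accuracy, metrics.confusionMatrix)
      }
      println(s"accuracy = $accuracy")
      println(confusion)
    } finally {
      sc.stop()
    }
  }
}
```

If the caller has already persisted the RDD, the helper leaves its storage level alone and does not unpersist it afterwards, so it should not interfere with the caller's own caching decisions.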

> Unnecessary persist on cumulativeCounts in BinaryClassificationMetrics
> ----------------------------------------------------------------------
>
>                 Key: SPARK-31217
>                 URL: https://issues.apache.org/jira/browse/SPARK-31217
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.4.4, 2.4.5
>            Reporter: CacheCheck
>            Priority: Major
>
> In mllib.evaluation.BinaryClassificationMetrics, _cumulativeCounts_ is cached during lazy initialization. But when I run LogisticRegressionSummaryExample and ModelSelectionViaCrossValidationExample, I find that the cached _cumulativeCounts_ is used by only one action during execution.
> So I think it should not be cached during initialization. Instead, we can add an extra persist() API to this class, mirroring the existing unpersist() API in BinaryClassificationMetrics that releases the cached _cumulativeCounts_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org