Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/07 12:16:00 UTC
[jira] [Updated] (SPARK-27645) Cache result of count function to that RDD
[ https://issues.apache.org/jira/browse/SPARK-27645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-27645:
---------------------------------
Target Version/s: (was: 2.4.3)
> Cache result of count function to that RDD
> ------------------------------------------
>
> Key: SPARK-27645
> URL: https://issues.apache.org/jira/browse/SPARK-27645
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.4.3
> Reporter: Seungmin Lee
> Priority: Major
>
> I'm not sure whether there has been an update on this (as far as I know, no such feature exists). Since an RDD is immutable, why don't we keep the result of the count function for that RDD and reuse it in future calls?
> Sometimes we only have the RDD variable and no longer have the result of a previous count.
> In that case, not rerunning the whole count action over the entire dataset would be very beneficial for performance.
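
The request above can be approximated on the user side, without changing Spark. What follows is only a minimal sketch of the idea, not Spark's API: `CountCachingRDD` is a hypothetical wrapper name, and it assumes nothing about the wrapped object except that it has a `count()` method (as a PySpark RDD does). It also assumes the RDD's lineage is deterministic and its underlying data does not change between calls; otherwise a remembered count could go stale. `StandInRDD` exists only so the sketch runs without a cluster.

```python
class CountCachingRDD:
    """Hypothetical wrapper (not part of Spark) that memoizes count()."""

    def __init__(self, rdd):
        self.rdd = rdd              # the wrapped RDD, still usable directly
        self._cached_count = None   # None means "not counted yet"

    def count(self):
        # Run the real count action only on the first call,
        # then reuse the stored result on every later call.
        if self._cached_count is None:
            self._cached_count = self.rdd.count()
        return self._cached_count


class StandInRDD:
    """Illustrative stand-in for an RDD; tracks how often count() runs."""

    def __init__(self, data):
        self.data = list(data)
        self.count_calls = 0

    def count(self):
        self.count_calls += 1
        return len(self.data)


stand_in = StandInRDD(["a", "b", "c"])
wrapped = CountCachingRDD(stand_in)
first = wrapped.count()    # triggers the underlying count
second = wrapped.count()   # served from the stored value
```

With a real PySpark RDD the wrapper would be used the same way, e.g. `CountCachingRDD(sc.parallelize(range(10)))`. Note that transformations on the wrapped RDD produce new RDDs, so a count remembered for one RDD says nothing about the RDDs derived from it.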
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)