Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/11/27 15:13:00 UTC

[jira] [Commented] (SPARK-22616) df.cache() / df.persist() should have an option blocking like df.unpersist()

    [ https://issues.apache.org/jira/browse/SPARK-22616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266920#comment-16266920 ] 

Sean Owen commented on SPARK-22616:
-----------------------------------

I do see this come up so often that I think it's a valid feature request. count()-ing just to persist the data is a bit wasteful. You can do it more efficiently with a mapPartitions that does nothing, but it's extra complexity.
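
For illustration, a minimal PySpark sketch of the workarounds discussed here (the DataFrame and its size are made up for the example; the no-op partition pass is one way to read the "mapPartitions that does nothing" idea, not an official recipe):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("eager-cache-demo").getOrCreate()
    df = spark.range(1000000)  # hypothetical example DataFrame

    # cache()/persist() are lazy: nothing is materialized until an action runs.
    df.cache()

    # Common workaround: run count() so every partition is computed and cached.
    # Works, but count() also aggregates a result nobody needs.
    df.count()

    # Cheaper variant of the same idea: touch every partition without building
    # a result, e.g. a no-op pass over the underlying RDD's partitions.
    df.rdd.foreachPartition(lambda _: None)

    # By contrast, unpersist() already exposes a blocking flag for eager removal.
    df.unpersist(blocking=True)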

I think it unfortunately requires a breaking API change though. It could wait for Spark 3, or come as a new "materialize()" method or something.

> df.cache() / df.persist() should have an option blocking like df.unpersist()
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-22616
>                 URL: https://issues.apache.org/jira/browse/SPARK-22616
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Andreas Maier
>            Priority: Minor
>
> The method dataframe.unpersist() has a blocking option, which allows for eager unpersisting of a dataframe. On the other hand, the methods dataframe.cache() and dataframe.persist() don't have a comparable option. An (undocumented) workaround is to call dataframe.count() directly after cache() or persist(). But for API consistency and convenience it would make sense to give cache() and persist() a blocking option as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org