Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/07/23 04:28:02 UTC

[jira] [Commented] (SPARK-21510) Add isMaterialized() and eager persist() to Dataset APIs

    [ https://issues.apache.org/jira/browse/SPARK-21510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097498#comment-16097498 ] 

Apache Spark commented on SPARK-21510:
--------------------------------------

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/18717

> Add isMaterialized() and eager persist() to Dataset APIs
> --------------------------------------------------------
>
>                 Key: SPARK-21510
>                 URL: https://issues.apache.org/jira/browse/SPARK-21510
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> Currently, Spark beginners often do not realize that our persist API is lazy, and they do not know the most efficient way to materialize a persisted Dataset. Sometimes they just call collect(), which is very expensive when the data set is big.
> In addition, we need another API to verify whether a Dataset has been cached and materialized.
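
To illustrate the behavior described in the issue, here is a toy model in plain Python. It is not Spark code: the LazyDataset class, its is_materialized() method, and the compute_calls counter are hypothetical names made up for this sketch. It only shows why a lazy persist() does nothing by itself, and why an action such as count(), which returns a single number, is a cheaper way to trigger materialization than collect(), which returns every row.

```python
class LazyDataset:
    """Toy stand-in for a Dataset; hypothetical, not a Spark class."""

    def __init__(self, compute):
        self._compute = compute          # function that produces the rows
        self._persist_requested = False
        self._cache = None               # filled by the first action after persist()
        self.compute_calls = 0           # how many times the rows were recomputed

    def persist(self):
        # Lazy: only records the request, computes nothing.
        self._persist_requested = True
        return self

    def is_materialized(self):
        # Hypothetical counterpart of the proposed isMaterialized().
        return self._cache is not None

    def _rows(self):
        if self._cache is not None:
            return self._cache
        self.compute_calls += 1
        rows = list(self._compute())
        if self._persist_requested:
            self._cache = rows
        return rows

    def count(self):
        # Action that hands a single number back to the caller.
        return len(self._rows())

    def collect(self):
        # Action that hands every row back to the caller.
        return list(self._rows())


ds = LazyDataset(lambda: range(1000)).persist()
materialized_before = ds.is_materialized()   # persist() alone did no work
n = ds.count()                               # first action fills the cache
materialized_after = ds.is_materialized()
ds.count()                                   # served from the cache, no recompute
```

In this sketch, the cache is only populated by the first action after persist(), which mirrors the laziness the issue describes. The proposed eager persist() would, under that reading, combine the two steps into one call.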



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
