Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2019/03/28 16:14:00 UTC

[jira] [Updated] (SPARK-26771) Make .unpersist(), .destroy() consistently non-blocking by default

     [ https://issues.apache.org/jira/browse/SPARK-26771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-26771:
------------------------------
    Docs Text: The RDD and DataFrame .unpersist() method, and Broadcast .destroy() method, take an optional 'blocking' argument. The default was 'false' in all cases except for (Scala) RDDs and their GraphX subclasses. The default is now 'false' (non-blocking) in all of these methods. Pyspark's RDD and Broadcast classes now have an optional 'blocking' argument as well, with the same behavior. Finally, internally, cached queries are also unpersisted without blocking now.  (was: The RDD and DataFrame .unpersist() method, and Broadcast .destroy() method, take an optional 'blocking' argument. The default was 'false' in all cases except for (Scala) RDDs and their GraphX subclasses. The default is now 'false' (non-blocking) in all of these methods. Pyspark's RDD and Broadcast classes now have an optional 'blocking' argument as well, with the same behavior.)

> Make .unpersist(), .destroy() consistently non-blocking by default
> ------------------------------------------------------------------
>
>                 Key: SPARK-26771
>                 URL: https://issues.apache.org/jira/browse/SPARK-26771
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX, Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> See https://issues.apache.org/jira/browse/SPARK-26728 and https://github.com/apache/spark/pull/23650.
> RDD and DataFrame expose an .unpersist() method with an optional "blocking" argument, as does Broadcast.destroy(). This argument defaults to false everywhere except in the Scala RDD implementation (not Pyspark) and its GraphX subclasses. Most callers already request non-blocking behavior, and indeed it's rarely useful to wait for the resources to be freed, except in tests that assert on the behavior of these methods (which typically request blocking).
> This proposes making the default false across all of these methods, and adjusting callers to request the non-default blocking behavior only where it matters, such as in a few key tests.
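The distinction above can be sketched with a small, self-contained Python model. This is not Spark code; the Cache class and its fields are hypothetical stand-ins used only to illustrate why blocking=False (the new default) returns before resources are freed, while blocking=True waits, which is what tests asserting on freed state need:

```python
import threading
import time

class Cache:
    """Toy stand-in for a Spark cached dataset; NOT the real Spark API."""
    def __init__(self):
        self.persisted = True

    def unpersist(self, blocking=False):
        # Cleanup happens asynchronously on a background thread,
        # mimicking how unpersist hands work off to the block manager.
        def _cleanup():
            time.sleep(0.05)   # pretend freeing the cached blocks takes time
            self.persisted = False

        t = threading.Thread(target=_cleanup)
        t.start()
        if blocking:
            t.join()           # blocking=True: wait until cleanup completes

# Non-blocking (the new default): the call returns immediately,
# so the data may still appear persisted right afterwards.
fast = Cache()
fast.unpersist()

# Blocking: on return, cleanup is guaranteed to have finished,
# so a test can safely assert the resources were freed.
checked = Cache()
checked.unpersist(blocking=True)
print(checked.persisted)  # False
```

This is why the change is safe for most callers: they never inspect the freed state immediately, so the non-blocking default only removes an unnecessary wait, and the handful of tests that do inspect it can opt back in with blocking=True.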



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org