Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/06/06 14:37:02 UTC

[GitHub] spark pull request #21502: [SPARK-22575][SQL] Add destroy to Dataset

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/21502

    [SPARK-22575][SQL] Add destroy to Dataset

    ## What changes were proposed in this pull request?
    
    In the Dataset API we may acquire resources that we cannot deallocate. This happens for broadcast joins: the broadcast object is never destroyed explicitly, and we rely on garbage collection of the broadcast object to free it. In the general case this is a safe assumption, but when dynamic allocation is enabled, the current approach can lead to resource leakage.
    
    In particular, when a Spark application is submitted on YARN with dynamic allocation enabled, we may leak disk space. In such a scenario, when a query with a broadcast join is executed, it is likely that we ask for new containers. These containers are used for the execution of the query and then killed. They may be killed before the broadcast object is GCed. In this case, the files which have been written are never removed (the container is no longer alive to remove them, and YARN removes them only when the application ends).
    
    To solve the above-mentioned issue, the PR proposes adding a `destroy` method to the `Dataset` class, which can be used to free all the resources acquired during plan execution. Because the acquired resources are destroyed eagerly, they are freed before the containers are killed, which avoids (or at least considerably reduces) the problem.
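
The timing argument above can be made concrete with a toy model in plain Scala. This is not Spark code: `Container`, `LeakModel`, and the file name `broadcast_0` are hypothetical names used only for illustration. The sketch assumes, as the description states, that only a live container can delete its own files.

```scala
import scala.collection.mutable

// Toy model, NOT Spark code: a container that holds broadcast files on local disk.
final class Container(val id: Int) {
  var alive: Boolean = true
  val files: mutable.Set[String] = mutable.Set.empty[String]
}

object LeakModel {
  // Only containers that are still alive can remove their own files.
  def cleanup(containers: Seq[Container], file: String): Unit =
    containers.filter(_.alive).foreach(c => c.files -= file)

  // Runs one "query" on n containers and returns how many copies of the
  // broadcast file are left on disk after the containers have been killed.
  def leakedCopies(n: Int, eager: Boolean): Int = {
    val file = "broadcast_0" // hypothetical file name
    val containers = (1 to n).map(i => new Container(i))
    containers.foreach(c => c.files += file)   // the query writes the broadcast blocks
    if (eager) cleanup(containers, file)       // eager destroy right after the query
    containers.foreach(c => c.alive = false)   // scale-down kills the containers
    if (!eager) cleanup(containers, file)      // late cleanup finds nobody alive
    containers.count(_.files.contains(file))
  }
}
```

In this model `LeakModel.leakedCopies(3, eager = true)` leaves nothing behind, while `eager = false` leaves one copy per killed container. Real behaviour depends on how quickly containers are released relative to the cleanup.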
    
    ## How was this patch tested?
    
    Added a unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-22575

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21502
    
----
commit 147bd08db09fe328de12069c9c0d8a849d99adf4
Author: Marco Gaido <ma...@...>
Date:   2018-01-31T16:35:37Z

    [SPARK-22575][SQL] Add destroy to Dataset

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3824/
    Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3832/
    Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    I like that approach. If this issue happens only with dynamic allocation, how about adding a new option to turn that check on or off?


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91504/testReport)** for PR 21502 at commit [`147bd08`](https://github.com/apache/spark/commit/147bd08db09fe328de12069c9c0d8a849d99adf4).


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91524/
    Test FAILed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Well, it happens especially with dynamic allocation, but there may be other causes like YARN preemption. Anytime a container is killed we can face this issue. Anyway, I plan to check the feasibility of this other approach (it may take some time as I'm not very familiar with that part of the codebase).


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91528/
    Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91528/testReport)** for PR 21502 at commit [`4d080cf`](https://github.com/apache/spark/commit/4d080cff795457c6a02b255acc691157afc94e81).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    @maropu I think a monitor thread would be useless. Once the container is gone, there is nothing we can do. Another solution which may be worth investigating is to clear the block manager for an executor before killing it. But I am not sure about this, as it introduces overhead during the scale-down of containers.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91504/testReport)** for PR 21502 at commit [`147bd08`](https://github.com/apache/spark/commit/147bd08db09fe328de12069c9c0d8a849d99adf4).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 closed the pull request at:

    https://github.com/apache/spark/pull/21502


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91524/testReport)** for PR 21502 at commit [`789168e`](https://github.com/apache/spark/commit/789168e147615c50cfd67ba959ba1d43afb00ccf).


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3823/
    Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    How does this solve the problem you described? If the container is gone, the process is gone and users can't destroy things anymore.



---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91528/testReport)** for PR 21502 at commit [`4d080cf`](https://github.com/apache/spark/commit/4d080cff795457c6a02b255acc691157afc94e81).


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Aha, thanks for the kind explanation!


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91505/
    Test FAILed.


---



[GitHub] spark pull request #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21502#discussion_r193976536
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
    @@ -153,6 +154,23 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
       }
     
       test("SPARK-22575: remove allocated blocks when they are not needed anymore") {
    +    val blockManager = sparkContext.env.blockManager
    +    def broadcastedBlockIds: Seq[BlockId] = {
    +      blockManager.getMatchingBlockIds(blockId => {
    +        blockId.isBroadcast && blockManager.getStatus(blockId).get.storageLevel.deserialized
    +      }).distinct
    +    }
    +    def isHashedRelationPresent(blockIds: Seq[BlockId]): Boolean = {
    +      val blockValues = blockIds.flatMap { id =>
    +        val block = blockManager.getSingle[Any](id)
    +        if (block.isDefined) {
    --- End diff --
    
    Maybe we should add more comments here. Is this the root cause of the last failure?


---



[GitHub] spark pull request #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21502#discussion_r194101781
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
    @@ -153,6 +154,23 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
       }
     
       test("SPARK-22575: remove allocated blocks when they are not needed anymore") {
    +    val blockManager = sparkContext.env.blockManager
    +    def broadcastedBlockIds: Seq[BlockId] = {
    +      blockManager.getMatchingBlockIds(blockId => {
    +        blockId.isBroadcast && blockManager.getStatus(blockId).get.storageLevel.deserialized
    +      }).distinct
    +    }
    +    def isHashedRelationPresent(blockIds: Seq[BlockId]): Boolean = {
    +      val blockValues = blockIds.flatMap { id =>
    +        val block = blockManager.getSingle[Any](id)
    +        if (block.isDefined) {
    --- End diff --
    
    No, the last failure happened because the tests which run before this test case were not cleaning up their broadcast variables. So the UT was failing because there were broadcast items present in the block manager, left over from previous tests rather than created by this test. The test passed for you after adding a sleep because, during that time, the previous broadcast items were cleaned up by the dedicated cleaner thread.
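
One general way to keep such a check independent of leftovers from earlier tests is to compare a snapshot taken before the test body with one taken after it. The sketch below is plain Scala under stated assumptions: block ids are modelled as strings, and `SnapshotCheck` and `newIds` are hypothetical names. It is not necessarily what this PR ended up doing.

```scala
// Illustrative helper, NOT part of Spark or of this PR.
object SnapshotCheck {
  // Ids present after the test body that were not present before it.
  // Leftovers from earlier tests appear in both snapshots and cancel out.
  def newIds(before: Set[String], after: Set[String]): Set[String] =
    after.diff(before)
}
```

For example, with `before = Set("broadcast_3")` and `after = Set("broadcast_3", "broadcast_7")`, only `broadcast_7` is attributed to the test under inspection.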


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by tooptoop4 <gi...@git.apache.org>.
Github user tooptoop4 commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Can this be merged?


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    @rxin This can reduce the problem because the destroy operation is invoked as soon as the execution of the query terminates (in STS), instead of some time later when the cleaner thread does it. When dynamic allocation is enabled, it is very likely that many containers are killed during the delay introduced by waiting for the cleaner thread to remove the blocks. If we instead destroy the broadcast immediately after the query terminates, the containers are very likely not to have been killed yet, so they can clean up the blocks.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91505/testReport)** for PR 21502 at commit [`ec365d6`](https://github.com/apache/spark/commit/ec365d628126255cea3eadd4a8357030e5bf1f2e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Is adding a new API the best way to do that? Would it be a bad idea to start a monitor thread to check that kind of thing?


---



[GitHub] spark pull request #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21502#discussion_r193724774
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
    @@ -152,6 +152,26 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
         }
       }
     
    +  test("SPARK-22575: remove allocated blocks when they are not needed anymore") {
    +    val df1 = Seq((1, "4"), (2, "2")).toDF("key", "value")
    +    val df2 = Seq((1, "1"), (2, "2")).toDF("key", "value")
    +    val df3 = df1.join(broadcast(df2), Seq("key"), "inner")
    +    val numBroadCastHashJoin = df3.queryExecution.executedPlan.collect {
    +      case b: BroadcastHashJoinExec => b
    +    }.size
    +    assert(numBroadCastHashJoin > 0)
    +    df3.collect()
    +    df3.destroy()
    +    val blockManager = sparkContext.env.blockManager
    +    val blocks = blockManager.getMatchingBlockIds(blockId => {
    +      blockId.isBroadcast && blockManager.getStatus(blockId).get.storageLevel.deserialized
    +    }).distinct
    +    val blockValues = blocks.flatMap { id =>
    +      blockManager.getSingle[Any](id)
    +    }
    --- End diff --
    
    This may be the root cause of the unstable UT failure: the block may not be deleted right away. I added a sleep and the test passes every time; you can give it a try.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3833/
    Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91505/testReport)** for PR 21502 at commit [`ec365d6`](https://github.com/apache/spark/commit/ec365d628126255cea3eadd4a8357030e5bf1f2e).


---



[GitHub] spark pull request #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21502#discussion_r193742604
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
    @@ -152,6 +152,26 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
         }
       }
     
    +  test("SPARK-22575: remove allocated blocks when they are not needed anymore") {
    +    val df1 = Seq((1, "4"), (2, "2")).toDF("key", "value")
    +    val df2 = Seq((1, "1"), (2, "2")).toDF("key", "value")
    +    val df3 = df1.join(broadcast(df2), Seq("key"), "inner")
    +    val numBroadCastHashJoin = df3.queryExecution.executedPlan.collect {
    +      case b: BroadcastHashJoinExec => b
    +    }.size
    +    assert(numBroadCastHashJoin > 0)
    +    df3.collect()
    +    df3.destroy()
    +    val blockManager = sparkContext.env.blockManager
    +    val blocks = blockManager.getMatchingBlockIds(blockId => {
    +      blockId.isBroadcast && blockManager.getStatus(blockId).get.storageLevel.deserialized
    +    }).distinct
    +    val blockValues = blocks.flatMap { id =>
    +      blockManager.getSingle[Any](id)
    +    }
    --- End diff --
    
    I ran the test 10000 times and I cannot reproduce the issue locally. Can you?


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91504/
    Test FAILed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21502
  
    **[Test build #91524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91524/testReport)** for PR 21502 at commit [`789168e`](https://github.com/apache/spark/commit/789168e147615c50cfd67ba959ba1d43afb00ccf).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
