You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by rednaxelafx <gi...@git.apache.org> on 2018/04/11 10:01:48 UTC

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

GitHub user rednaxelafx opened a pull request:

    https://github.com/apache/spark/pull/21039

    [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars as transient

    ## What changes were proposed in this pull request?
    
    Mark `HashAggregateExec.bufVars` as transient to avoid it from being serialized.
    Also manually null out this field at the end of `doProduceWithoutKeys()` to shorten its lifecycle, because it'll no longer be used after that.
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark codegen-improve

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21039.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21039
    
----
commit 0b5a075b9de40d36379551720d7e0e07ad6deb40
Author: Kris Mok <kr...@...>
Date:   2018-04-11T09:35:31Z

    SPARK-23960: Mark HashAggregateExec.bufVars as transient

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    **[Test build #89185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89185/testReport)** for PR 21039 at commit [`0b5a075`](https://github.com/apache/spark/commit/0b5a075b9de40d36379551720d7e0e07ad6deb40).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21039#discussion_r180872742
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -236,6 +236,8 @@ case class HashAggregateExec(
              | }
            """.stripMargin)
     
    +    bufVars = null  // explicitly null this field out to allow the referent to be GC'd sooner
    --- End diff --
    
    Ah, got it. Thank you for the clarification.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    LGTM, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21039#discussion_r180721843
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -236,6 +236,8 @@ case class HashAggregateExec(
              | }
            """.stripMargin)
     
    +    bufVars = null  // explicitly null this field out to allow the referent to be GC'd sooner
    --- End diff --
    
    The workflow of whole-stage codegen ensures that `doConsumeWithoutKeys` can only be on the call stack when `doProduceWithoutKeys` is also on the call stack; the liveness of the former is strictly a subset of the latter.
    That's because for a plan tree that looks like:
    ```
    +- A
      +- B
        +- C
    ```
    The whole-stage codegen system (mostly) works like:
    ```
    A.produce
     |------> B.produce
     |         |------> C.produce
     |         |         |------> B.consume
     |         |         |         |------> A.consume
     |         |         |         |        |
     |         |         |         |<-------o
     |         |         |<--------o
     |         |<--------o
     |<--------o
     o
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    **[Test build #89185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89185/testReport)** for PR 21039 at commit [`0b5a075`](https://github.com/apache/spark/commit/0b5a075b9de40d36379551720d7e0e07ad6deb40).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    Reverted the PR due to the test failure: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/4694/testReport/junit/org.apache.spark.sql/TPCDSQuerySuite/q61/
    
    Please try to resolve it and submit it again 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/21039

I just checked the same test in Build 4695, which still has this change, and the test passed:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/4695/testReport/junit/org.apache.spark.sql/TPCDSQuerySuite/q61/
Something weird must have happened when running this test in Build 4694...

re:
> shall we just don't do the nulling out? It wouldn't help the GC a lot.
@cloud-fan I can certainly do that -- just mark the transient, removing the nulling code.
But what I'm concerned about is that the test failure was exposing some other problem that we might not have anticipated before.

The whole-stage codegen logic in most physical operators assume that codegen happens on a single thread. As such, it might use instance fields on the operator to pass state between the `doProduce` and `doConsume` methods, as is the case with `bufVars` in `HashAggregateExec`.
Now, imagine a scenario where I take a DataFrame `df`:
```scala
val plan = df.queryExecution.executedPlan

// then on Thread1
plan.execute // triggers whole-stage codegen

// at around the same time, on Thread2
plan.execute // also triggers whole-stage codegen
```
Now we're going to be performing whole-stage codegen on 2 different threads, at around the same time, on the exact same plan (if there are `HashAggregateExec`s involved, they're gonna be the same instances).
In this case, the two threads might accidentally "cross-talk" because of the way whole-stage codegen uses instance fields to pass state between `doProduce` and `doConsume`. It's basically "pure luck" that both threads finish the codegen correctly (despite one thread might be accidentally using the state from (overwritten by) another thread).

If this theory holds, nulling out the state introduces a "leak" of the "cross-talk", so it's now possible to see an NPE if the timing is just right. But it's fairly scary already even without the nulling...

Anyway, I'll resend this PR with only the `@transient` part of the change, since that won't cause any regressions for sure.

---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21039#discussion_r180719456
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -236,6 +236,8 @@ case class HashAggregateExec(
              | }
            """.stripMargin)
     
    +    bufVars = null  // explicitly null this field out to allow the referent to be GC'd sooner
    --- End diff --
    
    I am curious what happens when `bufVars ` is accessed in `doConsumeWithoutKeys`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec....

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21039


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    shall we just don't do the nulling out? It wouldn't help the GC a lot.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    Thanks for reverting it for me. The test failure was definitely related to the explicit nulling from this PR, but I can't see how that's possible yet.
    
    First of all, in the build that first introduced my change, build 4693, this particular test was passing:
    https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/4693/testReport/junit/org.apache.spark.sql/TPCDSQuerySuite/q61/
    
    The build that failed was the one immediately after that.
    
    Second, the stack trace seen from the failure indicates that `doProduceWithoutKeys` is indeed on the stack,  and it was before the line that I null'd out `bufVars`, so I can't see how `bufVars` can be null in `doConsumeWithoutKeys`.
    
    Stack trace:
    ```
    java.lang.NullPointerException
          at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithoutKeys(HashAggregateExec.scala:274)
          at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsume(HashAggregateExec.scala:171)
          at org.apache.spark.sql.execution.CodegenSupport$class.constructDoConsumeFunction(WholeStageCodegenExec.scala:209)
          at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:180)
          at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:35)
          at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:65)
          at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:182)
    ...
          at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
          at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
          at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithoutKeys(HashAggregateExec.scala:237)
          at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:163)
    ```
    
    The relevant line in `doConsumeWithoutKeys` is:
    https://github.com/apache/spark/blob/75a183071c4ed2e407c930edfdf721779662b3ee/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L274
    It's reading `bufVars`.
    
    The relevant line in `doProduceWithoutKeys` is:
    https://github.com/apache/spark/blob/75a183071c4ed2e407c930edfdf721779662b3ee/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L237
    It's calling `child.produce()`, and that's before the nulling at line 241.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89185/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21039: [SPARK-23960][SQL][MINOR] Mark HashAggregateExec.bufVars...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21039
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2201/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org