You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/02/13 12:30:08 UTC

[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20599

    [SPARK-23407][SQL] add a config to try to inline all mutable states during codegen

    ## What changes were proposed in this pull request?
    
    This is a followup of https://github.com/apache/spark/pull/19811 .
    
    In #19811, we picked a sub-optimal solution that always compact non-primitive mutable states to arrays, to make primitive mutable states more likely to get inlined.
    
    This PR introduces a new config to not treat primitive states specially and try to inline all states, to avoid any potential perf regression in Spark 2.3. By default it's false.
    
    In the future, we can remove this config, and dynamically decide which states to inline. For example, we can use placeholders during codegen, and analysis all the mutable states at the end and replace the placeholders.
    
    Note that there are no known regression cases, so this is not a blocker for Spark 2.3
    
    ## How was this patch tested?
    
    a new test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20599
    
----
commit 013c02f215de85d50b4c7125ee571b14801bdb47
Author: Wenchen Fan <we...@...>
Date:   2018-02-13T12:23:47Z

    add a config to try to inline all mutable states during codegen

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20599#discussion_r168101849
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -675,6 +675,16 @@ object SQLConf {
           "disable logging or -1 to apply no limit.")
         .createWithDefault(1000)
     
    +  val CODEGEN_TRY_INLINE_ALL_STATES =
    +    buildConf("spark.sql.codegen.tryInlineAllStates")
    +    .internal()
    +    .doc("When adding mutable states during code generation, whether or not we should try to " +
    +      "inline all the states. If this config is false, we only try to inline primitive stats, " +
    +      "so that primitive states are more likely to be inlined. Set this config to true to make " +
    +      "the behavior same as Spark 2.2.")
    --- End diff --
    
    Also watch out for a typo `s/stats/states/`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87793/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    **[Test build #87391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87391/testReport)** for PR 20599 at commit [`013c02f`](https://github.com/apache/spark/commit/013c02f215de85d50b4c7125ee571b14801bdb47).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    **[Test build #87391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87391/testReport)** for PR 20599 at commit [`013c02f`](https://github.com/apache/spark/commit/013c02f215de85d50b4c7125ee571b14801bdb47).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    **[Test build #87793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87793/testReport)** for PR 20599 at commit [`013c02f`](https://github.com/apache/spark/commit/013c02f215de85d50b4c7125ee571b14801bdb47).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20599#discussion_r167960429
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1461,20 +1465,19 @@ object CodeGenerator extends Logging {
           CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.update(classBytes.length)
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
    -        val stats = cf.methodInfos.asScala.flatMap { method =>
    +        cf.methodInfos.asScala.flatMap { method =>
    --- End diff --
    
    Are these changes related to this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1158/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/852/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20599#discussion_r168067521
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -675,6 +675,16 @@ object SQLConf {
           "disable logging or -1 to apply no limit.")
         .createWithDefault(1000)
     
    +  val CODEGEN_TRY_INLINE_ALL_STATES =
    +    buildConf("spark.sql.codegen.tryInlineAllStates")
    +    .internal()
    +    .doc("When adding mutable states during code generation, whether or not we should try to " +
    +      "inline all the states. If this config is false, we only try to inline primitive stats, " +
    +      "so that primitive states are more likely to be inlined. Set this config to true to make " +
    +      "the behavior same as Spark 2.2.")
    --- End diff --
    
    yea, let me improve it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    do we have any test about the performance regression introduced by the change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    @mgaido91 As I said in the PR description, no regression is found so far, just providing a config to be super safe.
    
    Actually this PR has a problem: the codegen usually happens at executor side, so we can't use SQLConf directy. I'll figure this out after my vacation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    **[Test build #87793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87793/testReport)** for PR 20599 at commit [`013c02f`](https://github.com/apache/spark/commit/013c02f215de85d50b4c7125ee571b14801bdb47).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    cc @kiszk @rednaxelafx @viirya @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20599#discussion_r168067551
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1461,20 +1465,19 @@ object CodeGenerator extends Logging {
           CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.update(classBytes.length)
           try {
             val cf = new ClassFile(new ByteArrayInputStream(classBytes))
    -        val stats = cf.methodInfos.asScala.flatMap { method =>
    +        cf.methodInfos.asScala.flatMap { method =>
    --- End diff --
    
    not, but a small clean up.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    This PR will be hold until 2.3 is released.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87391/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20599: [SPARK-23407][SQL] add a config to try to inline ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20599#discussion_r167852572
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -675,6 +675,16 @@ object SQLConf {
           "disable logging or -1 to apply no limit.")
         .createWithDefault(1000)
     
    +  val CODEGEN_TRY_INLINE_ALL_STATES =
    +    buildConf("spark.sql.codegen.tryInlineAllStates")
    +    .internal()
    +    .doc("When adding mutable states during code generation, whether or not we should try to " +
    +      "inline all the states. If this config is false, we only try to inline primitive stats, " +
    +      "so that primitive states are more likely to be inlined. Set this config to true to make " +
    +      "the behavior same as Spark 2.2.")
    --- End diff --
    
    I think it only behaves the same before we hit the threshold?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20599: [SPARK-23407][SQL] add a config to try to inline all mut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20599
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org