You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/01/30 08:41:58 UTC

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20434

    [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMethodLimit to 65535

    ## What changes were proposed in this pull request?
    Still saw the performance regression introduced by `spark.sql.codegen.hugeMethodLimit` in our internal workloads. There are two major issues in the current solution.
    - The size of the complied byte code is not identical to the bytecode size of the method. The detection is still not accurate. 
    - The bytecode size of a single operator (e.g., `SerializeFromObject`) could still exceed 8K limit. We saw the performance regression in such scenario. 
    
    Since it is close to the release of 2.3, we decide to increase it to 64K for avoiding the perf regression.
    
    ## How was this patch tested?
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark revertConf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20434
    
----
commit 4b358dc8cbaba5603ce861623403db8b146f9337
Author: gatorsmile <ga...@...>
Date:   2018-01-30T08:30:50Z

    fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86813/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86835/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

Posted by sameeragarwal <gi...@git.apache.org>.

Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20434#discussion_r164676771
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,10 @@ object SQLConf {
       val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
         .internal()
         .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " +
    -      "codegen. When the compiled function exceeds this threshold, " +
    -      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
    -      "this is a limit in the OpenJDK JVM implementation.")
    --- End diff --
    
    nit: might want to still keep the last line around to indicate where the 64k limit is coming from


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    **[Test build #86835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86835/testReport)** for PR 20434 at commit [`c64bdfa`](https://github.com/apache/spark/commit/c64bdfa919cbb61cef636519673d780a2f2b6923).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    This is to revert back to the original behavior. Thus, we do not introduce anything else compared with 2.2 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    **[Test build #86813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86813/testReport)** for PR 20434 at commit [`4b358dc`](https://github.com/apache/spark/commit/4b358dc8cbaba5603ce861623403db8b146f9337).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20434#discussion_r164687283
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,10 @@ object SQLConf {
       val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
         .internal()
         .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " +
    -      "codegen. When the compiled function exceeds this threshold, " +
    -      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
    -      "this is a limit in the OpenJDK JVM implementation.")
    --- End diff --
    
    The 8000 byte limit is a HotSpot-specific thing, but the 64KB limit is imposed by the Java Class File format, as a part of the JVM spec.
    
    We may want to wordsmith a bit here to explain that:
    1. 65535 is a largest bytecode size possible for a valid Java method; setting the default value to 65535 is effectively turning the limit off for whole-stage codegen;
    2. For those that wish to turn this limit on when running on HotSpot, it may be preferable to set the value to `CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT` to match HotSpot's implementation.
    
    I don't have a good concrete suggestion as to how to concisely expression these two points in the doc string, though.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/377/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20434#discussion_r164791479
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,10 @@ object SQLConf {
       val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
         .internal()
         .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " +
    -      "codegen. When the compiled function exceeds this threshold, " +
    -      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
    -      "this is a limit in the OpenJDK JVM implementation.")
    --- End diff --
    
    Did the update


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    @gatorsmile .
    In the original PR, https://github.com/apache/spark/pull/18810, there was a microbenchmark.
    Can we have the result on the same benchmark here, too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Yes. We need to avoid the performance regression since the last release Spark 2.2


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/394/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    **[Test build #86835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86835/testReport)** for PR 20434 at commit [`c64bdfa`](https://github.com/apache/spark/commit/c64bdfa919cbb61cef636519673d780a2f2b6923).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    cc @sameeragarwal @zsxwing @rxin @cloud-fan @rednaxelafx @yhuai 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20434#discussion_r164845127
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -660,12 +660,13 @@ object SQLConf {
       val WHOLESTAGE_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
         .internal()
         .doc("The maximum bytecode size of a single compiled Java function generated by whole-stage " +
    -      "codegen. When the compiled function exceeds this threshold, " +
    -      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
    -      "this is a limit in the OpenJDK JVM implementation.")
    +      "codegen. When the compiled function exceeds this threshold, the whole-stage codegen is " +
    +      "deactivated for this subtree of the current query plan. The default value is 65535, which " +
    +      "is the largest bytecode size possible for a valid Java method. When running on HotSpot, " +
    +      s"it may be preferable to set the value to ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} " +
    +      "to match HotSpot's implementation.")
         .intConf
    -    .createWithDefault(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT)
    +    .createWithDefault(65535)
    --- End diff --
    
    cc @mgaido91 .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Thanks! Merged to master/2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Does this value lead to no performance degradation of other typical workloads (e.g. TPC-DS)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    @kiszk TPC-DS just shows the typical data analytics workloads. However, Spark SQL is being used for ETL like workloads. The regression happened in a complex pipeline of structured streaming workloads. Will do more investigation after 2.3 release. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20434


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    I see. The baseline is 2.2, right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    **[Test build #86813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86813/testReport)** for PR 20434 at commit [`4b358dc`](https://github.com/apache/spark/commit/4b358dc8cbaba5603ce861623403db8b146f9337).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20434
  
    It would be good to put a problematic code for future fix. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org