You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yucai <gi...@git.apache.org> on 2018/10/26 08:11:11 UTC

[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

GitHub user yucai opened a pull request:

    https://github.com/apache/spark/pull/22847

    [SPARK-25850][SQL] Make the split threshold for the code generated method configurable

    ## What changes were proposed in this pull request?
    As per the [discussion](https://github.com/apache/spark/pull/22823/files#r228400706), add a new configuration to make the split threshold for the code generated method configurable.
    
    When the generated Java function source code exceeds the split threshold, it will be split into multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold.
    
    ## How was this patch tested?
    manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yucai/spark splitThreshold

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22847.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22847
    
----
commit 188a9476e3504c151ebe27b362a080469c262674
Author: yucai <yy...@...>
Date:   2018-10-26T08:00:24Z

    [SPARK-25850][SQL] Make the split threshold for the code generated method configurable

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r228483780
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The maximum source code length of a single Java function by codegen. When the " +
    +      "generated Java function source code exceeds this threshold, it will be split into " +
    +      "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." +
    --- End diff --
    
    `each function length is spark.sql.codegen.methodSplitThreshold` this is not true, the method size is always larger than the threshold. cc @kiszk any idea about the naming and description of this config?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Just in case people wonder, the following is the hack patch that I used for stress testing code splitting before this PR:
    ```diff
    --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
    +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
    @@ -647,11 +647,13 @@ class CodegenContext(val useStreamlining: Boolean) {
        * Returns a term name that is unique within this instance of a `CodegenContext`.
        */
       def freshName(name: String): String = synchronized {
    -    val fullName = if (freshNamePrefix == "") {
    +    // hack: intentionally add a very long prefix (length=300 characters) to
    +    // trigger code splitting more frequently
    +    val fullName = ("averylongprefix" * 20) + (if (freshNamePrefix == "") {
           name
         } else {
           s"${freshNamePrefix}_$name"
    -    }
    +    })
         if (freshNameIds.contains(fullName)) {
           val id = freshNameIds(fullName)
           freshNameIds(fullName) = id + 1
    ```
    Of course, now with this PR, we can simply set the split threshold to a very low value (e.g. `1`) to force split.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229572426
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,18 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We cannot know how many bytecode will " +
    --- End diff --
    
    `The threshold of source-code splitting in the codegen. When the number of characters in a single JAVA function (without comment) exceeds the threshold, the function will be automatically split to multiple smaller ones.`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    @cloud-fan @gatorsmile How about merging this PR first? And then we can dissuss those performance issue in other PR?
    1. One PR to improve WideTableBenchmark #22823 WIP.
    2. One PR to add more tests in WideTableBenchmark.
    3. If we can figure out good split threshold based on 2, another PR to update that value.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98384/testReport)** for PR 22847 at commit [`65a5355`](https://github.com/apache/spark/commit/65a5355a352ad228786b930e41628e0f255e9b59).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98129/testReport)** for PR 22847 at commit [`ee986cc`](https://github.com/apache/spark/commit/ee986ccb4ec48337434f68a3e16953e6ebc2f67f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r228755710
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    According to the description, it seems that we had better have `checkValue` here. Could you recommend the reasonable min/max values, @kiszk ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    @cloud-fan @rednaxelafx I missed that! Please help review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98301/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229577559
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,18 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source-code splitting in the codegen. When the number of characters " +
    +      "in a single JAVA function (without comment) exceeds the threshold, the function will be " +
    --- End diff --
    
    nit: `JAVA` -> `Java`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    @rednaxelafx ah good point! It's hardcoded as 1024 too, and it's also doing method splitting. Let's apply the config there too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98295/testReport)** for PR 22847 at commit [`2fc6417`](https://github.com/apache/spark/commit/2fc6417e8e4b1e95be2d5f614811d4898cd696f3).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98298/testReport)** for PR 22847 at commit [`2fc6417`](https://github.com/apache/spark/commit/2fc6417e8e4b1e95be2d5f614811d4898cd696f3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98301/testReport)** for PR 22847 at commit [`8eeb538`](https://github.com/apache/spark/commit/8eeb5385943bab33fd9fa9fe132315b137c831ed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98384/testReport)** for PR 22847 at commit [`65a5355`](https://github.com/apache/spark/commit/65a5355a352ad228786b930e41628e0f255e9b59).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Using source code length will have to be a very coarse-grained, "fuzzy" heuristic. It's not meant to be accurate. So just pick some number that makes sense.
    2048 might be a good enough choice for now.
    
    Tidbit: before this PR, since the `1024` split threshold is hard coded, the way I stress test the code splitting path is to hack into `CodegenContext.freshName()` and artificially prepend a very long prefix to the symbol names.
    e.g. "project_value_1" becomes "averylongprefixrepeatedmultipletimes_project_value_1". That inflates the source code size a lot which triggers code splitting a lot more frequently under the existing heuristics.
    I'm mentioning this here to show that source code length based heuristics are inherently imprecise. It'll actually behave slightly differently for expressions in different physical operators because of the prefix length is different — "serializefromobject_" is much longer than "project_", which is longer than "agg_", right?
    
    Side notes for those curious:
    (details are way too complicated to be addressed by a source code length based heuristic, but I'm making this note just to let people know what's involved)
    
    Performance-wise, it might be very interesting to note that, running on HotSpot, sometimes it might be better to split into more smaller methods, and sometimes it might be better to split into less but larger methods.
    
    The theory is this: if a method is really hot, we'd like it to be eventually inlined by the JIT to reduce method invocation overhead and get better optimization. In HotSpot, hot methods will eventually be compiled by C2, and it uses multiple bytecode size based threshold to determine whether or not a callee is fit for inlining.
    ```
    MaxInlineSize = 35 // in bytes of bytecode. "The maximum bytecode size of a method to be inlined"
    FreqInlineSize = 325 // in bytes of bytecode. "The maximum bytecode size of a frequent method to be inlined"
    MaxTrivialSize = 6 // in bytes of bytecode. "The maximum bytecode size of a trivial method to be inlined"
    InlineFrequencyRatio = 20 // "Ratio of call site execution to caller method invocation"
    // there's also a InlineFrequencyCount which we'll omit here for simplicity
    ```
    (values shown above are default values for C2 on x86.)
    c.f. http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/tip/src/share/vm/runtime/globals.hpp
    
    Don't be confused by the name, though.
    `MaxTrivialSize` is meant for regular getters/setters, and they're considered to be always inlineable (unless explicitly configured as `dontinline` in a compiler directive option).
    `MaxInlineSize` is for regular small methods, with more checks than the "trivial" case, without checking the invocation frequency.
    `InlineFrequencyRatio` is the callee-to-caller invocation ratio for a call site to be considered as "hot" (or "frequent"). The default is 20:1. This is usually for call sites inside of loops, and HotSpot can collect loop counts / invocation counts in profiling, to figure out the ratio.
    Similarly, there's a `InlineFrequencyCount` that if a call site (after scaling according to inline depth etc) is invoked more than this many times, it can also be considered as "hot".
    `FreqInlineSize` is the max size a "hot" call site can be considered for inlining. It's 325 bytes of bytecode, obviously larger than `MaxInlineSize` so it might sound confusing at first ;-)
    
    So the deal is: on HotSpot, if you want to save bytecode code size by
    - Outlining a utility method that's <= 6 bytes, there's no overhead at all. But in practice this is not really feasible, because at the call site of this utility method, "invokespecial" itself is 3 bytes, and you'd at least need to pass in "this" which is 1 byte, and if you need to pass in arguments (e.g. INPUT_ROW) that's at least 1 or 2 more bytes, which already adds up to 4 to 5 bytes at the call site, whereas the callee's method body can only be <= 6 bytes, there's not much you can do…
    - Outlining a utility method that's <= 35 bytes. That's the easiest for both HotSpot C1 and C2 to inline. You can do somethings but not a lot in this method. If you want to shoot for this case, then don't make the split threshold too big.
    - Outlining a utility method that's <= 325 bytes. In HotSpot, only C2 gets to inline big methods like this, and only "hot call sites" are considered.
    
    So, if a method is compute-intensive and is small enough, it'd better to strive to keep the split-off methods within the thresholds mentioned above.
    Otherwise, if it's already bigger than the inline thresholds, then it makes more sense to make the split-off method as big as possible (but below 8000 bytes of bytecode) so that the invocation overhead is amortized.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r228789484
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    --- End diff --
    
    Not a big deal but I would avoid abbreviation in documentation. `can't` -> `cannot`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98080/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98298/testReport)** for PR 22847 at commit [`2fc6417`](https://github.com/apache/spark/commit/2fc6417e8e4b1e95be2d5f614811d4898cd696f3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229879855
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    Could you try some very long alias names or complex expressions? You will get different number, right? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98080/testReport)** for PR 22847 at commit [`188a947`](https://github.com/apache/spark/commit/188a9476e3504c151ebe27b362a080469c262674).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98130/testReport)** for PR 22847 at commit [`b578dd4`](https://github.com/apache/spark/commit/b578dd45cb4e6831a4bb54ba4c0d9c8f5c84fec5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229638917
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    @kiszk agree, `1000` might be not the best, see my benchmark for the wide table, `2048` is better.
    ```
    ================================================================================================
    projection on wide table
    ================================================================================================
     Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    projection on wide table:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    split threshold 10                            8464 / 8737          0.1        8072.0       1.0X
    split threshold 100                           5959 / 6251          0.2        5683.4       1.4X
    split threshold 1024                          3202 / 3248          0.3        3053.2       2.6X
    split threshold 2048                          3009 / 3097          0.3        2869.2       2.8X
    split threshold 4096                          3414 / 3458          0.3        3256.1       2.5X
    split threshold 8196                          4095 / 4112          0.3        3905.5       2.1X
    split threshold 65536                       28800 / 29705          0.0       27465.8       0.3X
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98132/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98386/testReport)** for PR 22847 at commit [`65a5355`](https://github.com/apache/spark/commit/65a5355a352ad228786b930e41628e0f255e9b59).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    @cloud-fan @dongjoon-hyun @kiszk I just add a negative check, maybe we need another PR to figure better value later if it is not easy to decide now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98298/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229943260
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    Oh I see, you're using the column name...that's not the right place to put the "prefix". Column names are almost never carried over to the generated code in the current framework (the only exception is the lambda variable name).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229577345
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    1000 is conservative. But, there is no recommendation since the bytecode size depends on the content (e.g. `0`'s byte code length is 1. `9`'s byte code lengh is 2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98386/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r228788123
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    To be more accurately, I think I should add `When running on HotSpot, a function's bytecode should not go beyond 8KB...`.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98295/testReport)** for PR 22847 at commit [`2fc6417`](https://github.com/apache/spark/commit/2fc6417e8e4b1e95be2d5f614811d4898cd696f3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98182/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98384/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    @cloud-fan @dongjoon-hyun @gengliangwang Kindly help review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98130/testReport)** for PR 22847 at commit [`b578dd4`](https://github.com/apache/spark/commit/b578dd45cb4e6831a4bb54ba4c0d9c8f5c84fec5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    I used the WideTableBenchmark to test this configuration.
    4 scenarioes are tested, `2048` is always better than `1024`, overall it is also good and looks more safe to avoid hitting 8KB limitaion. 
    
    **Scenario**
    1. projection on wide table: simple
    ```
    val N = 1 << 20
    val df = spark.range(N)
    val columns = (0 until 400).map{ i => s"id as id$i"}
    df.selectExpr(columns: _*).foreach(identity(_))
    ```
    2. projection on wide table: long alias names
    ```
    val longName = "averylongaliasname" * 20
    val columns = (0 until 400).map{ i => s"id as ${longName}_id$i"}
    df.selectExpr(columns: _*).foreach(identity(_))
    ```
    3. projection on wide table: many complex expressions
    ```
    // 400 columns, whole stage codegen is disabled for spark.sql.codegen.maxFields
    val columns = (0 until 400).map{ i => s"case when id = $i then $i else 800 end as id$i"}
    df.selectExpr(columns: _*).foreach(identity(_))
    ```
    4. projection on wide table: a big complex expressions
    ```
    // Because of spark.sql.subexpressionElimination.enabled,
    // the whole case when codes will be put into one function,
    // and it will be invoked once only.
    val columns = (0 until 400).map{ i =>
        s"case when id = ${N + 1} then 1
               when id = ${N + 2} then 1
               ...
               when id = ${N + 6} then 1
               else sqrt(N) end as id$i"}
    df.selectExpr(columns: _*).foreach(identity(_))
    ```
    
    **Perf Results**
    ```
    ================================================================================================
    projection on wide table: simple
    ================================================================================================
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    projection on wide table: simple:        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    split threshold 10                            7553 / 7676          0.1        7202.7       1.0X
    split threshold 100                           5463 / 5504          0.2        5210.0       1.4X
    split threshold 1024                          2981 / 3017          0.4        2843.0       2.5X
    split threshold 2048                          2857 / 2897          0.4        2724.2       2.6X
    split threshold 4096                          3128 / 3187          0.3        2983.3       2.4X
    split threshold 8196                          3755 / 3793          0.3        3581.3       2.0X
    split threshold 65536                       27616 / 27685          0.0       26336.2       0.3X
    
    
    ================================================================================================
    projection on wide table: long alias names
    ================================================================================================
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    projection on wide table: long alias names: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    split threshold 10                            7513 / 7566          0.1        7164.6       1.0X
    split threshold 100                           5363 / 5410          0.2        5114.4       1.4X
    split threshold 1024                          2966 / 2998          0.4        2828.3       2.5X
    split threshold 2048                          2840 / 2864          0.4        2708.0       2.6X
    split threshold 4096                          3126 / 3166          0.3        2981.2       2.4X
    split threshold 8196                          3756 / 3823          0.3        3582.3       2.0X
    split threshold 65536                       27542 / 27729          0.0       26266.4       0.3X
    
    
    ================================================================================================
    projection on wide table: many complex expressions
    ================================================================================================
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    projection on wide table: complex expressions 1: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    split threshold 10                            8758 / 9007          0.1        8352.3       1.0X
    split threshold 100                           8675 / 8754          0.1        8272.9       1.0X
    split threshold 1024                          3696 / 3797          0.3        3524.7       2.4X
    split threshold 2048                          3349 / 3419          0.3        3193.9       2.6X
    split threshold 4096                          2967 / 3019          0.4        2829.1       3.0X
    split threshold 8196                          3757 / 3841          0.3        3582.5       2.3X
    split threshold 65536                       39805 / 40309          0.0       37960.7       0.2X
    
    
    ================================================================================================
    projection on wide table: a big complex expressions
    ================================================================================================
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    projection on wide table: complex expressions 2: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    split threshold 10                            6349 / 6523          0.2        6054.7       1.0X
    split threshold 100                           4804 / 4912          0.2        4581.4       1.3X
    split threshold 1024                          3145 / 3193          0.3        2999.4       2.0X
    split threshold 2048                          3089 / 3124          0.3        2945.9       2.1X
    split threshold 4096                          2987 / 3060          0.4        2848.3       2.1X
    split threshold 8196                          3705 / 3718          0.3        3533.4       1.7X
    split threshold 65536                       17102 / 17156          0.1       16310.1       0.4X
    ```
    Complete benchmark source: [WideTableBenchmark.scala](https://github.com/yucai/spark/blob/betterSplitThreshold/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideTableBenchmark.scala)
    
    And I have another finding.
    In above testing, the wide table's columns are more than 100, so whole stage code gen is disabled, because of `spark.sql.codegen.maxFields`.
    If the wide table's columns are less than 100, whole stage code gen is enabled, but the expressions in `ProjectExec`'s could not be split. Like below:
    ```
    /* 133 */   private void project_doConsume_0(long project_expr_0_0) throws java.io.IOException {
    /* 134 */     // CONSUME: WholeStageCodegen
    /* 135 */     // CASE WHEN (input[0, bigint, false] = 0) THEN 0 ELSE 800 END
    /* 136 */     byte project_caseWhenResultState_0 = -1;
    /* 137 */     do {
    /* 138 */       // (input[0, bigint, false] = 0)
    /* 139 */       boolean project_value_2 = false;
    /* 140 */       project_value_2 = project_expr_0_0 == 0L;
    /* 141 */       if (!false && project_value_2) {
    /* 142 */         project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
    /* 143 */         project_project_value_1_0 = 0;
    /* 144 */         continue;
    ...
    ```
    `project_doConsume_0` is almost 2000 lines, whose byte code are probably more than 8KB.
    
    Not sure if it is a known issue?
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229919857
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    Seems like long alias names have no influence.
    ```
    [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6
    [info] Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    [info] projection on wide table:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    [info] ------------------------------------------------------------------------------------------------
    [info] split threshold 10                            6512 / 6736          0.2        6210.4       1.0X
    [info] split threshold 100                           5730 / 6329          0.2        5464.9       1.1X
    [info] split threshold 1024                          3119 / 3184          0.3        2974.6       2.1X
    [info] split threshold 2048                          2981 / 3100          0.4        2842.9       2.2X
    [info] split threshold 4096                          3289 / 3379          0.3        3136.6       2.0X
    [info] split threshold 8196                          4307 / 4338          0.2        4108.0       1.5X
    [info] split threshold 65536                       29147 / 30212          0.0       27797.0       0.2X
    ```
    
    No `averylongprefixrepeatedmultipletimes` in the **expression code gen**:
    ```
    /* 047 */   private void createExternalRow_0_8(InternalRow i, Object[] values_0) {
    /* 048 */
    /* 049 */     // input[80, bigint, false]
    /* 050 */     long value_81 = i.getLong(80);
    /* 051 */     if (false) {
    /* 052 */       values_0[80] = null;
    /* 053 */     } else {
    /* 054 */       values_0[80] = value_81;
    /* 055 */     }
    /* 056 */
    /* 057 */     // input[81, bigint, false]
    /* 058 */     long value_82 = i.getLong(81);
    /* 059 */     if (false) {
    /* 060 */       values_0[81] = null;
    /* 061 */     } else {
    /* 062 */       values_0[81] = value_82;
    /* 063 */     }
    /* 064 */
    /* 065 */     // input[82, bigint, false]
    /* 066 */     long value_83 = i.getLong(82);
    /* 067 */     if (false) {
    /* 068 */       values_0[82] = null;
    /* 069 */     } else {
    /* 070 */       values_0[82] = value_83;
    /* 071 */     }
    /* 072 */
    ...
    ```
    
    My benchmark:
    ```
    object WideTableBenchmark extends SqlBasedBenchmark {
    
      override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
        runBenchmark("projection on wide table") {
          val N = 1 << 20
          val df = spark.range(N)
          val columns = (0 until 400).map{ i => s"id as averylongprefixrepeatedmultipletimes_id$i"}
          val benchmark = new Benchmark("projection on wide table", N, output = output)
          Seq("10", "100", "1024", "2048", "4096", "8196", "65536").foreach { n =>
            benchmark.addCase(s"split threshold $n", numIters = 5) { iter =>
              withSQLConf("spark.testing.codegen.splitThreshold" -> n) {
                df.selectExpr(columns: _*).foreach(identity(_))
              }
            }
          }
          benchmark.run()
        }
      }
    }
    ```
    
    Will keep benchmarking for the complex expression.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98288/testReport)** for PR 22847 at commit [`ba392c8`](https://github.com/apache/spark/commit/ba392c89f6c96b3fa64d22a564ac1742b7d0ff54).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98130/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r230248773
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    @rednaxelafx the wide table benchmark I used has 400 columns, whole stage codegen is disabled by default.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98080/testReport)** for PR 22847 at commit [`188a947`](https://github.com/apache/spark/commit/188a9476e3504c151ebe27b362a080469c262674).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98190/testReport)** for PR 22847 at commit [`b0ce2ca`](https://github.com/apache/spark/commit/b0ce2cae731620a0ce7417b2b46c24cffb023059).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r228600757
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The maximum source code length of a single Java function by codegen. When the " +
    +      "generated Java function source code exceeds this threshold, it will be split into " +
    +      "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." +
    +      " A function's bytecode should not go beyond 8KB, otherwise it will not be JITted, should " +
    +      "also not be too small, or we will have many function calls. We can't know how many " +
    --- End diff --
    
    it will not be JITted; it also should not not be too small, otherwise there will be many function calls. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98182/testReport)** for PR 22847 at commit [`b0ce2ca`](https://github.com/apache/spark/commit/b0ce2cae731620a0ce7417b2b46c24cffb023059).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98386/testReport)** for PR 22847 at commit [`65a5355`](https://github.com/apache/spark/commit/65a5355a352ad228786b930e41628e0f255e9b59).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229478558
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    Yep. That could be `max`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22847


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229538148
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We cannot know how many bytecode will " +
    +      "be generated, so use the code length as metric. When running on HotSpot, a function's " +
    +      "bytecode should not go beyond 8KB, otherwise it will not be JITted; it also should not " +
    +      "be too small, otherwise there will be many function calls.")
    +    .intConf
    +    .createWithDefault(1024)
    --- End diff --
    
    let's add a check value to make sure the value is positive. We can figure out a lower and upper bound later.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98129/testReport)** for PR 22847 at commit [`ee986cc`](https://github.com/apache/spark/commit/ee986ccb4ec48337434f68a3e16953e6ebc2f67f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98301/testReport)** for PR 22847 at commit [`8eeb538`](https://github.com/apache/spark/commit/8eeb5385943bab33fd9fa9fe132315b137c831ed).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98288/testReport)** for PR 22847 at commit [`ba392c8`](https://github.com/apache/spark/commit/ba392c89f6c96b3fa64d22a564ac1742b7d0ff54).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98190/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98132/testReport)** for PR 22847 at commit [`0db224f`](https://github.com/apache/spark/commit/0db224f0eebc52a8fc1dc47fa03ff78151b3b6d9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r229942325
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The threshold of source code length without comment of a single Java function by " +
    +      "codegen to be split. When the generated Java function source code exceeds this threshold" +
    +      ", it will be split into multiple small functions. We can't know how many bytecode will " +
    +      "be generated, so use the code length as metric. A function's bytecode should not go " +
    +      "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
    +      "there will be many function calls.")
    +    .intConf
    --- End diff --
    
    The "freshNamePrefix" prefix is only applied in whole-stage codegen, 
    https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L87
    https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L169
    
    It doesn't take any effect in non-whole-stage codegen.
    
    If you intend to stress test expression codegen but don't see the prefix being prepended, you're probably not adding it in the right place. Where did you add it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98182/testReport)** for PR 22847 at commit [`b0ce2ca`](https://github.com/apache/spark/commit/b0ce2cae731620a0ce7417b2b46c24cffb023059).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98288/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98129/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98190/testReport)** for PR 22847 at commit [`b0ce2ca`](https://github.com/apache/spark/commit/b0ce2cae731620a0ce7417b2b46c24cffb023059).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by rednaxelafx <gi...@git.apache.org>.
Github user rednaxelafx commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Can / should we apply the same threshold conf to `Expression.reduceCodeSize()`?
    cc @yucai @cloud-fan @kiszk 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    did you address https://github.com/apache/spark/pull/22847#issuecomment-434836278 ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98295/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22847#discussion_r228598058
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -812,6 +812,17 @@ object SQLConf {
         .intConf
         .createWithDefault(65535)
     
    +  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
    +    .internal()
    +    .doc("The maximum source code length of a single Java function by codegen. When the " +
    +      "generated Java function source code exceeds this threshold, it will be split into " +
    +      "multiple small functions, each function length is spark.sql.codegen.methodSplitThreshold." +
    --- End diff --
    
    IMHO, `spark.sql.codegen.methodSplitThreshold` can be used, but the description should be changed. For example,
    `The threshold of source code length without comment of a single Java function by codegen to be split. When the generated Java function source code exceeds this threshold, it will be split into multiple small functions. ...`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22847
  
    **[Test build #98132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98132/testReport)** for PR 22847 at commit [`0db224f`](https://github.com/apache/spark/commit/0db224f0eebc52a8fc1dc47fa03ff78151b3b6d9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org