You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2017/08/30 02:05:05 UTC

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/19083

    [SPARK-21871][SQL] Check actual bytecode size when compiling generated code

    ## What changes were proposed in this pull request?
    This pr added code to check actual bytecode size when compiling generated code. In #18810, we added code to give up code compilation and use interpreter execution in `SparkPlan` if the line number of generated functions goes over `maxLinesPerFunction`. But, we already have code to collect metrics for compiled bytecode size in `CodeGenerator` object. So,we could easily reuse the code for this purpose.
    
    ## How was this patch tested?
    Added tests in `WholeStageCodegenSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-21871

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19083
    
----
commit 65eb028cb501a7e3675710ec72ac773a1700b5f4
Author: Takeshi Yamamuro <ya...@apache.org>
Date:   2017-08-23T13:25:02Z

    Check compiled byte code size

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Y


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82445/testReport)** for PR 19083 at commit [`09ae105`](https://github.com/apache/spark/commit/09ae105c101a1b31d2a8873976c01590c50411d2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136266187
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    yea, ok! Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058550
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1020,6 +1006,10 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of HugeMethodLimit in the OpenJDK JVM settings
    +  val DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT = 8000
    --- End diff --
    
    ok, I'll update


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136240836
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    I see. I feel detecting specific JVM implementations might go too far for this purpose. yea, one of options is to add an internal option for this. WDYT? @rednaxelafx 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r137378351
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
    @@ -98,19 +99,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
       }
     
       test("SPARK-18091: split large if expressions into blocks due to JVM code size limit") {
    -    var strExpr: Expression = Literal("abc")
    -    for (_ <- 1 to 150) {
    -      strExpr = Decode(Encode(strExpr, "utf-8"), "utf-8")
    -    }
    +    // Set the max value at `WHOLESTAGE_HUGE_METHOD_LIMIT` to compile gen'd code by janino
    +    withSQLConf(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key -> Int.MaxValue.toString) {
    --- End diff --
    
    Why do we need to change value for `WHOLESTAGE_HUGE_METHOD_LIMIT` while this test is not for whole-stage codegen?
    We could select better naming for `SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r135989181
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -176,34 +176,46 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
         })
     
         assert(wholeStageCodeGenExec.isDefined)
    -    wholeStageCodeGenExec.get.asInstanceOf[WholeStageCodegenExec].doCodeGen()._1
    +    wholeStageCodeGenExec.get.asInstanceOf[WholeStageCodegenExec].doCodeGen()
       }
     
       test("SPARK-21603 check there is a too long generated function") {
         withSQLConf(SQLConf.WHOLESTAGE_MAX_LINES_PER_FUNCTION.key -> "1500") {
    -      val ctx = genGroupByCodeGenContext(30)
    +      val (ctx, _) = genGroupByCodeGenContext(30)
           assert(ctx.isTooLongGeneratedFunction === true)
         }
       }
     
       test("SPARK-21603 check there is not a too long generated function") {
         withSQLConf(SQLConf.WHOLESTAGE_MAX_LINES_PER_FUNCTION.key -> "1500") {
    -      val ctx = genGroupByCodeGenContext(1)
    +      val (ctx, _) = genGroupByCodeGenContext(1)
           assert(ctx.isTooLongGeneratedFunction === false)
         }
       }
     
       test("SPARK-21603 check there is not a too long generated function when threshold is Int.Max") {
         withSQLConf(SQLConf.WHOLESTAGE_MAX_LINES_PER_FUNCTION.key -> Int.MaxValue.toString) {
    -      val ctx = genGroupByCodeGenContext(30)
    +      val (ctx, _) = genGroupByCodeGenContext(30)
           assert(ctx.isTooLongGeneratedFunction === false)
         }
       }
     
       test("SPARK-21603 check there is a too long generated function when threshold is 0") {
         withSQLConf(SQLConf.WHOLESTAGE_MAX_LINES_PER_FUNCTION.key -> "0") {
    -      val ctx = genGroupByCodeGenContext(1)
    +      val (ctx, _) = genGroupByCodeGenContext(1)
           assert(ctx.isTooLongGeneratedFunction === true)
         }
       }
    +
    +  test("SPARK-21871 turn off whole-stage codegen if bytecode size goes over HugeMethodLimit") {
    +    withSQLConf(SQLConf.WHOLESTAGE_MAX_LINES_PER_FUNCTION.key -> Int.MaxValue.toString) {
    +      val (_, code) = genGroupByCodeGenContext(20)
    +      val errMsg = intercept[IllegalArgumentException] {
    +        CodeGenerator.compile(code)
    +      }.getMessage
    +      assert(errMsg.contains("the size of GeneratedClass.agg_doAggregateWithKeys is 9182 and " +
    --- End diff --
    
    The hard-coded method name and size is a bit too strict. Can we relax that a little bit so that it's more resilient to the naming and the size of the generated classes/methods?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142576983
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -333,33 +334,28 @@ class AggregateBenchmark extends BenchmarkBase {
           .sum()
           .collect()
     
    -    benchmark.addCase(s"codegen = F") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    +    benchmark.addCase(s"hugeMethodLimit = 8000") { iter =>
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "8000")
           f()
         }
     
    -    benchmark.addCase(s"codegen = T maxLinesPerFunction = 10000") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.maxLinesPerFunction", "10000")
    -      f()
    -    }
    -
    -    benchmark.addCase(s"codegen = T maxLinesPerFunction = 1500") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.maxLinesPerFunction", "1500")
    +    benchmark.addCase(s"hugeMethodLimit = 16000") { iter =>
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "16000")
           f()
         }
     
         benchmark.run()
     
         /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14 on Windows 7 6.1
    -    Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
    -    max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)  Per Row(ns)  Relative
    -    ----------------------------------------------------------------------------------------------
    -    codegen = F                                    462 /  533          1.4       704.4     1.0X
    -    codegen = T maxLinesPerFunction = 10000       3444 / 3447          0.2      5255.3     0.1X
    -    codegen = T maxLinesPerFunction = 1500         447 /  478          1.5       682.1     1.0X
    +    Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 on Mac OS X 10.10.2
    +    Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    +
    +    max function bytecode size:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    hugeMethodLimit = 8000                        1043 / 1159          0.6        1591.5       1.0X
    --- End diff --
    
    The original `codegen = F` case is removed? I think it is reasonable to compare with it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82362/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058489
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1020,6 +1006,10 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of HugeMethodLimit in the OpenJDK JVM settings
    +  val DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT = 8000
    --- End diff --
    
    I think just `DEFAULT_JVM_HUGE_METHOD_LIMIT` is ok. The comment already explains it is from OpenJDK.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82442/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142527524
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1020,10 +1006,14 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of HugeMethodLimit in the OpenJDK JVM settings
    +  val DEFAULT_JVM_HUGE_METHOD_LIMIT = 8000
    +
       /**
        * Compile the Java source code into a Java class, using Janino.
        */
    -  def compile(code: CodeAndComment): GeneratedClass = try {
    +  def compile(code: CodeAndComment): (GeneratedClass, Int) = try {
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Do we still need `spark.sql.codegen.maxLinesPerFunction`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81314/testReport)** for PR 19083 at commit [`9c58237`](https://github.com/apache/spark/commit/9c58237fbbce32942674eb11cae6588e90df5a92).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82353/testReport)** for PR 19083 at commit [`87140fb`](https://github.com/apache/spark/commit/87140fb76b8d24e2d7c100b43915db253905d191).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    @gatorsmile  if you get time, could you check this? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82367/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142012633
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -585,10 +586,22 @@ object SQLConf {
         .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
           "When the generated function exceeds this threshold, " +
           "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      "The default value 4000 is the max length of byte code JIT supported " +
    -      "for a single function(8000) divided by 2.")
    +      s"The default value ${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT / 2} is " +
    +      "the max length of byte code JIT supported for a single function" +
    +      s"(${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT}}) divided by 2.")
         .intConf
    -    .createWithDefault(4000)
    +    .createOptional
    +
    +  val CODEGEN_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
    +    .internal()
    +    .doc("The bytecode size of a single compiled Java function generated by whole-stage codegen." +
    --- End diff --
    
    `The bytecode` -> `The maximum bytecode`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    IMHO we could drop the option safely. cc: @eatoncys 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r135955285
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by.sql ---
    @@ -30,8 +30,15 @@ SELECT a + 2, COUNT(b) FROM testData GROUP BY a + 1;
     SELECT a + 1 + 1, COUNT(b) FROM testData GROUP BY a + 1;
     
     -- Aggregate with nulls.
    +--
    +-- In SPARK-21871, we added code to check the bytecode size of gen'd methods. If the size
    --- End diff --
    
    If the issue in #19082 fixed, we might remove workaround. Or, we might use a new flag discussed in #19062 here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82435/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142037322
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -301,10 +301,10 @@ class AggregateBenchmark extends BenchmarkBase {
         */
       }
     
    -  ignore("max function length of wholestagecodegen") {
    +  test("max function length of codegen") {
    --- End diff --
    
    This should be "ignore"


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82445/testReport)** for PR 19083 at commit [`09ae105`](https://github.com/apache/spark/commit/09ae105c101a1b31d2a8873976c01590c50411d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82443/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136237952
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    While IBM JDK has [the similar option](http://www-01.ibm.com/support/docview.wss?uid=swg1IV78208) `-Xjit:acceptHugeMethods`, IBM JDK unfortunately does not disclose its threshold now.
    
    My suggestion is to make this threshold configurable by using `SQLConf`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Yeah, please remove it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81312/testReport)** for PR 19083 at commit [`78af6f4`](https://github.com/apache/spark/commit/78af6f4fcc2d4f7afebb2a3f2de0c16c562fcdb4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CodeGenerationSuite extends PlanTest with ExpressionEvalHelper `
      * `class OrderingSuite extends PlanTest with ExpressionEvalHelper `
      * `class GeneratedProjectionSuite extends PlanTest `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Few minor comments otherwise LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82446/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142020451
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
    @@ -416,25 +416,31 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
       }
     
       test("zero moments") {
    -    val input = Seq((1, 2)).toDF("a", "b")
    -    checkAnswer(
    -      input.agg(stddev('a), stddev_samp('a), stddev_pop('a), variance('a),
    -        var_samp('a), var_pop('a), skewness('a), kurtosis('a)),
    -      Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
    -        Double.NaN, Double.NaN))
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly turn off whole-stage codegen here.
    +    // This guard can be removed if this issue fixed.
    +    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    The same here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058941
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
    @@ -432,25 +432,31 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
       }
     
       test("zero moments") {
    -    val input = Seq((1, 2)).toDF("a", "b")
    -    checkAnswer(
    -      input.agg(stddev('a), stddev_samp('a), stddev_pop('a), variance('a),
    -        var_samp('a), var_pop('a), skewness('a), kurtosis('a)),
    -      Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
    -        Double.NaN, Double.NaN))
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly made this threshold higher here.
    +    // This workaround can be removed if this issue fixed.
    +    withSQLConf(SQLConf.CODEGEN_HUGE_METHOD_LIMIT.key -> "16000") {
    --- End diff --
    
    Once we can fall back to non-codegen execution later, we can revert those changes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058540
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    --- End diff --
    
    aha, I see. I also think this info is less meaning for users, but some meaningful for developers? So, how about changing to `logDebug`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142503203
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1020,10 +1006,14 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of HugeMethodLimit in the OpenJDK JVM settings
    +  val DEFAULT_JVM_HUGE_METHOD_LIMIT = 8000
    +
       /**
        * Compile the Java source code into a Java class, using Janino.
        */
    -  def compile(code: CodeAndComment): GeneratedClass = try {
    +  def compile(code: CodeAndComment): (GeneratedClass, Int) = try {
    --- End diff --
    
    Please add `@return` to explain what are returned.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/19083#discussion_r135985618

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
}

object CodeGenerator extends Logging {
+
+ // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
+ // runtime and this is fixed in released JVMs, we hard-code this values here:
+ //
+ // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
+ // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
+ // Error: Could not create the Java Virtual Machine.
+ // Error: A fatal exception has occurred. Program will exit.
+ private val hugeMethodLimit = 8000
--- End diff --

Actually this threshold is only meaningful on HotSpot VM and some HotSpot-derived JVMs. Other JVMs, for instance IBM J9 doesn't use the same threshold. It'd be a bit too strict and biased to make this a non-configurable behavior for all JVMs.

I'd suggest that if we are to do this, at least centralize the JVM detection logic somewhere (e.g. unify with the JVM detection logic in `org.apache.spark.util.SizeEstimator`) and only set this kind of threshold based on the detected JVM. That way we're much less likely to regress on other JVMs, and/or even on future versions of the HotSpot VM where it'll get a new JIT compiler (Graal) with new JIT compilation heuristics that could be different from the current version.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82437/testReport)** for PR 19083 at commit [`fca22b7`](https://github.com/apache/spark/commit/fca22b767fddb061303cddd4e06c87130b1b32dc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82421/testReport)** for PR 19083 at commit [`b185e49`](https://github.com/apache/spark/commit/b185e4996a2ca67bd3928daa8f7b88f24faaeeff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81309/testReport)** for PR 19083 at commit [`73090e8`](https://github.com/apache/spark/commit/73090e8a23c128ed7cf794bc58c70454a0cc37e0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81507/testReport)** for PR 19083 at commit [`78653de`](https://github.com/apache/spark/commit/78653de5810558726721d4884345e195ce4979f0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142503474
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -32,7 +32,7 @@ import org.codehaus.commons.compiler.CompileException
     import org.codehaus.janino.{ByteArrayClassLoader, ClassBodyEvaluator, JaninoRuntimeException, SimpleCompiler}
     import org.codehaus.janino.util.ClassFile
     
    -import org.apache.spark.{SparkEnv, TaskContext, TaskKilledException}
    --- End diff --
    
    revert it back?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81313/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82348/testReport)** for PR 19083 at commit [`76d5cb2`](https://github.com/apache/spark/commit/76d5cb22bb96a46ac71b63893e3bad36fcf83a31).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    @viirya @rednaxelafx @kiszk Could you check again? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142012782
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -585,10 +586,22 @@ object SQLConf {
         .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
           "When the generated function exceeds this threshold, " +
           "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      "The default value 4000 is the max length of byte code JIT supported " +
    -      "for a single function(8000) divided by 2.")
    +      s"The default value ${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT / 2} is " +
    +      "the max length of byte code JIT supported for a single function" +
    +      s"(${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT}}) divided by 2.")
         .intConf
    -    .createWithDefault(4000)
    +    .createOptional
    +
    +  val CODEGEN_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
    +    .internal()
    +    .doc("The bytecode size of a single compiled Java function generated by whole-stage codegen." +
    +      "When the compiled function exceeds this threshold, " +
    +      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    --- End diff --
    
    This threshold is not for whole-stage only, right? It is misleading.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81314/testReport)** for PR 19083 at commit [`9c58237`](https://github.com/apache/spark/commit/9c58237fbbce32942674eb11cae6588e90df5a92).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81498/testReport)** for PR 19083 at commit [`78653de`](https://github.com/apache/spark/commit/78653de5810558726721d4884345e195ce4979f0).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142053181
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    +        throw new CompileException(msg, null)
    --- End diff --
    
    Previously, it works although it does not perform well. Now, after this change, the query does not work. This could causes regressions. I think we should make a change to automatically fall back to the slow mode. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82375/testReport)** for PR 19083 at commit [`1fa7c1c`](https://github.com/apache/spark/commit/1fa7c1cac9e4711818df6957260acb751b6dd8b2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81498/testReport)** for PR 19083 at commit [`78653de`](https://github.com/apache/spark/commit/78653de5810558726721d4884345e195ce4979f0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r135992714
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    BTW, IBM J9 has the same threshold for too-long functions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058408
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    +        throw new CompileException(msg, null)
    --- End diff --
    
    yea, I'll do so. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81498/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82443/testReport)** for PR 19083 at commit [`09ae105`](https://github.com/apache/spark/commit/09ae105c101a1b31d2a8873976c01590c50411d2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    fixed @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81309/testReport)** for PR 19083 at commit [`73090e8`](https://github.com/apache/spark/commit/73090e8a23c128ed7cf794bc58c70454a0cc37e0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81327/testReport)** for PR 19083 at commit [`d6add58`](https://github.com/apache/spark/commit/d6add582f3584ecaa9560b88bd72d9aa634bbf6e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81312/testReport)** for PR 19083 at commit [`78af6f4`](https://github.com/apache/spark/commit/78af6f4fcc2d4f7afebb2a3f2de0c16c562fcdb4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142577111
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -333,33 +334,28 @@ class AggregateBenchmark extends BenchmarkBase {
           .sum()
           .collect()
     
    -    benchmark.addCase(s"codegen = F") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    +    benchmark.addCase(s"hugeMethodLimit = 8000") { iter =>
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "8000")
           f()
         }
     
    -    benchmark.addCase(s"codegen = T maxLinesPerFunction = 10000") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.maxLinesPerFunction", "10000")
    -      f()
    -    }
    -
    -    benchmark.addCase(s"codegen = T maxLinesPerFunction = 1500") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.maxLinesPerFunction", "1500")
    +    benchmark.addCase(s"hugeMethodLimit = 16000") { iter =>
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    +      sparkSession.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "16000")
           f()
         }
     
         benchmark.run()
     
         /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14 on Windows 7 6.1
    -    Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
    -    max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)  Per Row(ns)  Relative
    -    ----------------------------------------------------------------------------------------------
    -    codegen = F                                    462 /  533          1.4       704.4     1.0X
    -    codegen = T maxLinesPerFunction = 10000       3444 / 3447          0.2      5255.3     0.1X
    -    codegen = T maxLinesPerFunction = 1500         447 /  478          1.5       682.1     1.0X
    +    Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 on Mac OS X 10.10.2
    +    Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    +
    +    max function bytecode size:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +    ------------------------------------------------------------------------------------------------
    +    hugeMethodLimit = 8000                        1043 / 1159          0.6        1591.5       1.0X
    --- End diff --
    
    yea, you're right; I'll update and thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    LGTM except one comment


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82435/testReport)** for PR 19083 at commit [`dfde49b`](https://github.com/apache/spark/commit/dfde49bcc487ecbc0135cd301e8d9c3ad17921be).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142020462
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2102,25 +2102,31 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       }
     
       test("SPARK-15327: fail to compile generated code with complex data structure") {
    -    withTempDir{ dir =>
    -      val json =
    -        """
    -          |{"h": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}],
    -          |"b": [{"e": "test", "count": 1}]}}, "d": {"b": {"c": [{"e": "adfgd"}],
    -          |"a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}},
    -          |"c": {"b": {"c": [{"e": "adfgd"}], "a": [{"count": 3}],
    -          |"b": [{"e": "test", "count": 1}]}}, "a": {"b": {"c": [{"e": "adfgd"}],
    -          |"a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}},
    -          |"e": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}],
    -          |"b": [{"e": "test", "count": 1}]}}, "g": {"b": {"c": [{"e": "adfgd"}],
    -          |"a": [{"e": "testing", "count": 3}], "b": [{"e": "test", "count": 1}]}},
    -          |"f": {"b": {"c": [{"e": "adfgd"}], "a": [{"e": "testing", "count": 3}],
    -          |"b": [{"e": "test", "count": 1}]}}, "b": {"b": {"c": [{"e": "adfgd"}],
    -          |"a": [{"count": 3}], "b": [{"e": "test", "count": 1}]}}}'
    -          |
    -        """.stripMargin
    -      spark.read.json(Seq(json).toDS()).write.mode("overwrite").parquet(dir.toString)
    -      spark.read.parquet(dir.toString).collect()
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly turn off whole-stage codegen here.
    +    // This guard can be removed if this issue fixed.
    +    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    The same here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82375/testReport)** for PR 19083 at commit [`1fa7c1c`](https://github.com/apache/spark/commit/1fa7c1cac9e4711818df6957260acb751b6dd8b2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81327/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82446/testReport)** for PR 19083 at commit [`09ae105`](https://github.com/apache/spark/commit/09ae105c101a1b31d2a8873976c01590c50411d2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82435/testReport)** for PR 19083 at commit [`dfde49b`](https://github.com/apache/spark/commit/dfde49bcc487ecbc0135cd301e8d9c3ad17921be).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136240918
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1058,7 +1068,17 @@ object CodeGenerator extends Logging {
     
         try {
           evaluator.cook("generated.java", code.body)
    -      recordCompilationStats(evaluator)
    +      val methodsToByteCodeSize = updateAndGetCompilationStats(evaluator)
    +      methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +        if (byteCodeSize > hugeMethodLimit) {
    +          val clazzName = evaluator.getClazz.getSimpleName
    +          val methodName = name.replace("$", "")
    +          throw new IllegalArgumentException(
    +            s"the size of $clazzName.$methodName is $byteCodeSize and this value goes over " +
    +              s"the HugeMethodLimit $hugeMethodLimit (JVM doesn't compile methods " +
    +              "larger than this limit)")
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82445/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82374/testReport)** for PR 19083 at commit [`08d31f3`](https://github.com/apache/spark/commit/08d31f3920edf32353298b5067e083c2643917db).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142582571
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -380,23 +380,26 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
     
       override def doExecute(): RDD[InternalRow] = {
         val (ctx, cleanedSource) = doCodeGen()
    -    if (ctx.isTooLongGeneratedFunction) {
    -      logWarning("Found too long generated codes and JIT optimization might not work, " +
    -        "Whole-stage codegen disabled for this plan, " +
    -        "You can change the config spark.sql.codegen.MaxFunctionLength " +
    -        "to adjust the function length limit:\n "
    -        + s"$treeString")
    -      return child.execute()
    -    }
         // try to compile and fallback if it failed
    -    try {
    +    val (_, maxCodeSize) = try {
           CodeGenerator.compile(cleanedSource)
         } catch {
           case _: Exception if !Utils.isTesting && sqlContext.conf.codegenFallback =>
             // We should already saw the error message
             logWarning(s"Whole-stage codegen disabled for this plan:\n $treeString")
             return child.execute()
         }
    +
    +    // Check if compiled code has a too large function
    +    if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
    +      logWarning(s"Found too long generated codes and JIT optimization might not work: " +
    +        s"the bytecode size was $maxCodeSize, this value went over the limit " +
    +        s"${sqlContext.conf.hugeMethodLimit}, and the whole-stage codegen was disable " +
    +        s"for this plan. To avoid this, you can set the limit " +
    --- End diff --
    
    `set ` -> `raise `. then, remove `higher`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81312/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81245/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142582458
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -380,23 +380,26 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
     
       override def doExecute(): RDD[InternalRow] = {
         val (ctx, cleanedSource) = doCodeGen()
    -    if (ctx.isTooLongGeneratedFunction) {
    -      logWarning("Found too long generated codes and JIT optimization might not work, " +
    -        "Whole-stage codegen disabled for this plan, " +
    -        "You can change the config spark.sql.codegen.MaxFunctionLength " +
    -        "to adjust the function length limit:\n "
    -        + s"$treeString")
    -      return child.execute()
    -    }
         // try to compile and fallback if it failed
    -    try {
    +    val (_, maxCodeSize) = try {
           CodeGenerator.compile(cleanedSource)
         } catch {
           case _: Exception if !Utils.isTesting && sqlContext.conf.codegenFallback =>
             // We should already saw the error message
             logWarning(s"Whole-stage codegen disabled for this plan:\n $treeString")
             return child.execute()
         }
    +
    +    // Check if compiled code has a too large function
    +    if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
    +      logWarning(s"Found too long generated codes and JIT optimization might not work: " +
    +        s"the bytecode size was $maxCodeSize, this value went over the limit " +
    +        s"${sqlContext.conf.hugeMethodLimit}, and the whole-stage codegen was disable " +
    --- End diff --
    
    `disable ` -> `disabled `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82442/testReport)** for PR 19083 at commit [`433f13b`](https://github.com/apache/spark/commit/433f13b03e995bbb47641b44ed1f7961cc4ea2ec).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82421/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82437/testReport)** for PR 19083 at commit [`fca22b7`](https://github.com/apache/spark/commit/fca22b767fddb061303cddd4e06c87130b1b32dc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058236
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    --- End diff --
    
    Is it useful to log the gen'd code for this case? The code gets compiled so it might not helpful to print out the code.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142558540
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -380,23 +380,24 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
     
       override def doExecute(): RDD[InternalRow] = {
         val (ctx, cleanedSource) = doCodeGen()
    -    if (ctx.isTooLongGeneratedFunction) {
    -      logWarning("Found too long generated codes and JIT optimization might not work, " +
    -        "Whole-stage codegen disabled for this plan, " +
    -        "You can change the config spark.sql.codegen.MaxFunctionLength " +
    -        "to adjust the function length limit:\n "
    -        + s"$treeString")
    -      return child.execute()
    -    }
         // try to compile and fallback if it failed
    -    try {
    +    val (_, maxCodeSize) = try {
           CodeGenerator.compile(cleanedSource)
         } catch {
           case _: Exception if !Utils.isTesting && sqlContext.conf.codegenFallback =>
             // We should already saw the error message
             logWarning(s"Whole-stage codegen disabled for this plan:\n $treeString")
             return child.execute()
         }
    +
    +    // Check if compiled code has a too large function
    +    if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
    +      logWarning(s"Found too long generated codes: the bytecode size was $maxCodeSize and " +
    --- End diff --
    
    We better explain why the too long codes is a problem as previous: `Found too long generated codes and JIT optimization might not work: ...`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81313/testReport)** for PR 19083 at commit [`6100734`](https://github.com/apache/spark/commit/6100734bb6a18bb293f9f257f3706acf0d99a5de).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81314/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81313/testReport)** for PR 19083 at commit [`6100734`](https://github.com/apache/spark/commit/6100734bb6a18bb293f9f257f3706acf0d99a5de).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CodeGenerationSuite extends PlanTest with ExpressionEvalHelper `
      * `class OrderingSuite extends PlanTest with ExpressionEvalHelper `
      * `class GeneratedProjectionSuite extends PlanTest `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142558314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -380,23 +380,24 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
     
       override def doExecute(): RDD[InternalRow] = {
         val (ctx, cleanedSource) = doCodeGen()
    -    if (ctx.isTooLongGeneratedFunction) {
    -      logWarning("Found too long generated codes and JIT optimization might not work, " +
    -        "Whole-stage codegen disabled for this plan, " +
    -        "You can change the config spark.sql.codegen.MaxFunctionLength " +
    -        "to adjust the function length limit:\n "
    -        + s"$treeString")
    -      return child.execute()
    -    }
         // try to compile and fallback if it failed
    -    try {
    +    val (_, maxCodeSize) = try {
           CodeGenerator.compile(cleanedSource)
         } catch {
           case _: Exception if !Utils.isTesting && sqlContext.conf.codegenFallback =>
             // We should already saw the error message
             logWarning(s"Whole-stage codegen disabled for this plan:\n $treeString")
             return child.execute()
         }
    +
    +    // Check if compiled code has a too large function
    +    if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
    +      logWarning(s"Found too long generated codes: the bytecode size was $maxCodeSize and " +
    +        s"this value went over the limit ${sqlContext.conf.hugeMethodLimit}. To avoid this, " +
    +        s"you can the limit ${SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key} higher:\n$treeString")
    --- End diff --
    
    nit: `you can set the limit ... higher...`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142559152
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -151,7 +151,7 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
         }
       }
     
    -  def genGroupByCodeGenContext(caseNum: Int): CodegenContext = {
    +  def genGroupByCodeGenContext(caseNum: Int): (CodegenContext, CodeAndComment) = {
    --- End diff --
    
    The returned `CodegenContext` is not used. We don't need to return it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82373/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136240660
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1058,7 +1068,17 @@ object CodeGenerator extends Logging {
     
         try {
           evaluator.cook("generated.java", code.body)
    -      recordCompilationStats(evaluator)
    +      val methodsToByteCodeSize = updateAndGetCompilationStats(evaluator)
    +      methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +        if (byteCodeSize > hugeMethodLimit) {
    +          val clazzName = evaluator.getClazz.getSimpleName
    +          val methodName = name.replace("$", "")
    +          throw new IllegalArgumentException(
    +            s"the size of $clazzName.$methodName is $byteCodeSize and this value goes over " +
    +              s"the HugeMethodLimit $hugeMethodLimit (JVM doesn't compile methods " +
    +              "larger than this limit)")
    --- End diff --
    
    We designed to show exception messages (see below the handling of `JaninoRuntimeException`), before falling back to non whole-stage codegen execution. I think we should follow the same behavior.
    
    This exception will be hidden from user for now in `WholeStageCodegenExec`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142558620
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -380,23 +380,24 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co
     
       override def doExecute(): RDD[InternalRow] = {
         val (ctx, cleanedSource) = doCodeGen()
    -    if (ctx.isTooLongGeneratedFunction) {
    -      logWarning("Found too long generated codes and JIT optimization might not work, " +
    -        "Whole-stage codegen disabled for this plan, " +
    -        "You can change the config spark.sql.codegen.MaxFunctionLength " +
    -        "to adjust the function length limit:\n "
    -        + s"$treeString")
    -      return child.execute()
    -    }
         // try to compile and fallback if it failed
    -    try {
    +    val (_, maxCodeSize) = try {
           CodeGenerator.compile(cleanedSource)
         } catch {
           case _: Exception if !Utils.isTesting && sqlContext.conf.codegenFallback =>
             // We should already saw the error message
             logWarning(s"Whole-stage codegen disabled for this plan:\n $treeString")
             return child.execute()
         }
    +
    +    // Check if compiled code has a too large function
    +    if (maxCodeSize > sqlContext.conf.hugeMethodLimit) {
    +      logWarning(s"Found too long generated codes: the bytecode size was $maxCodeSize and " +
    +        s"this value went over the limit ${sqlContext.conf.hugeMethodLimit}. To avoid this, " +
    --- End diff --
    
    `this value went over.... Whole-stage codegen disabled for this plan. To avoid this ...`. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81309/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82348/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82421/testReport)** for PR 19083 at commit [`b185e49`](https://github.com/apache/spark/commit/b185e4996a2ca67bd3928daa8f7b88f24faaeeff).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142018586
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -585,10 +586,22 @@ object SQLConf {
         .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
           "When the generated function exceeds this threshold, " +
           "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      "The default value 4000 is the max length of byte code JIT supported " +
    -      "for a single function(8000) divided by 2.")
    +      s"The default value ${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT / 2} is " +
    +      "the max length of byte code JIT supported for a single function" +
    +      s"(${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT}}) divided by 2.")
         .intConf
    -    .createWithDefault(4000)
    +    .createOptional
    +
    +  val CODEGEN_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
    +    .internal()
    +    .doc("The bytecode size of a single compiled Java function generated by whole-stage codegen." +
    +      "When the compiled function exceeds this threshold, " +
    +      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    --- End diff --
    
    ya, you're right. I'll brush up.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142020549
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by.sql ---
    @@ -30,8 +30,15 @@ SELECT a + 2, COUNT(b) FROM testData GROUP BY a + 1;
     SELECT a + 1 + 1, COUNT(b) FROM testData GROUP BY a + 1;
     
     -- Aggregate with nulls.
    +--
    +-- In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +-- goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +-- in a test mode. So, we explicitly turn off whole-stage codegen here.
    +-- This guard can be removed if this issue fixed.
    +SET spark.sql.codegen.wholeStage=false;
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    @maropu Thanks for working on it. LGTM except two minor comments. 
    
    cc @rednaxelafx @kiszk @viirya @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82374/testReport)** for PR 19083 at commit [`08d31f3`](https://github.com/apache/spark/commit/08d31f3920edf32353298b5067e083c2643917db).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19083


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82353/testReport)** for PR 19083 at commit [`87140fb`](https://github.com/apache/spark/commit/87140fb76b8d24e2d7c100b43915db253905d191).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142018768
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -585,10 +586,22 @@ object SQLConf {
         .doc("The maximum lines of a single Java function generated by whole-stage codegen. " +
           "When the generated function exceeds this threshold, " +
           "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    -      "The default value 4000 is the max length of byte code JIT supported " +
    -      "for a single function(8000) divided by 2.")
    +      s"The default value ${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT / 2} is " +
    +      "the max length of byte code JIT supported for a single function" +
    +      s"(${CodeGenerator.DEFAULT_OPENJDK_JVM_HUGE_METHOD_LIMIT}}) divided by 2.")
         .intConf
    -    .createWithDefault(4000)
    +    .createOptional
    +
    +  val CODEGEN_HUGE_METHOD_LIMIT = buildConf("spark.sql.codegen.hugeMethodLimit")
    +    .internal()
    +    .doc("The bytecode size of a single compiled Java function generated by whole-stage codegen." +
    +      "When the compiled function exceeds this threshold, " +
    +      "the whole-stage codegen is deactivated for this subtree of the current query plan. " +
    --- End diff --
    
    updated


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r135996601
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    Let's ask @kiszk for that ^_^
    With [OpenJ9](http://openj9.mybluemix.net/) coming out soon, we might be able to find out details on this kind of issue ourselves. But until then it's easier to just ask IBM folks for a reliable answer ;-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82442/testReport)** for PR 19083 at commit [`433f13b`](https://github.com/apache/spark/commit/433f13b03e995bbb47641b44ed1f7961cc4ea2ec).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81241/testReport)** for PR 19083 at commit [`65eb028`](https://github.com/apache/spark/commit/65eb028cb501a7e3675710ec72ac773a1700b5f4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136239358
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1079,7 +1099,7 @@ object CodeGenerator extends Logging {
       /**
        * Records the generated class and method bytecode sizes by inspecting janino private fields.
    --- End diff --
    
    Better to update method comment if we change method purpose.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81507/testReport)** for PR 19083 at commit [`78653de`](https://github.com/apache/spark/commit/78653de5810558726721d4884345e195ce4979f0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82437/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81507/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142000625
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1074,13 +1078,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    +        throw new IllegalArgumentException(msg)
    --- End diff --
    
    `CompileException` is better?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058947
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    --- End diff --
    
    aha, ok. I'll drop.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136240500
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1079,7 +1099,7 @@ object CodeGenerator extends Logging {
       /**
        * Records the generated class and method bytecode sizes by inspecting janino private fields.
    --- End diff --
    
    oh, yea! Thanks! I'll update


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r137453624
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
    @@ -98,19 +99,23 @@ class CodeGenerationSuite extends SparkFunSuite with ExpressionEvalHelper {
       }
     
       test("SPARK-18091: split large if expressions into blocks due to JVM code size limit") {
    -    var strExpr: Expression = Literal("abc")
    -    for (_ <- 1 to 150) {
    -      strExpr = Decode(Encode(strExpr, "utf-8"), "utf-8")
    -    }
    +    // Set the max value at `WHOLESTAGE_HUGE_METHOD_LIMIT` to compile gen'd code by janino
    +    withSQLConf(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key -> Int.MaxValue.toString) {
    --- End diff --
    
    yea, you're right. I think `CODEGEN_HUGE_METHOD_LIMIT` is better?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82446/testReport)** for PR 19083 at commit [`09ae105`](https://github.com/apache/spark/commit/09ae105c101a1b31d2a8873976c01590c50411d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82373/testReport)** for PR 19083 at commit [`8744566`](https://github.com/apache/spark/commit/874456648732309b35cefe36d45783eed77ee7b0).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142053683
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    +        throw new CompileException(msg, null)
    --- End diff --
    
    ok, I try to refactor this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82373/testReport)** for PR 19083 at commit [`8744566`](https://github.com/apache/spark/commit/874456648732309b35cefe36d45783eed77ee7b0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r141998362
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1074,13 +1078,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    +        throw new IllegalArgumentException(msg)
    --- End diff --
    
    Is this the best kind of exception?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142059351
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
    @@ -432,25 +432,31 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
       }
     
       test("zero moments") {
    -    val input = Seq((1, 2)).toDF("a", "b")
    -    checkAnswer(
    -      input.agg(stddev('a), stddev_samp('a), stddev_pop('a), variance('a),
    -        var_samp('a), var_pop('a), skewness('a), kurtosis('a)),
    -      Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
    -        Double.NaN, Double.NaN))
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly made this threshold higher here.
    +    // This workaround can be removed if this issue fixed.
    +    withSQLConf(SQLConf.CODEGEN_HUGE_METHOD_LIMIT.key -> "16000") {
    --- End diff --
    
    Do you think is it ok to silently fall back to non-codegen cases in tests? I though that the too-long function cases also should explicitly throws exceptions in tests by checking `!Utils.isTesting && sqlContext.conf.codegenFallback`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142577265
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -151,7 +151,7 @@ class WholeStageCodegenSuite extends SparkPlanTest with SharedSQLContext {
         }
       }
     
    -  def genGroupByCodeGenContext(caseNum: Int): CodegenContext = {
    +  def genGroupByCodeGenContext(caseNum: Int): CodeAndComment = {
    --- End diff --
    
    `genGroupByCodeGenContext` -> `genGroupByCode`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81245/testReport)** for PR 19083 at commit [`6a61393`](https://github.com/apache/spark/commit/6a613936df188da4b0a2053b3799d1cd6392e676).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142020457
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameTimeWindowingSuite.scala ---
    @@ -95,50 +96,62 @@ class DataFrameTimeWindowingSuite extends QueryTest with SharedSQLContext with B
       }
     
       test("sliding window grouping") {
    -    val df = Seq(
    -      ("2016-03-27 19:39:34", 1, "a"),
    -      ("2016-03-27 19:39:56", 2, "a"),
    -      ("2016-03-27 19:39:27", 4, "b")).toDF("time", "value", "id")
    -
    -    checkAnswer(
    -      df.groupBy(window($"time", "10 seconds", "3 seconds", "0 second"))
    -        .agg(count("*").as("counts"))
    -        .orderBy($"window.start".asc)
    -        .select($"window.start".cast("string"), $"window.end".cast("string"), $"counts"),
    -      // 2016-03-27 19:39:27 UTC -> 4 bins
    -      // 2016-03-27 19:39:34 UTC -> 3 bins
    -      // 2016-03-27 19:39:56 UTC -> 3 bins
    -      Seq(
    -        Row("2016-03-27 19:39:18", "2016-03-27 19:39:28", 1),
    -        Row("2016-03-27 19:39:21", "2016-03-27 19:39:31", 1),
    -        Row("2016-03-27 19:39:24", "2016-03-27 19:39:34", 1),
    -        Row("2016-03-27 19:39:27", "2016-03-27 19:39:37", 2),
    -        Row("2016-03-27 19:39:30", "2016-03-27 19:39:40", 1),
    -        Row("2016-03-27 19:39:33", "2016-03-27 19:39:43", 1),
    -        Row("2016-03-27 19:39:48", "2016-03-27 19:39:58", 1),
    -        Row("2016-03-27 19:39:51", "2016-03-27 19:40:01", 1),
    -        Row("2016-03-27 19:39:54", "2016-03-27 19:40:04", 1))
    -    )
    -  }
    -
    -  test("sliding window projection") {
    -    val df = Seq(
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly turn off whole-stage codegen here.
    +    // This guard can be removed if this issue fixed.
    +    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    +      val df = Seq(
             ("2016-03-27 19:39:34", 1, "a"),
             ("2016-03-27 19:39:56", 2, "a"),
             ("2016-03-27 19:39:27", 4, "b")).toDF("time", "value", "id")
    -      .select(window($"time", "10 seconds", "3 seconds", "0 second"), $"value")
    -      .orderBy($"window.start".asc, $"value".desc).select("value")
     
    -    val expands = df.queryExecution.optimizedPlan.find(_.isInstanceOf[Expand])
    -    assert(expands.nonEmpty, "Sliding windows require expand")
    +      checkAnswer(
    +        df.groupBy(window($"time", "10 seconds", "3 seconds", "0 second"))
    +          .agg(count("*").as("counts"))
    +          .orderBy($"window.start".asc)
    +          .select($"window.start".cast("string"), $"window.end".cast("string"), $"counts"),
    +        // 2016-03-27 19:39:27 UTC -> 4 bins
    +        // 2016-03-27 19:39:34 UTC -> 3 bins
    +        // 2016-03-27 19:39:56 UTC -> 3 bins
    +        Seq(
    +          Row("2016-03-27 19:39:18", "2016-03-27 19:39:28", 1),
    +          Row("2016-03-27 19:39:21", "2016-03-27 19:39:31", 1),
    +          Row("2016-03-27 19:39:24", "2016-03-27 19:39:34", 1),
    +          Row("2016-03-27 19:39:27", "2016-03-27 19:39:37", 2),
    +          Row("2016-03-27 19:39:30", "2016-03-27 19:39:40", 1),
    +          Row("2016-03-27 19:39:33", "2016-03-27 19:39:43", 1),
    +          Row("2016-03-27 19:39:48", "2016-03-27 19:39:58", 1),
    +          Row("2016-03-27 19:39:51", "2016-03-27 19:40:01", 1),
    +          Row("2016-03-27 19:39:54", "2016-03-27 19:40:04", 1))
    +      )
    +    }
    +  }
     
    -    checkAnswer(
    -      df,
    -      // 2016-03-27 19:39:27 UTC -> 4 bins
    -      // 2016-03-27 19:39:34 UTC -> 3 bins
    -      // 2016-03-27 19:39:56 UTC -> 3 bins
    -      Seq(Row(4), Row(4), Row(4), Row(4), Row(1), Row(1), Row(1), Row(2), Row(2), Row(2))
    -    )
    +  test("sliding window projection") {
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly turn off whole-stage codegen here.
    +    // This guard can be removed if this issue fixed.
    +    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    The same here. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058817
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    --- End diff --
    
    We already have `logDebug` at line 1072. We don't need to log it again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    ok to remove the option in this pr?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142020459
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameTimeWindowingSuite.scala ---
    @@ -228,29 +241,35 @@ class DataFrameTimeWindowingSuite extends QueryTest with SharedSQLContext with B
       }
     
       test("millisecond precision sliding windows") {
    -    val df = Seq(
    -      ("2016-03-27 09:00:00.41", 3),
    -      ("2016-03-27 09:00:00.62", 6),
    -      ("2016-03-27 09:00:00.715", 8)).toDF("time", "value")
    -    checkAnswer(
    -      df.groupBy(window($"time", "200 milliseconds", "40 milliseconds", "0 milliseconds"))
    -        .agg(count("*").as("counts"))
    -        .orderBy($"window.start".asc)
    -        .select($"window.start".cast(StringType), $"window.end".cast(StringType), $"counts"),
    -      Seq(
    -        Row("2016-03-27 09:00:00.24", "2016-03-27 09:00:00.44", 1),
    -        Row("2016-03-27 09:00:00.28", "2016-03-27 09:00:00.48", 1),
    -        Row("2016-03-27 09:00:00.32", "2016-03-27 09:00:00.52", 1),
    -        Row("2016-03-27 09:00:00.36", "2016-03-27 09:00:00.56", 1),
    -        Row("2016-03-27 09:00:00.4", "2016-03-27 09:00:00.6", 1),
    -        Row("2016-03-27 09:00:00.44", "2016-03-27 09:00:00.64", 1),
    -        Row("2016-03-27 09:00:00.48", "2016-03-27 09:00:00.68", 1),
    -        Row("2016-03-27 09:00:00.52", "2016-03-27 09:00:00.72", 2),
    -        Row("2016-03-27 09:00:00.56", "2016-03-27 09:00:00.76", 2),
    -        Row("2016-03-27 09:00:00.6", "2016-03-27 09:00:00.8", 2),
    -        Row("2016-03-27 09:00:00.64", "2016-03-27 09:00:00.84", 1),
    -        Row("2016-03-27 09:00:00.68", "2016-03-27 09:00:00.88", 1))
    -    )
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly turn off whole-stage codegen here.
    +    // This guard can be removed if this issue fixed.
    +    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    The same here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82443/testReport)** for PR 19083 at commit [`09ae105`](https://github.com/apache/spark/commit/09ae105c101a1b31d2a8873976c01590c50411d2).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82364/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142059729
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
    @@ -432,25 +432,31 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext {
       }
     
       test("zero moments") {
    -    val input = Seq((1, 2)).toDF("a", "b")
    -    checkAnswer(
    -      input.agg(stddev('a), stddev_samp('a), stddev_pop('a), variance('a),
    -        var_samp('a), var_pop('a), skewness('a), kurtosis('a)),
    -      Row(Double.NaN, Double.NaN, 0.0, Double.NaN, Double.NaN, 0.0,
    -        Double.NaN, Double.NaN))
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly made this threshold higher here.
    +    // This workaround can be removed if this issue fixed.
    +    withSQLConf(SQLConf.CODEGEN_HUGE_METHOD_LIMIT.key -> "16000") {
    --- End diff --
    
    sorry, my bad. I'll do so. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142058372
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1092,13 +1082,30 @@ object CodeGenerator extends Logging {
             logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
             throw new CompileException(msg, e.getLocation)
         }
    +
    +    // Check if compiled code has a too large function
    +    methodsToByteCodeSize.foreach { case (name, byteCodeSize) =>
    +      if (byteCodeSize > SQLConf.get.hugeMethodLimit) {
    +        val clazzName = evaluator.getClazz.getSimpleName
    +        val methodName = name.replace("$", "")
    +        val msg = s"failed to compile: the size of $clazzName.$methodName was $byteCodeSize and " +
    +          "this value went over the limit `spark.sql.codegen.hugeMethodLimit`" +
    +          s"(${SQLConf.get.hugeMethodLimit}). To avoid this error, you can make this limit higher."
    +        logError(msg)
    +        val maxLines = SQLConf.get.loggingMaxLinesForCodegen
    +        logInfo(s"\n${CodeFormatter.format(code, maxLines)}")
    +        throw new CompileException(msg, null)
    --- End diff --
    
    We should turn back to non-codegen execution for too long function.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Thanks, I'll update soon


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81241/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82367/testReport)** for PR 19083 at commit [`9836e1d`](https://github.com/apache/spark/commit/9836e1d9ef7fbece88c4191cbfee8722908bc6e9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r135997033
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    oh, you know @kiszk, haha.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r135988601
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1091,6 +1111,7 @@ object CodeGenerator extends Logging {
         }
     
         // Then walk the classes to get at the method bytecode.
    +    val methodsToByteCodeSize = mutable.ArrayBuffer[(String, Int)]()
    --- End diff --
    
    Since we know the number of methods right below (for each generated class, `cf.methodInfos` would give us the number of methods in that class, of which we expect almost all of them to have a `Code` attribute), making pre-sized arrays for each class and then concatenating those arrays into a `Seq` as the result would avoid potential resizing wastes from using a big default-sized `ArrayBuffer`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by rednaxelafx <gi...@git.apache.org>.

Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r136250152
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    Sure, I'm fine with having a `SQLConf` conf flag for the threshold for now, with the option to move to a centralized JVM detection implementation later where we can still have a `SQLConf` flag to override the heuristic based on detection.
    Setting such a threshold to 64KB will effectively turn the restriction off (i.e. same behavior as `-XX:-DontCompileHugeMethods`), so having a simple int conf should be good enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82353/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r135990600
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -1001,6 +1001,16 @@ abstract class CodeGenerator[InType <: AnyRef, OutType <: AnyRef] extends Loggin
     }
     
     object CodeGenerator extends Logging {
    +
    +  // This is the value of `HugeMethodLimit` in JVM settings. Since we can't get this value on
    +  // runtime and this is fixed in released JVMs, we hard-code this values here:
    +  //
    +  // $ java -XX:+UnlockDiagnosticVMOptions -XX:HugeMethodLimit=99999
    +  // Error: VM option 'HugeMethodLimit' is develop and is available only in debug version of VM.
    +  // Error: Could not create the Java Virtual Machine.
    +  // Error: A fatal exception has occurred. Program will exit.
    +  private val hugeMethodLimit = 8000
    --- End diff --
    
    Thanks for your comment! yea, I agree. I' rethink this pr based on your suggestion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82374/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81245/testReport)** for PR 19083 at commit [`6a61393`](https://github.com/apache/spark/commit/6a613936df188da4b0a2053b3799d1cd6392e676).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82348/testReport)** for PR 19083 at commit [`76d5cb2`](https://github.com/apache/spark/commit/76d5cb22bb96a46ac71b63893e3bad36fcf83a31).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82375/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142020453
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameTimeWindowingSuite.scala ---
    @@ -95,50 +96,62 @@ class DataFrameTimeWindowingSuite extends QueryTest with SharedSQLContext with B
       }
     
       test("sliding window grouping") {
    -    val df = Seq(
    -      ("2016-03-27 19:39:34", 1, "a"),
    -      ("2016-03-27 19:39:56", 2, "a"),
    -      ("2016-03-27 19:39:27", 4, "b")).toDF("time", "value", "id")
    -
    -    checkAnswer(
    -      df.groupBy(window($"time", "10 seconds", "3 seconds", "0 second"))
    -        .agg(count("*").as("counts"))
    -        .orderBy($"window.start".asc)
    -        .select($"window.start".cast("string"), $"window.end".cast("string"), $"counts"),
    -      // 2016-03-27 19:39:27 UTC -> 4 bins
    -      // 2016-03-27 19:39:34 UTC -> 3 bins
    -      // 2016-03-27 19:39:56 UTC -> 3 bins
    -      Seq(
    -        Row("2016-03-27 19:39:18", "2016-03-27 19:39:28", 1),
    -        Row("2016-03-27 19:39:21", "2016-03-27 19:39:31", 1),
    -        Row("2016-03-27 19:39:24", "2016-03-27 19:39:34", 1),
    -        Row("2016-03-27 19:39:27", "2016-03-27 19:39:37", 2),
    -        Row("2016-03-27 19:39:30", "2016-03-27 19:39:40", 1),
    -        Row("2016-03-27 19:39:33", "2016-03-27 19:39:43", 1),
    -        Row("2016-03-27 19:39:48", "2016-03-27 19:39:58", 1),
    -        Row("2016-03-27 19:39:51", "2016-03-27 19:40:01", 1),
    -        Row("2016-03-27 19:39:54", "2016-03-27 19:40:04", 1))
    -    )
    -  }
    -
    -  test("sliding window projection") {
    -    val df = Seq(
    +    // In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +    // goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +    // in a test mode. So, we explicitly turn off whole-stage codegen here.
    +    // This guard can be removed if this issue fixed.
    +    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    The same here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142020435
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by.sql ---
    @@ -30,8 +30,15 @@ SELECT a + 2, COUNT(b) FROM testData GROUP BY a + 1;
     SELECT a + 1 + 1, COUNT(b) FROM testData GROUP BY a + 1;
     
     -- Aggregate with nulls.
    +--
    +-- In SPARK-21871, we added code to check the actual bytecode size of gen'd methods. If the size
    +-- goes over `hugeMethodLimit`, Spark fails to compile the methods and the execution also fails
    +-- in a test mode. So, we explicitly turn off whole-stage codegen here.
    +-- This guard can be removed if this issue fixed.
    +SET spark.sql.codegen.wholeStage=false;
    --- End diff --
    
    Please change the value of `spark.sql.codegen.hugeMethodLimit`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #82367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82367/testReport)** for PR 19083 at commit [`9836e1d`](https://github.com/apache/spark/commit/9836e1d9ef7fbece88c4191cbfee8722908bc6e9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81327/testReport)** for PR 19083 at commit [`d6add58`](https://github.com/apache/spark/commit/d6add582f3584ecaa9560b88bd72d9aa634bbf6e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19083: [SPARK-21871][SQL] Check actual bytecode size whe...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19083#discussion_r142577237
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -17,10 +17,10 @@
     
     package org.apache.spark.sql.execution
     
    -import org.apache.spark.sql.{Column, Dataset, Row}
    -import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
    -import org.apache.spark.sql.catalyst.expressions.{Add, Literal, Stack}
    -import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import java.util.concurrent.ExecutionException
    +
    +import org.apache.spark.sql.Row
    +import org.apache.spark.sql.catalyst.expressions.codegen.{CodeAndComment, CodegenContext, CodeGenerator}
    --- End diff --
    
    We don't use `CodegenContext` anymore and can remove it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19083: [SPARK-21871][SQL] Check actual bytecode size when compi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19083
  
    **[Test build #81241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81241/testReport)** for PR 19083 at commit [`65eb028`](https://github.com/apache/spark/commit/65eb028cb501a7e3675710ec72ac773a1700b5f4).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org