You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/11 07:38:04 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

HeartSaVioR opened a new pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872
 
 
   ### What changes were proposed in this pull request?
   
   This patch provides a new SQL configuration (`spark.sql.codegen.useSwitchStatement`) to avoid using `switch` statement in generated code, which may open the possibility to hit the known Janino bug janino-compiler/janino#113.
   
   `switch` statement is being used by default, as it makes the generated code more concise, and also more efficient in point of performance's view. Once the end users' query fails due to the Janino bug, end users can turn off the new configuration to change the generated code to use `if ~ else if ~ else` and try running their query again. 
   
   The conditions are below:
   
   * The generated code contains 'switch' statement.
   * Exception message contains 'Operand stack inconsistent at offset xxx: Previous size 1, now 0'.
   
   ### Why are the changes needed?
   
   We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
   
   We provided the patch to Janino via janino-compiler/janino#114 and Janino 3.1.1 was released which contains the patch, but we figured out lots of unit tests fail once we apply Janino 3.1.1.
   
   We have asked about releasing Janino 3.0.16 to Janino maintainer, but given there's no guarantee to get it, we'd be better to have our own workaround.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT. Confirmed that changing `SQLConf.CODEGEN_USE_SWITCH_STATEMENT.key` to `"true"` made the new test fail.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597911868
 
 
   It would be appreciated if we can elaborate the possible performance regression (even it's like a "thinking out loud"); "may" and "some cases" are too ambiguous.
   
   If the query doesn't fall into the case via hitting Janino issue, it shouldn't bring perf. regression, as it tries to generate code with `switch` and compiles first. If there's perf. regression here, it says I have some unexpected mistake in fix which should be fixed.
   
   If the query fall into the case via hitting Janino issue, either the query has been failing or fail back to non-codegen. Taking alternative is definitely better than failing the query; the remaining one is whether using if ~ else if chain is worse than non-codegen.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597947452
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597921124
 
 
   I just enriched some information in PR title/description to make clear that workaround is applied as a fail-back. I'm sorry if anyone is confused because of lack of information.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599064274
 
 
   > btw, splitting large code into pieces in switch is a solution for this issue? Additionally, we need to replace switch with if?
   > I just want to know the actual performance numbers of this approach. I think splitting large code into small parts might improve performance.
   
   This patch is intended to add workaround for the bug until the actual patch will be placed into Janino; it would be nice if we could develop the ideas of "improvement" via separate thread. I'm not an expert of JVM so not clear how much JIT would help (if what we expect from making method lines smaller is just inlining the method, that would be technically same as before).
   
   That's why I've been also investigating to go forward with option 1 as well; this patch can be simply abandoned if we are OK with #27860 (for sure, with official release of Janino).
   
   > To reproduce the issue, could you build the simple query that you can show us based on your private customer's query? I think the query can make us understood more for the issue.
   
   I've added UT which fails in master branch. I've also added comments around UT to explain the details. I don't think it would require very complicated query; lots of columns and enough number of distinct aggregation function will trigger the bug.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-600990551
 
 
   @HeartSaVioR thank you so much for sorting it out! It's great to see the problems fixed on the Janino side.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597889358
 
 
   cc. @dongjoon-hyun and @maropu as well, as they showed reactions in #27860

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597545352
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119662/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597944907
 
 
   > do you know what's an user query to reproduce this issue?
   
   I have one, but I cannot share since the query is from actual customer. If you're OK with just generated code, I've attached the file in Janino issue https://github.com/janino-compiler/janino/issues/113.
   
   To share the information which directly impacts to hit the Janino bug, there're 70+ columns, grouped by 2 columns, and applies 60 aggregate functions (30 non-distinct, 30 distinct).
   
   > If this issue occurs frequently on the user side, it might be worth adding the workaround. its a corner case, I personally think its ok just to fall back into the interpreter mode for avoiding the maintenance overhead of the workaround.
   
   I'm not sure whether it'd be an edge-case happening very rarely; I'd like to hear more voices, but if the consensus would be treating this as rare case, I'd agree that we would like to avoid the bandaid fix. Let's see others' voices as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597886468
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24410/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597488692
 
 
   **[Test build #119656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119656/testReport)** for PR 27872 at commit [`90ef125`](https://github.com/apache/spark/commit/90ef1253bc92eaea35ed831355878a0f690781c1).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597497954
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597946658
 
 
   retest this, please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597489363
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24387/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597911868
 
 
   It would be appreciated if we can elaborate the possible performance regression (even it's like a "thinking out loud"); "may" and "some cases" are too ambiguous.
   
   If the query doesn't fall into the case via hitting Janino issue, it shouldn't bring perf. regression, as it tries to generate code with `switch` and compiles first. If there's perf. regression here, it says I have some unintended mistake in fix which should be fixed.
   
   If the query fall into the case via hitting Janino issue, either the query has been failing or has been failing back to non-codegen. Taking alternative is definitely better than failing the query; the remaining one is whether using if ~ else if chain (+ cost to regenerate code and compile) is worse than going through non-codegen.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-600990551
 
 
   @HeartSaVioR thank you so much for sorting it out! It's great to see the problems fixed on the Janino side.
   
   My preference on the version to upgrade to is on the conservative side. It'd be nice if we can take a baby step at this stage of Spark 3.0 release... so +1 on 3.0.16 from me for 3.0 (and maybe 2.4 too), and 3.1.2 on master.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598795015
 
 
   My 2 cents for 4 while we are waiting for 3.0.16. I am also neutral on other options except 2.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597560376
 
 
   **[Test build #119670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119670/testReport)** for PR 27872 at commit [`23ec81b`](https://github.com/apache/spark/commit/23ec81bb3856777eba7f08572036b691d303c33c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597885842
 
 
   **[Test build #119681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119681/testReport)** for PR 27872 at commit [`5aa7ece`](https://github.com/apache/spark/commit/5aa7ece30c5f44aacc9d994c0fc4a05d214b046c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#discussion_r391826035
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
 ##########
 @@ -957,4 +956,60 @@ class DataFrameAggregateSuite extends QueryTest
       assert(error.message.contains("function count_if requires boolean type"))
     }
   }
+
+  /**
+   * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
+   * as well as the overall size of expand_doConsume, so that the query triggers known Janino
+   * bug - https://github.com/janino-compiler/janino/issues/113.
+   *
+   * The expected exception message from Janino when we use switch statement for "ExpandExec":
+   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
+   * which will not happen when we use if-else-if statement for "ExpandExec".
+   *
+   * "The number of fields" and "The number of distinct aggregation functions" are the major
+   * factors to increase the size of generated code: while these values should be large enough
+   * to trigger the Janino bug, these values should not also too big; otherwise one of below
+   * exceptions might be thrown:
+   * - "expand_doConsume would be beyond 64KB"
+   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
+   */
+  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
+    withSQLConf(
+      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
+      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
+      (SQLConf.CODEGEN_FALLBACK.key, "false"),
+      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
+    ) {
+      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")
+
+      // The value is tested under commit "e807118eef9e0214170ff62c828524d237bd58e3":
+      // the query fails with switch statement, whereas it passes with if-else statement.
+      // Note that the value depends on the Spark logic as well - different Spark versions may
+      // require different value to ensure the test failing with switch statement.
+      val numNewFields = 100
+
+      df = df.withColumns(
+        (1 to numNewFields).map { idx => s"a$idx" },
+        (1 to numNewFields).map { idx =>
+          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
+        }
+      )
+
+      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
 
 Review comment:
   nit: How about `(1 to numNewFields)`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597947461
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24415/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599924340
 
 
   Thanks for the detailed explanation, @rednaxelafx !
   
   Btw, as I shared earlier, I have been investigating with Janino itself - I fixed all of bugs on 3.1.x which made Spark UT failing, and hopefully Janino maintainer made a branch for 3.0.x as well. Maybe we can get both of 3.0.16 & 3.1.2 versions in this week, and then we can simply upgrade Janino to abandon this PR.
   
   I'm running UT in #27860 for Janino 3.1.2 (custom jar via Jitpack) as well as #27932 for Janino 3.0.16 (custom jar via Jitpack). Once both builds run fine, I'll forward the result to Janino maintainer.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597607318
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597560376
 
 
   **[Test build #119670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119670/testReport)** for PR 27872 at commit [`23ec81b`](https://github.com/apache/spark/commit/23ec81bb3856777eba7f08572036b691d303c33c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599026718
 
 
   Honestly I'm not sure I understand the proposal from @rednaxelafx - what I can imagine to only touch ExpandExec to get around is option 4 from my proposal, and if he meant other trick/technique than I don't get it. It would be nice if someone could help me understand via elaborating a bit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597904953
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24411/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597489357
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598037879
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119693/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597497495
 
 
   **[Test build #119662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119662/testReport)** for PR 27872 at commit [`23ec81b`](https://github.com/apache/spark/commit/23ec81bb3856777eba7f08572036b691d303c33c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599026718
 
 
   Honestly I'm not sure about the details I got the proposal from @rednaxelafx - what I can imagine for only touching ExpandExec to get around is option 4 from my proposal, and if he meant other trick/technique than I don't get it. It would be nice if someone could help me understand via elaborating a bit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-601091320
 
 
   Hmm, that's a hard choice. Janino's bultin tests are far from having good coverage, having hacked my own fork of Janino once and know the code fairly well, I'm feel somewhat uneasy about the huge refactorings...
   
   Sticking to the "deprecated" Janino 3.0.x branch for Spark 3.0.x doesn't sound like that bad of a situation to me. Spark 3.0 took a long time, but Spark 3.1 shouldn't be that far away.
   
   That said, I wouldn't be too sad if we end up taking Janino 3.1.2 into Spark 3.0.0.
   We'll just release Spark patch version with a new Janino release if we hit a Janino bug...right?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597545345
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597904946
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597545158
 
 
   **[Test build #119662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119662/testReport)** for PR 27872 at commit [`23ec81b`](https://github.com/apache/spark/commit/23ec81bb3856777eba7f08572036b691d303c33c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599064274
 
 
   > btw, splitting large code into pieces in switch is a solution for this issue? Additionally, we need to replace switch with if?
   > I just want to know the actual performance numbers of this approach. I think splitting large code into small parts might improve performance.
   
   This patch is intended to add workaround for the bug until the actual patch will be placed into Janino; it would be nice if we could develop the ideas of "improvement" via separate thread. I'm not an expert of JVM so not clear how much JIT would help (if what we expect from making method lines smaller is just inlining the method, that would be technically same as before).
   
   That's why I've been also investigating to go forward with option 1 as well; this patch can be simply abandoned if we are OK with #27860 (for sure, after official release of Janino).
   
   > To reproduce the issue, could you build the simple query that you can show us based on your private customer's query? I think the query can make us understood more for the issue.
   
   I've added UT which fails in master branch. I've also added comments around UT to explain the details. I don't think it would require very complicated query; lots of columns and enough number of distinct aggregation function will trigger the bug.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597886463
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] squito commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
squito commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#discussion_r391133288
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 ##########
 @@ -454,7 +454,9 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    if (canBeComputedUsingSwitch && hset.size <= SQLConf.get.optimizerInSetSwitchThreshold) {
+    val sqlConf = SQLConf.get
+    if (canBeComputedUsingSwitch && hset.size <= sqlConf.optimizerInSetSwitchThreshold &&
+      sqlConf.codegenUseSwitchStatement) {
 
 Review comment:
   instead of an added configuration, could this be a `try / catch`?  first try with switch, if it fails, then log & error and use if-else?  avoids the users having to set a config, and then unset it again when we have the right fix from janino

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597607318
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599017718
 
 
   Thanks for doing this fix!
   
   Among the options, I'd prefer to tune ExpandExec like @rednaxelafx proposed to work around this issue for short term. Then we can either wait for a future release of 3.0.16, or Janino 3.1.2 that fixes all test failures in Spark.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599026718
 
 
   Honestly I'm not sure I understand the proposal from @rednaxelafx - what I can imagine is only touching ExpandExec to get around is option 4 from my proposal, and if he meant other trick/technique than I don't get it. It would be nice if someone could help me understand via elaborating a bit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598037689
 
 
   **[Test build #119693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119693/testReport)** for PR 27872 at commit [`4bd12d8`](https://github.com/apache/spark/commit/4bd12d87c0789ec8a3932078e1241bfe49e599a9).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#discussion_r391343191
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ##########
 @@ -688,11 +692,56 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
     child.executeColumnar()
   }
 
-  override def doExecute(): RDD[InternalRow] = {
+  private type CompileResult = (CodegenContext, CodeAndComment, GeneratedClass, ByteCodeStats)
+
+  /**
+   * NOTE: This method handles the known Janino bug:
+   * - https://github.com/janino-compiler/janino/issues/113
+   *
+   * It tries to generate code and compile in normal path. If the compilation fails and the reason
+   * is due to the known bug, it generates workaround code via touching flag in CodegenContext and
+   * compile again.
+   */
+  private def doGenCodeAndCompile(): CompileResult = {
 
 Review comment:
   `switch` is now used specific to ExpandExec and InSet; originally what I tracked was only ExpandExec, which doesn't fall into the case if I understand correctly. Maybe InSet has upper/lower limit configuration which wouldn't trigger the issue - just apply to ExpandExec only?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597989123
 
 
   **[Test build #119693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119693/testReport)** for PR 27872 at commit [`4bd12d8`](https://github.com/apache/spark/commit/4bd12d87c0789ec8a3932078e1241bfe49e599a9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#discussion_r391333502
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ##########
 @@ -688,11 +692,56 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
     child.executeColumnar()
   }
 
-  override def doExecute(): RDD[InternalRow] = {
+  private type CompileResult = (CodegenContext, CodeAndComment, GeneratedClass, ByteCodeStats)
+
+  /**
+   * NOTE: This method handles the known Janino bug:
+   * - https://github.com/janino-compiler/janino/issues/113
+   *
+   * It tries to generate code and compile in normal path. If the compilation fails and the reason
+   * is due to the known bug, it generates workaround code via touching flag in CodegenContext and
+   * compile again.
+   */
+  private def doGenCodeAndCompile(): CompileResult = {
 
 Review comment:
   How do we handle the non-whole stage codegen case? e.g., `GenerateMutableProjection` via `CodeGeneratorWithInterpretedFallback`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-601170184
 
 
   semver still applies in that behavior changes - and the shaded copy - can affects apps. But there's no issue w.r.t. semver in updating to a new minor release, especially in a major Spark release.
   Of course, we have to evaluate how much the dependency follows semver.
   
   But yes the issues are: how risky is the update in general? breakage is bad in any Spark release. How much does it help? if the current version isn't maintained, when do you have to update and take that risk, and is that better on a major vs minor release boundary?
   
   That is, if there is any perceived risk, would you take it now or in a _minor_ Spark release.
   I don't feel strongly but I think I'd try for 3.1.x in Spark 3.0 if there is no plausible reason to expect a risk of breakage that we know of.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#discussion_r390810675
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
 ##########
 @@ -957,4 +956,61 @@ class DataFrameAggregateSuite extends QueryTest
       assert(error.message.contains("function count_if requires boolean type"))
     }
   }
+
+  /**
+   * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
+   * as well as the overall size of expand_doConsume, so that the query triggers known Janino
+   * bug - https://github.com/janino-compiler/janino/issues/113.
+   *
+   * The expected exception message from Janino when we use switch statement for "ExpandExec":
+   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
+   * which will not happen when we use if-else-if statement for "ExpandExec".
+   *
+   * "The number of fields" and "The number of distinct aggregation functions" are the major
+   * factors to increase the size of generated code: while these values should be large enough
+   * to trigger the Janino bug, these values should not also too big; otherwise one of below
+   * exceptions might be thrown:
+   * - "expand_doConsume would be beyond 64KB"
+   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
+   */
+  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
+    withSQLConf(
+      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
+      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
+      (SQLConf.CODEGEN_FALLBACK.key, "false"),
+      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1"),
+      (SQLConf.CODEGEN_USE_SWITCH_STATEMENT.key, "false")
+    ) {
+      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")
+
+      // The value is tested under commit "e807118eef9e0214170ff62c828524d237bd58e3":
+      // the query fails with switch statement, whereas it passes with if-else statement.
+      // Note that the value depends on the Spark logic as well - different Spark versions may
+      // require different value to ensure the test failing with switch statement.
+      val numNewFields = 100
 
 Review comment:
   Originally I was crafting the patch against Spark 2.3 - in Spark 2.3, setting this to 100 throws exception which is not from Janino bug, but from either hitting 64KB limit or parameter limitation on method signature. (That's why I added the details on exceptions when the value exceeds upper limit.)
   
   For Spark 2.3, `70` is the thing making `switch statement` failing and `if ~ else if ~ else statement` passing.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597905015
 
 
   It may cause performance regression, @HeartSaVioR .
   
   cc @kiszk and @rednaxelafx and @gatorsmile , too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-600379582
 
 
   Both builds run fine - I've shared the result to Janino maintainer. Looks like he is open to release 3.0.16, but most likely one-time release and 3.0.x would never be LTS. So that would be ideal if we feel OK to go with Janino 3.1.2, but if we concern about stability we could just go with 3.0.16.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597488692
 
 
   **[Test build #119656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119656/testReport)** for PR 27872 at commit [`90ef125`](https://github.com/apache/spark/commit/90ef1253bc92eaea35ed831355878a0f690781c1).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597905015
 
 
   It may cause performance regression in some cases, @HeartSaVioR .
   
   cc @kiszk and @rednaxelafx and @gatorsmile , too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597497495
 
 
   **[Test build #119662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119662/testReport)** for PR 27872 at commit [`23ec81b`](https://github.com/apache/spark/commit/23ec81bb3856777eba7f08572036b691d303c33c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597885842
 
 
   **[Test build #119681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119681/testReport)** for PR 27872 at commit [`5aa7ece`](https://github.com/apache/spark/commit/5aa7ece30c5f44aacc9d994c0fc4a05d214b046c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597694954
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119670/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-601056157
 
 
   But, if we use janino v3.0.16  for spark v3.0.0, future janino 3.1.x releases for bugfixes have no luck for our 3.0.x maintance releases?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597694943
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598492806
 
 
   Thanks for spending your time to analyze the patch and provide and detailed voice, @rednaxelafx .
   
   The rationalization of this patch is to address known issue "before" we reach the real solution, as the real solution is out of our control; no one can answer when Janino 3.0.16 will be released, and even whether it will be released or not. Janino 3.1 refactors everything by oneself and makes things be unstable - recent version of Janino makes lots of Spark UTs fail which makes me less confident we can migrate to Janino 3.1.x.
   
   I totally agree the patch is a band-aid fix and not well designed to extend (I didn't intend it though), and there're better options available (more to less preferable personally)
   
   1. Participate actively to Janino - fix all the issues in 3.1.x which breaks Spark, and adopt Janino 3.1.x. It should be ideal if we can provide set of regression tests so that Janino community can keep the stability. For sure, I'm not an expert of JVM so I can't lead the effort. A couple of JVM experts should jump in and lead the effort.
   
   2. Fork Janino - create a main dev. branch from v3.0.15, apply the patch for janino-compiler/janino#114 (and port back some bugfixes after 3.0.15), release v3.0.16 by our side. License seems to be 3-clause BSD license, which doesn't seem to restrict redistribution. The thing of this option would be who/which group is willing to maintain the fork and release under their name.
   
   3. This patch. (try to compile and fail-back)
   
   4. Modify ExpandExec to check the number of operations in for statement, and use `if ~ else if` when the number of operations exceed the threshold. This should be ideally checking the length of offset but it would be weird if Spark does it, so count the lines blindly. Performance regression may happen in some cases where it can run with `switch` but due to blind count it runs with `if ~ else if`, but the case wouldn't be common. 
   
   5. Give up addressing issue - guide that end users should enable failback option and run with interactive mode.
   
   WDYT? Would it be better to raise this to discussion in dev@ list?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599026718
 
 
   Honestly I'm not sure I got the proposal from @rednaxelafx - what I can imagine for only touching ExpandExec to get around is option 4 from my proposal, and if he meant other trick/technique than I don't get it. It would be nice if someone could help me understand via elaborating a bit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
kiszk removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598790958
 
 
   As @maropu asked, is this query frequently executed? Or, does it seem to be corner case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597559095
 
 
   Retest this, please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#discussion_r391275141
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 ##########
 @@ -454,7 +454,9 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    if (canBeComputedUsingSwitch && hset.size <= SQLConf.get.optimizerInSetSwitchThreshold) {
+    val sqlConf = SQLConf.get
+    if (canBeComputedUsingSwitch && hset.size <= sqlConf.optimizerInSetSwitchThreshold &&
+      sqlConf.codegenUseSwitchStatement) {
 
 Review comment:
   Yeah it would be ideal. Thanks! I over-thought a bit and thought it would be hard to deal with it, but looks like not that complicated. Let me update the code and PR information.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599809129
 
 
   My suggestion for tuning codegen may have been too cryptic, sorry about not providing more background explanation.
   
   Let me explain why I arrived at the `Literal` codegen improvement suggestion (targeted for Spark 2.3.x branch -- anything starting from Spark 2.4.0 has it already). @viirya probably saw through this already ;-)
   
   My assumptions:
   1. The problematic code from actual customer was generated from Spark 2.3.x that @HeartSaVioR is trying to help fix. So to unblock this specific customer, slightly improving Spark 2.3.x branch may help.
   2. This PR targets master because the same Janino issue does still show up in master as well.
   3. But the specific customer isn't using Spark 2.4 yet (nor Spark 3.0 since it isn't even released yet).
   
   From what @HeartSaVioR was able to share in https://github.com/janino-compiler/janino/issues/113, specifically in the attachment https://github.com/janino-compiler/janino/files/4295191/error-codegen.log, we can see that a large portion of the switch...case body is filled with code like the following:
   ```java
   /* 4677 */       case 0:
   /* 4678 */         final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = expand_value129;
   /* 4681 */
   /* 4682 */         final long expand_value130 = -1L;
   /* 4683 */         expand_isNull67 = true;
   /* 4684 */         expand_value67 = expand_value130;
   /* 4685 */
   /* 4686 */         final long expand_value131 = -1L;
   /* 4687 */         expand_isNull68 = true;
   /* 4688 */         expand_value68 = expand_value131;
   /* 4689 */
   /* 4690 */         final long expand_value132 = -1L;
   /* 4691 */         expand_isNull69 = true;
   /* 4692 */         expand_value69 = expand_value132;
   ```
   
   If we zoom in on the 3 statements here:
   ```java
   /* 4678 */         final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = expand_value129;
   ```
   `javac` would have followed JLS and compiled it into two assignments (L4679 and L4680 with constant propagation), and skipped L4678 because it's a Java language level constant expression initialization.
   In pseudo bytecode, that's:
   ```
   load constant long 1  # constant true
   store to local expand_isNull66
   load constant int -1L
   store to local expand_value66
   ```
   
   Janino, on the other hand, doesn't handle all of Java language level constant expression constructs. Specifically, it does not support local constant declarations. So it compiles the code as 3 assignments:
   ```
   load constant long -1L
   store to local expand_value129
   load constant int 1  # constant true
   store to local expand_isNull66
   load from local expand_value129
   store to local expand_value66
   ```
   
   So every "chunk" of code above would be a few more bytes of bytecode. It may be small for just one chunk, but the codegen for `ExpandExec` makes that problem really bad by generating a lot of them.
   
   So where does this "value = -1L; isNull = true" thingy come from? We can infer that it's generated from `Literal(null, LongType)`. That is, https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L282-L285
   
   We can easily make Spark 2.3.x generate slightly better code, by improving `Literal`'s codegen, replacing
   ```scala
       if (value == null) {
         ev.isNull = "true"
         ev.copy(s"final $javaType ${ev.value} = ${ctx.defaultValue(dataType)};")
       }
   ```
   
   with something like:
   ```scala
       if (value == null) {
         ev.copy(
           isNull = "true",
           value = ctx.defaultValue(dataType)
         )
       }
   ```
   (the `value` part may or may not have to be further adjusted. Perhaps like `value = s"(($javaType)${ctx.defaultValue(dataType)})"`) for maximal safety.
   
   which would improve the example above into:
   ```java
   /* 4678 */         // THIS IS GONE: final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = -1L;
   ```
   so that Janino can generate less bytecode for this particular query.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598037879
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119693/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597560830
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597904946
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597989452
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24422/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597560850
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24399/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597693873
 
 
   **[Test build #119670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119670/testReport)** for PR 27872 at commit [`23ec81b`](https://github.com/apache/spark/commit/23ec81bb3856777eba7f08572036b691d303c33c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598792252
 
 
   Sounds good, though yeah ideally we get a 3.0.16 update instead to fix it. Maybe worth waiting a beat but not holding up 3.0.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597944907
 
 
   > do you know what's an user query to reproduce this issue?
   
   I have one, but I cannot share since the query is from actual customer. If you're OK with just generated code, I've attached the file in Janino issue https://github.com/janino-compiler/janino/issues/113.
   
   To share the information which is directly impacted to hit the Janino bug, there're 70+ columns, grouped by 2 columns, and applies 60 aggregate functions (30 non-distinct, 30 distinct).
   
   > If this issue occurs frequently on the user side, it might be worth adding the workaround. its a corner case, I personally think its ok just to fall back into the interpreter mode for avoiding the maintenance overhead of the workaround.
   
   I'm not sure whether it'd be an edge-case happening very rarely; I'd like to hear more voices, but if the consensus would be treating this as rare case, I'd agree that we would like to avoid the bandaid fix. Let's see others' voices as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598492806
 
 
   Thanks for spending your time to analyze the patch and provide and detailed voice, @rednaxelafx .
   
   The rationalization of this patch is to address known issue "before" we reach the real solution, as the real solution is out of our control; no one can answer when Janino 3.0.16 will be released, and even whether it will be released or not. Janino maintainer refactors everything by oneself in 3.1.x and makes things be unstable - recent version of Janino makes lots of Spark UTs fail which makes me less confident we can migrate to Janino 3.1.x.
   
   I totally agree the patch is a band-aid fix and not well designed to extend (I didn't intend it be extendable though), and there're better options available (more to less preferable personally)
   
   1. Participate actively to Janino - fix all the issues in 3.1.x which breaks Spark, and adopt Janino 3.1.x. It should be ideal if we can provide set of regression tests so that Janino community can keep the stability. For sure, I'm not an expert of JVM so I can't lead the effort. A couple of JVM experts should jump in and lead the effort.
   
   2. Fork Janino - create a main dev. branch from v3.0.15, apply the patch for janino-compiler/janino#114 (and port back some bugfixes after 3.0.15), release v3.0.16 by our side. License seems to be 3-clause BSD license, which doesn't seem to restrict redistribution. The thing of this option would be who/which group is willing to maintain the fork and release under their name.
   
   3. This patch. (try to compile and fail-back)
   
   4. Modify ExpandExec to check the number of operations in for statement, and use `if ~ else if` when the number of operations exceed the threshold. This should be ideally checking the length of offset but it would be weird if Spark does it, so count the lines blindly. Performance regression may happen in some cases where it can run with `switch` but due to blind count it runs with `if ~ else if`, but the case wouldn't be common. 
   
   5. Give up addressing issue - either let end users to tune their query or guide that end users should enable failback option and run with interactive mode.
   
   WDYT? Would it be better to raise this to discussion in dev@ list?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599809129
 
 
   My suggestion for tuning codegen may have been too cryptic, sorry about not providing more background explanation.
   
   Let me explain why I arrived at the `Literal` codegen improvement suggestion (targeted for Spark 2.3.x branch -- anything starting from Spark 2.4.0 has it already). @viirya probably saw through this already ;-)
   
   My assumptions:
   1. The problematic code from actual customer was generated from Spark 2.3.x that @HeartSaVioR is trying to help fix. So to unblock this specific customer, slightly improving Spark 2.3.x branch may help.
   2. This PR targets master because the same Janino issue does still show up in master as well.
   3. But the specific customer isn't using Spark 2.4 yet (nor Spark 3.0 since it isn't even released yet).
   
   From what @HeartSaVioR was able to share in https://github.com/janino-compiler/janino/issues/113, specifically in the attachment https://github.com/janino-compiler/janino/files/4295191/error-codegen.log, we can see that a large portion of the switch...case body is filled with code like the following:
   ```java
   /* 4677 */       case 0:
   /* 4678 */         final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = expand_value129;
   /* 4681 */
   /* 4682 */         final long expand_value130 = -1L;
   /* 4683 */         expand_isNull67 = true;
   /* 4684 */         expand_value67 = expand_value130;
   /* 4685 */
   /* 4686 */         final long expand_value131 = -1L;
   /* 4687 */         expand_isNull68 = true;
   /* 4688 */         expand_value68 = expand_value131;
   /* 4689 */
   /* 4690 */         final long expand_value132 = -1L;
   /* 4691 */         expand_isNull69 = true;
   /* 4692 */         expand_value69 = expand_value132;
   ```
   
   If we zoom in on the 3 statements here:
   ```java
   /* 4678 */         final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = expand_value129;
   ```
   `javac` would have followed JLS and compiled it into two assignments (L4679 and L4680 with constant propagation), and skipped L4678 because it's a Java language level constant expression initialization.
   In pseudo bytecode, that's:
   ```
   load constant long 1  # constant true
   store to local expand_isNull66
   load constant int -1L
   store to local expand_value66
   ```
   
   Janino, on the other hand, doesn't handle all of Java language level constant expression constructs. Specifically, it does not support local constant declarations. So it compiles the code as 3 assignments:
   ```
   load constant long -1L
   store to local expand_value129
   load constant int 1  # constant true
   store to local expand_isNull66
   load from local expand_value129
   store to local expand_value66
   ```
   
   So every "chunk" of code above would be a few more bytes of bytecode. It may be small for just one chunk, but the codegen for `ExpandExec` makes that problem really bad by generating a lot of them.
   
   So where does this "value = -1L; isNull = true" thingy come from? We can infer that it's generated from `Literal(null, LongType)`. That is, https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L282-L285
   
   We can easily make Spark 2.3.x generate slightly better code, by improving `Literal`'s codegen, replacing
   ```scala
       if (value == null) {
         ev.isNull = "true"
         ev.copy(s"final $javaType ${ev.value} = ${ctx.defaultValue(dataType)};")
       }
   ```
   
   with something like:
   ```scala
       if (value == null) {
         ev.copy(
           isNull = "true",
           value = ctx.defaultValue(dataType)
         )
       }
   ```
   
   which would improve the example above into:
   ```java
   /* 4678 */         // THIS IS GONE: final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = -1L; // the extra casting is handled by Janino as no-op
   ```
   so that Janino can generate less bytecode for this particular query.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597904355
 
 
   **[Test build #119682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119682/testReport)** for PR 27872 at commit [`c4375de`](https://github.com/apache/spark/commit/c4375deeca993675ece0367647fb8dd4f0a37410).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598037864
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597973377
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599014212
 
 
   UPDATE: I’ve managed to fix all of Spark test failures we’ve seen with Janino 3.1.1 - see #27860. (Say, dealt with option 1) We still need to wait for these patches to be merged and released (so this patch is not still outdated) but in better situation as both 3.1.2 and 3.0.16 will work, worth to wait a bit more to see how things will go with Janino side.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598063427
 
 
   I have mixed feelings about this PR.
   
   The good part:
   
   It does do what it advertised, and should not result in performance regressions since the code that would have passed compilation would stay as-is, and only the code that triggers error would go to the fallback mode which does come with slightly bigger code size, but wouldn't really affect runtime performance *if the generated code is still JIT compiled by an optimizing JIT*.
   
   The less-good part:
   
   I feel like the codegen system may make good use of a good retry framework, so that it _may be able to_ do things that it couldn't do right now.
   e.g. code splitting comes at a cost of extra function invocation overhead. If we generated code without code split and still stayed within either 8000 bytes or 64KB (depending on which threshold you care about more), then it'd be better not to split. Right now Spark SQL's codegen just does that unconditionally using "length of code text" as the trigger; if a retry framework were in place, we could have tried to generate without splitting first, and only go conservative if the thresholds are crossed.
   
   The code presented here is an attempt at a one-off case of retry, and I'm somewhat concerned about future PRs that try to pile on top of it to implement retry for other scenarios, and just let the code "evolve organically" without a good base design first.
   
   The the main focus is to buy some time before we can get Janino to release 3.0.16, my hunch is it's possible to slightly tune the codegen for ExpandExec to make it generate just small enough code to work around the customer issue bugging you right now.
   
   e.g. if a part of the problem was caused by the case body of the `switch` statement being too big, and if your customer was only hitting that in Spark 2.3.x but not in Spark 2.4.x / 3.0.x, then using this style of generating the code for `Literal(null, dataType)` may help, at least with the example code you showed in the Janino bug you reported:
   ```scala
   ExprCode(code = EmptyBlock, isNull = TrueLiteral, JavaCode.defaultLiteral(dataType))
   ```
   (from Spark 2.4 / 3.0 / master: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L69)
   
   WDYT @HeartSaVioR ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597489357
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kiszk commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598790958
 
 
   As @maropu asked, is this query frequently executed? Or, does it seem to be corner case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597497960
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24393/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597911868
 
 
   It would be appreciated if we can elaborate the possible performance regression (even it's like a "thinking out loud"); "may" and "some cases" are too ambiguous.
   
   If the query doesn't fall into the case via hitting Janino issue, it shouldn't bring perf. regression, as it tries to generate code with `switch` and compiles first. If there's perf. regression here, it says I have some unintended mistake in fix which should be fixed.
   
   If the query fall into the case via hitting Janino issue, either the query has been failing or has been failing back to non-codegen. Taking alternative is definitely better than failing the query; the remaining one is whether using if ~ else if chain is worse than non-codegen.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597497954
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598497471
 
 
   Btw, I'm happy to hear if the approach of this patch (retry) initiates to imagine the new idea of improvement. That would be enough worthy.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597545352
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119662/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-601169204
 
 
   > Janino's bultin tests are far from having good coverage
   
   I'd tend to agree, though I'd also understand about Janino's status - only one maintainer, hard to guess the all usages. Spark generated code is far far from hand written code which no one would have been imagine about ;)
   
   Maybe it would be ideal to construct the regression tests in Spark side - either having set of generated codes, or E2E tests.
   
   > We'll just release Spark patch version with a new Janino release if we hit a Janino bug...right?
   
   I'll try to sort it out once it happens but 3rd party is a kind of outer control. 3.0.17 might be unlikely happening, whereas 3.1.3 is likely, easier to persuade having new release.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597886468
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24410/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597989450
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597694954
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119670/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597973377
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597946950
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR closed pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR closed pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#discussion_r391981993
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
 ##########
 @@ -957,4 +956,60 @@ class DataFrameAggregateSuite extends QueryTest
       assert(error.message.contains("function count_if requires boolean type"))
     }
   }
+
+  /**
+   * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
+   * as well as the overall size of expand_doConsume, so that the query triggers known Janino
+   * bug - https://github.com/janino-compiler/janino/issues/113.
+   *
+   * The expected exception message from Janino when we use switch statement for "ExpandExec":
+   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
+   * which will not happen when we use if-else-if statement for "ExpandExec".
+   *
+   * "The number of fields" and "The number of distinct aggregation functions" are the major
+   * factors to increase the size of generated code: while these values should be large enough
+   * to trigger the Janino bug, these values should not also too big; otherwise one of below
+   * exceptions might be thrown:
+   * - "expand_doConsume would be beyond 64KB"
+   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
+   */
+  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
+    withSQLConf(
+      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
+      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
+      (SQLConf.CODEGEN_FALLBACK.key, "false"),
+      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
+    ) {
+      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")
+
+      // The value is tested under commit "e807118eef9e0214170ff62c828524d237bd58e3":
+      // the query fails with switch statement, whereas it passes with if-else statement.
+      // Note that the value depends on the Spark logic as well - different Spark versions may
+      // require different value to ensure the test failing with switch statement.
+      val numNewFields = 100
+
+      df = df.withColumns(
+        (1 to numNewFields).map { idx => s"a$idx" },
+        (1 to numNewFields).map { idx =>
+          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
+        }
+      )
+
+      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
 
 Review comment:
   I was using `steps` parameter and eventually removed it. Given we seem to be between neutral to negative on adopting this patch, I'll defer addressing the nit for now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597607332
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119656/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597989450
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#discussion_r391402526
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ##########
 @@ -688,11 +692,56 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
     child.executeColumnar()
   }
 
-  override def doExecute(): RDD[InternalRow] = {
+  private type CompileResult = (CodegenContext, CodeAndComment, GeneratedClass, ByteCodeStats)
+
+  /**
+   * NOTE: This method handles the known Janino bug:
+   * - https://github.com/janino-compiler/janino/issues/113
+   *
+   * It tries to generate code and compile in normal path. If the compilation fails and the reason
+   * is due to the known bug, it generates workaround code via touching flag in CodegenContext and
+   * compile again.
+   */
+  private def doGenCodeAndCompile(): CompileResult = {
 
 Review comment:
   I personally think we'd better fix the generated code of `ExpandExec` if its worth fixing this issue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#discussion_r391402526
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ##########
 @@ -688,11 +692,56 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
     child.executeColumnar()
   }
 
-  override def doExecute(): RDD[InternalRow] = {
+  private type CompileResult = (CodegenContext, CodeAndComment, GeneratedClass, ByteCodeStats)
+
+  /**
+   * NOTE: This method handles the known Janino bug:
+   * - https://github.com/janino-compiler/janino/issues/113
+   *
+   * It tries to generate code and compile in normal path. If the compilation fails and the reason
+   * is due to the known bug, it generates workaround code via touching flag in CodegenContext and
+   * compile again.
+   */
+  private def doGenCodeAndCompile(): CompileResult = {
 
 Review comment:
   If so, I personally think we'd better fix the generated code of `ExpandExec` (I'm not sure now that its worth fixing this issue).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597946957
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119681/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597936729
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597973381
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119686/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597947057
 
 
   **[Test build #119686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119686/testReport)** for PR 27872 at commit [`c4375de`](https://github.com/apache/spark/commit/c4375deeca993675ece0367647fb8dd4f0a37410).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597947461
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24415/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-601091320
 
 
   Hmm, that's a hard choice. Janino's bultin tests are far from having good coverage, having hacked my own fork of Janino once and know the code fairly well, I feel somewhat uneasy about the huge refactorings...
   
   Sticking to the "deprecated" Janino 3.0.x branch for Spark 3.0.x doesn't sound like that bad of a situation to me. Spark 3.0 took a long time, but Spark 3.1 shouldn't be that far away.
   
   That said, I wouldn't be too sad if we end up taking Janino 3.1.2 into Spark 3.0.0.
   We'll just release Spark patch version with a new Janino release if we hit a Janino bug...right?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597694943
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597989123
 
 
   **[Test build #119693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119693/testReport)** for PR 27872 at commit [`4bd12d8`](https://github.com/apache/spark/commit/4bd12d87c0789ec8a3932078e1241bfe49e599a9).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597947452
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-600975879
 
 
   I've finally updated my two WIP PRs to official releases, which invalidates the purpose of this PR. 
   
   Let's move on, but decide which version would be better before moving on, 3.1.2 vs 3.0.16. (For sure, looks like Janino maintainer wants to see us adopt 3.1.x, but the decision is up to us.)
   
   #27860 for Janino 3.1.2
   #27932 for Janino 3.0.16
   
   cc. to everyone commented here to see preference for Janino version.
   
   @squito @dongjoon-hyun @maropu @kiszk @rednaxelafx @srowen @viirya

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597936736
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119682/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597973381
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119686/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599924340
 
 
   Thanks for the detailed explanation, @rednaxelafx !
   
   Btw, as I shared earlier, I have been investigating with Janino itself - I fixed all of bugs on 3.1.x which made Spark UT failing, and hopefully Janino maintainer made a branch for 3.0.x as well. Maybe we can get both of 3.0.16 & 3.1.2 versions in this week (I've asked it from https://github.com/janino-compiler/janino/issues/115#issuecomment-599768827), and then we can simply upgrade Janino to abandon this PR.
   
   I'm running UT in #27860 for Janino 3.1.2 (custom jar via Jitpack) as well as #27932 for Janino 3.0.16 (custom jar via Jitpack). Once both builds run fine, I'll forward the result to Janino maintainer.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597489363
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24387/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598492806
 
 
   Thanks for spending your time to analyze the patch and provide and detailed voice, @rednaxelafx .
   
   The rationalization of this patch is to address known issue "before" we reach the real solution, as the real solution is out of our control; no one can answer when Janino 3.0.16 will be released, and even whether it will be released or not. Janino 3.1 refactors everything by oneself and makes things be unstable - recent version of Janino makes lots of Spark UTs fail which makes me less confident we can migrate to Janino 3.1.x.
   
   I totally agree the patch is a band-aid fix and not well designed to extend (I didn't intend it though), and there're better options available (more to less preferable personally)
   
   1. Participate actively to Janino - fix all the issues in 3.1.x which breaks Spark, and adopt Janino 3.1.x. It should be ideal if we can provide set of regression tests so that Janino community can keep the stability. For sure, I'm not an expert of JVM so I can't lead the effort. A couple of JVM experts should jump in and lead the effort.
   
   2. Fork Janino - create a main dev. branch from v3.0.15, apply the patch for janino-compiler/janino#114 (and port back some bugfixes after 3.0.15), release v3.0.16 by our side. License seems to be 3-clause BSD license, which doesn't seem to restrict redistribution. The thing of this option would be who/which group is willing to maintain the fork and release under their name.
   
   3. This patch. (try to compile and fail-back)
   
   4. Modify ExpandExec to check the number of operations in for statement, and use `if ~ else if` when the number of operations exceed the threshold. This should be ideally checking the length of offset but it would be weird if Spark does it, so count the lines blindly. Performance regression may happen in some cases where it can run with `switch` but due to blind count it runs with `if ~ else if`, but the case wouldn't be common. 
   
   5. Give up addressing issue - either let end users to tune their query or guide that end users should enable failback option and run with interactive mode.
   
   WDYT? Would it be better to raise this to discussion in dev@ list?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599026718
 
 
   Honestly I'm not sure I got the proposal from @rednaxelafx - what I can imagine is only touching ExpandExec to get around is option 4 from my proposal, and if he meant other trick/technique than I don't get it. It would be nice if someone could help me understand via elaborating a bit.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599044117
 
 
   I think the option 4 looks fine to me. btw, splitting large code into pieces in `switch` is a solution for this issue? Additionally, we need to replace `switch` with `if`?
   
   > Modify ExpandExec to check the number of operations in for statement, and use if ~ else if when the number of operations exceed the threshold. This should be ideally checking the length of offset but it would be weird if Spark does it, so count the lines blindly. Performance regression may happen in some cases where it can run with switch but due to blind count it runs with if ~ else if, but the case wouldn't be common.
   
   I just want to know the actual performance numbers of this approach. I think splitting large code into small parts might improve performance.
   
   > I have one, but I cannot share since the query is from actual customer. If you're OK with just generated code, I've attached the file in Janino issue janino-compiler/janino#113.
   
   To reproduce the issue, could you build the simple query that you can show us based on your private customer's query? I think the query can make us understood more for the issue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597606564
 
 
   **[Test build #119656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119656/testReport)** for PR 27872 at commit [`90ef125`](https://github.com/apache/spark/commit/90ef1253bc92eaea35ed831355878a0f690781c1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597989452
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24422/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597946950
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] squito commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
squito commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#discussion_r391133288
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 ##########
 @@ -454,7 +454,9 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    if (canBeComputedUsingSwitch && hset.size <= SQLConf.get.optimizerInSetSwitchThreshold) {
+    val sqlConf = SQLConf.get
+    if (canBeComputedUsingSwitch && hset.size <= sqlConf.optimizerInSetSwitchThreshold &&
+      sqlConf.codegenUseSwitchStatement) {
 
 Review comment:
   instead of an added configuration, could this be a `try / catch`?  first try with switch, if it fails, then log an error and use if-else?  avoids the users having to set a config, and then unset it again when we have the right fix from janino

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597946957
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119681/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597904355
 
 
   **[Test build #119682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119682/testReport)** for PR 27872 at commit [`c4375de`](https://github.com/apache/spark/commit/c4375deeca993675ece0367647fb8dd4f0a37410).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597911868
 
 
   It would be appreciated if we can elaborate the possible performance regression (even it's like a "thinking out loud"); "may" and "some cases" are too ambiguous.
   
   If the query doesn't fall into the case via hitting Janino issue, it shouldn't bring perf. regression, as it tries to generate code with `switch` and compiles first. If there's perf. regression here, it says I have some unexpected mistake in fix which should be fixed.
   
   If the query fall into the case via hitting Janino issue, either the query has been failing or has been failing back to non-codegen. Taking alternative is definitely better than failing the query; the remaining one is whether using if ~ else if chain is worse than non-codegen.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598037864
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#discussion_r391276928
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala
 ##########
 @@ -178,27 +179,48 @@ case class ExpandExec(
                |${ev.code}
                |${outputColumns(col).isNull} = ${ev.isNull};
                |${outputColumns(col).value} = ${ev.value};
-            """.stripMargin
+             """.stripMargin
 
 Review comment:
   This looks to be a right indentation so fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598492806
 
 
   Thanks for spending your time to analyze the patch and provide and detailed voice, @rednaxelafx .
   
   The rationalization of this patch is to address known issue "before" we reach the real solution, as the real solution is out of our control; no one can answer when Janino 3.0.16 will be released, and even whether it will be released or not. Janino 3.1 refactors everything by oneself and makes things be unstable - recent version of Janino makes lots of Spark UTs fail which makes me less confident we can migrate to Janino 3.1.x.
   
   I totally agree the patch is a band-aid fix and not well designed to extend (I didn't intend it be extendable though), and there're better options available (more to less preferable personally)
   
   1. Participate actively to Janino - fix all the issues in 3.1.x which breaks Spark, and adopt Janino 3.1.x. It should be ideal if we can provide set of regression tests so that Janino community can keep the stability. For sure, I'm not an expert of JVM so I can't lead the effort. A couple of JVM experts should jump in and lead the effort.
   
   2. Fork Janino - create a main dev. branch from v3.0.15, apply the patch for janino-compiler/janino#114 (and port back some bugfixes after 3.0.15), release v3.0.16 by our side. License seems to be 3-clause BSD license, which doesn't seem to restrict redistribution. The thing of this option would be who/which group is willing to maintain the fork and release under their name.
   
   3. This patch. (try to compile and fail-back)
   
   4. Modify ExpandExec to check the number of operations in for statement, and use `if ~ else if` when the number of operations exceed the threshold. This should be ideally checking the length of offset but it would be weird if Spark does it, so count the lines blindly. Performance regression may happen in some cases where it can run with `switch` but due to blind count it runs with `if ~ else if`, but the case wouldn't be common. 
   
   5. Give up addressing issue - either let end users to tune their query or guide that end users should enable failback option and run with interactive mode.
   
   WDYT? Would it be better to raise this to discussion in dev@ list?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597607332
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119656/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597545345
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597947057
 
 
   **[Test build #119686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119686/testReport)** for PR 27872 at commit [`c4375de`](https://github.com/apache/spark/commit/c4375deeca993675ece0367647fb8dd4f0a37410).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597936729
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-598492806
 
 
   Thanks for spending your time to analyze the patch and provide and detailed voice, @rednaxelafx .
   
   The rationalization of this patch is to address known issue "before" we reach the real solution, as the real solution is out of our control; no one can answer when Janino 3.0.16 will be released, and even whether it will be released or not. Janino maintainer refactors everything by oneself in 3.1.x and makes things be unstable - recent version of Janino makes lots of Spark UTs fail which makes me less confident we can migrate to Janino 3.1.x.
   
   I totally agree the patch is a band-aid fix and not well designed to extend (I didn't intend it be extendable though), and there're better options available (more to less preferable personally)
   
   1. Participate actively to Janino - fix all the issues in 3.1.x which breaks Spark, and adopt Janino 3.1.x. It should be ideal if we can provide set of regression tests so that Janino community can keep the stability. For sure, I'm not an expert of JVM so I can't lead the effort. A couple of JVM experts should jump in and lead the effort.
   
   2. Fork Janino - create a main dev. branch from v3.0.15, apply the patch for janino-compiler/janino#114 (and port back some bugfixes after 3.0.15), release v3.0.16 by our side. License seems to be 3-clause BSD license, which doesn't seem to restrict redistribution. The thing of this option would be who/which group is willing to maintain the fork and release under their name.
   (I might not want to maintain the fork, but may want to contribute the new fork, as I'm the author of the patch for janino-compiler/janino#114 and interested to fix the stack overflow issue in SPARK-25987 which is also an issue of Janino 3.0.x.)
   
   3. This patch. (try to compile and fail-back)
   
   4. Modify ExpandExec to check the number of operations in for statement, and use `if ~ else if` when the number of operations exceed the threshold. This should be ideally checking the length of offset but it would be weird if Spark does it, so count the lines blindly. Performance regression may happen in some cases where it can run with `switch` but due to blind count it runs with `if ~ else if`, but the case wouldn't be common. 
   
   5. Give up addressing issue - either let end users to tune their query or guide that end users should enable failback option and run with interactive mode.
   
   WDYT? Would it be better to raise this to discussion in dev@ list?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597935352
 
 
   Hi, @HeartSaVioR , thanks for the work! btw, do you know what's an user query to reproduce this issue? If this issue occurs frequently on the user side, it might be worth adding the workaround. its a corner case, I personally think its ok just to fall back into the interpreter mode for avoiding the maintenance overhead of the workaround.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597560850
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24399/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597936736
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119682/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597936647
 
 
   **[Test build #119682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119682/testReport)** for PR 27872 at commit [`c4375de`](https://github.com/apache/spark/commit/c4375deeca993675ece0367647fb8dd4f0a37410).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-601064756
 
 
   Looks like Janino maintainer explicitly mentioned 3.0.x line is deprecated in Janino changelog.
   
   > Version 3.0.16, 2020-03-18 (the 3.0.x line is deprecated)
   
   Upgrading Janino to 3.0.16 in Spark 2.4.x wouldn't be any issue as I expect only few of new Spark 2.4.x releases, but for Spark 3.0.0 we expect many further bugfix releases which looks to me that @maropu voice is valid.
   
   I'm not sure there's some policy on upgrading dependency based on semver - like upgrading minor version of dependency in bugfix version, but if that's considered as a bad practice in community, we may be better to avoid deprecated version line. Let's collect more voices here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597973309
 
 
   **[Test build #119686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119686/testReport)** for PR 27872 at commit [`c4375de`](https://github.com/apache/spark/commit/c4375deeca993675ece0367647fb8dd4f0a37410).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597946741
 
 
   **[Test build #119681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119681/testReport)** for PR 27872 at commit [`5aa7ece`](https://github.com/apache/spark/commit/5aa7ece30c5f44aacc9d994c0fc4a05d214b046c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class CodegenContext(val disallowSwitchStatement: Boolean = false) extends Logging `

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-597904953
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24411/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597886463
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597497960
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24393/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

Posted by GitBox <gi...@apache.org>.
srowen commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-600978381
 
 
   I don't have a strong preference, but if 3.1.2 works, seems like the time to jump to it is in Spark 3.0, all else equal?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#discussion_r391277080
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ##########
 @@ -647,7 +651,7 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
       }
 
       ${ctx.registerComment(
-        s"""Codegend pipeline for stage (id=$codegenStageId)
+        s"""Codegen pipeline for stage (id=$codegenStageId)
 
 Review comment:
   Fixed a typo as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27872: [SPARK-31115][SQL] Provide config to avoid using switch statement in generated code to avoid Janino bug
URL: https://github.com/apache/spark/pull/27872#issuecomment-597560830
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org