You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/16 23:54:57 UTC

[GitHub] [spark] rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

rednaxelafx edited a comment on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code
URL: https://github.com/apache/spark/pull/27872#issuecomment-599809129
 
 
   My suggestion for tuning codegen may have been too cryptic, sorry about not providing more background explanation.
   
   Let me explain why I arrived at the `Literal` codegen improvement suggestion (targeted for Spark 2.3.x branch -- anything starting from Spark 2.4.0 has it already). @viirya probably saw through this already ;-)
   
   My assumptions:
   1. The problematic code from actual customer was generated from Spark 2.3.x that @HeartSaVioR is trying to help fix. So to unblock this specific customer, slightly improving Spark 2.3.x branch may help.
   2. This PR targets master because the same Janino issue does still show up in master as well.
   3. But the specific customer isn't using Spark 2.4 yet (nor Spark 3.0 since it isn't even released yet).
   
   From what @HeartSaVioR was able to share in https://github.com/janino-compiler/janino/issues/113, specifically in the attachment https://github.com/janino-compiler/janino/files/4295191/error-codegen.log, we can see that a large portion of the switch...case body is filled with code like the following:
   ```java
   /* 4677 */       case 0:
   /* 4678 */         final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = expand_value129;
   /* 4681 */
   /* 4682 */         final long expand_value130 = -1L;
   /* 4683 */         expand_isNull67 = true;
   /* 4684 */         expand_value67 = expand_value130;
   /* 4685 */
   /* 4686 */         final long expand_value131 = -1L;
   /* 4687 */         expand_isNull68 = true;
   /* 4688 */         expand_value68 = expand_value131;
   /* 4689 */
   /* 4690 */         final long expand_value132 = -1L;
   /* 4691 */         expand_isNull69 = true;
   /* 4692 */         expand_value69 = expand_value132;
   ```
   
   If we zoom in on the 3 statements here:
   ```java
   /* 4678 */         final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = expand_value129;
   ```
   `javac` would have followed JLS and compiled it into two assignments (L4679 and L4680 with constant propagation), and skipped L4678 because it's a Java language level constant expression initialization.
   In pseudo bytecode, that's:
   ```
   load constant long 1  # constant true
   store to local expand_isNull66
   load constant int -1L
   store to local expand_value66
   ```
   
   Janino, on the other hand, doesn't handle all of Java language level constant expression constructs. Specifically, it does not support local constant declarations. So it compiles the code as 3 assignments:
   ```
   load constant long -1L
   store to local expand_value129
   load constant int 1  # constant true
   store to local expand_isNull66
   load from local expand_value129
   store to local expand_value66
   ```
   
   So every "chunk" of code above would be a few more bytes of bytecode. It may be small for just one chunk, but the codegen for `ExpandExec` makes that problem really bad by generating a lot of them.
   
   So where does this "value = -1L; isNull = true" thingy come from? We can infer that it's generated from `Literal(null, LongType)`. That is, https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L282-L285
   
   We can easily make Spark 2.3.x generate slightly better code, by improving `Literal`'s codegen, replacing
   ```scala
       if (value == null) {
         ev.isNull = "true"
         ev.copy(s"final $javaType ${ev.value} = ${ctx.defaultValue(dataType)};")
       }
   ```
   
   with something like:
   ```scala
       if (value == null) {
         ev.copy(
           isNull = "true",
           value = ctx.defaultValue(dataType)
         )
       }
   ```
   (the `value` part may or may not have to be further adjusted. Perhaps like `value = s"(($javaType)${ctx.defaultValue(dataType)})"`) for maximal safety.
   
   which would improve the example above into:
   ```java
   /* 4678 */         // THIS IS GONE: final long expand_value129 = -1L;
   /* 4679 */         expand_isNull66 = true;
   /* 4680 */         expand_value66 = -1L;
   ```
   so that Janino can generate less bytecode for this particular query.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org