Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/12 08:24:14 UTC

[GitHub] [spark] rednaxelafx commented on issue #27872: [SPARK-31115][SQL] Detect known Janino bug janino-compiler/janino#113 and apply workaround automatically as a fail-back via avoid using switch statement in generated code

URL: https://github.com/apache/spark/pull/27872#issuecomment-598063427
 
 
   I have mixed feelings about this PR.
   
   The good part:
   
   It does what it advertises, and it should not cause performance regressions: code that would have compiled successfully stays as-is, and only code that triggers the error goes to the fallback mode. The fallback does produce slightly larger code, but that shouldn't really affect runtime performance *if the generated code is still JIT-compiled by an optimizing JIT*.
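
   To make the fallback pattern concrete, here is a minimal sketch (names and the failure condition are illustrative, not Spark's actual API): try the preferred codegen style first, and only regenerate in the safer style if compilation throws.
   ```scala
   // Hypothetical sketch of compile-then-fall-back. `compile` stands in for
   // Janino compilation and throws for code that would trip the bug.
   object FallbackCodegen {
     def compile(code: String): String =
       if (code.contains("switch")) throw new RuntimeException("Janino #113")
       else s"compiled(${code.length} chars)"

     // Try the fast (switch-based) code first; on failure, compile the
     // safe (if/else-based) variant instead.
     def compileWithFallback(fast: String, safe: String): String =
       try compile(fast)
       catch { case _: RuntimeException => compile(safe) }
   }
   ```
   Code that compiles cleanly never pays the fallback cost; only the failing case regenerates.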
   
   The less-good part:
   
   I feel the codegen system could make good use of a proper retry framework, so that it _may be able to_ do things it cannot do right now.
   e.g. code splitting comes at the cost of extra function-invocation overhead. If we generated code without splitting and still stayed within either 8000 bytes or 64KB (depending on which threshold you care about more), it would be better not to split. Right now Spark SQL's codegen splits unconditionally, using the length of the code text as the trigger; with a retry framework in place, we could try generating without splitting first, and only fall back to the conservative path when the thresholds are crossed.
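
   The retry idea above could be sketched like this (a hypothetical illustration, not Spark's actual API; the threshold and strategy names are assumptions):
   ```scala
   // Generate the unsplit body first; apply the conservative `split`
   // strategy only when the unsplit result crosses the size threshold
   // (using text length as a proxy, as current codegen does).
   object RetryCodegen {
     def generateWithRetry(body: String, threshold: Int)
                          (split: String => String): String =
       if (body.length <= threshold) body  // fits: keep the unsplit version
       else split(body)                    // too big: retry with splitting
   }
   ```
   With, say, a threshold of 8000 (HotSpot's default `-XX:HugeMethodLimit`, above which a method is not JIT-compiled by default), small bodies would skip splitting entirely and avoid the extra invocation overhead.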
   
   The code presented here is a one-off attempt at retry, and I'm somewhat concerned that future PRs will pile on top of it to implement retry for other scenarios, letting the code "evolve organically" without a solid base design first.
   
   If the main focus is to buy some time before we can get Janino to release 3.0.16, my hunch is that it's possible to slightly tune the codegen for ExpandExec so it generates code just small enough to work around the customer issue that's bugging you right now.
   
   e.g. if part of the problem was caused by the case body of the `switch` statement being too big, and if your customer only hits it on Spark 2.3.x but not on Spark 2.4.x / 3.0.x, then generating the code for `Literal(null, dataType)` in this style may help, at least with the example code you showed in the Janino bug you reported:
   ```scala
   ExprCode(code = EmptyBlock, isNull = TrueLiteral, JavaCode.defaultLiteral(dataType))
   ```
   (from Spark 2.4 / 3.0 / master: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L69)
   
   WDYT @HeartSaVioR ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org

