You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/13 01:18:18 UTC
[GitHub] [spark] rednaxelafx edited a comment on issue #27544: [SPARK-30795][SQL] Spark SQL codegen's code() interpolator should treat escapes like Scala's StringContext.s()

rednaxelafx edited a comment on issue #27544: [SPARK-30795][SQL] Spark SQL codegen's code() interpolator should treat escapes like Scala's StringContext.s()
URL: https://github.com/apache/spark/pull/27544#issuecomment-585498805
 
 
   BTW, just want to mention that, before this PR, on Apache Spark master / branch-3.0, the following query would fail to compile the generated code, which is a regression.
   
   ```scala
   Seq(("abc", "%bc")).toDF("s", "p").repartition(1).selectExpr("s like p").queryExecution.debug.codegen
   ```
   produces:
   ```java
   /* 041 */           java.util.regex.Pattern project_pattern_0 = java.util.regex.Pattern.compile(
   /* 042 */             org.apache.spark.sql.catalyst.util.StringUtils.escapeLikeRegex(project_rightStr_0, '\'));
   ```
   So if anybody has ever tried the non-constant pattern code path on master recently, it should have failed:
   ```
   scala> Seq(("abc", "%bc")).toDF("s", "p").repartition(1).selectExpr("s like p").collect
   20/02/13 01:05:30 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, Column 84: Closing single quote missing
   org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, Column 84: Closing single quote missing
   ```
   
   That probably means we're still lacking test coverage for the regex expressions in Spark SQL. May be worth adding some tests before 3.0 goes out the door, e.g. DataFrame tests and/or SQL tests.
   
   (Don't know who to cc ... probably gonna be a long list? I'll cc the list of reviewers who already reviewed this PR:
   @viirya @maropu @kiszk @HyukjinKwon @cloud-fan and also @gatorsmile 
   )
   
   The reason why the RegexExpressionSuite unit test didn't catch the bug is because the codegen test framework isn't doing the right thing.
   https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L280-L294
   `ExpressionEvalHelper.evaluateWithUnsafeProjection` was explicitly triggering codegen twice, with the intention that common-subexpression elimination (CSE) should NOT be active, and the expression should actually codegen twice to expose bugs.
   
   Yet what's actually happening is that CSE was triggered for these tests, which just so happens to allow the bug that this PR fixes dodge the bullet and slip through the tests:
   https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L308-L315
   In `GenerateUnsafeProjection`, there's a piece of logic that generates code from expressions, and then embeds the generated code into a `code"..."` block again, which will trigger this bug.
   But because CSE was active, the problematic code did not go through this code path, but instead got generated into a separate function which did not get wrapped into a `code"..."` again. That's how it (unfortunately) dodged the bullet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org