Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/03 20:37:52 UTC

[GitHub] [spark] bersprockets opened a new pull request, #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

bersprockets opened a new pull request, #36442:
URL: https://github.com/apache/spark/pull/36442

   ### What changes were proposed in this pull request?
   
   In `DivideYMInterval#doGenCode` and `DivideDTInterval#doGenCode`, rely on the operand variable names provided by `nullSafeCodeGen` rather than calling `genCode` on the operands twice.
   
   
   ### Why are the changes needed?
   
   `DivideYMInterval#doGenCode` and `DivideDTInterval#doGenCode` call `genCode` on their operands twice: once directly, and once indirectly via `nullSafeCodeGen`. Calling `genCode` twice on the same operand does not necessarily return the same variable name both times. This happens, for example, when the operand is not a `BoundReference` or when whole-stage codegen is turned off. When the names differ, `nullSafeCodeGen` emits initialization code for one set of variables, but the divide expression emits code that reads a different set, which the generated code never declares. The result is a compilation error like this:
   ```
   spark-sql> create or replace temp view v1 as
            > select * FROM VALUES
            > (interval '10' months, interval '10' day, 2)
            > as v1(period, duration, num);
   Time taken: 2.81 seconds
   spark-sql> cache table v1;
   Time taken: 2.184 seconds
   spark-sql> select period/(num + 3) from v1;
   22/05/03 08:56:37 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 44: Expression "project_value_2" is not an rvalue
   ...
   22/05/03 08:56:37 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode
   ...
   0-2
   Time taken: 0.149 seconds, Fetched 1 row(s)
   spark-sql> select duration/(num + 3) from v1;
   22/05/03 08:57:29 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 54: Expression "project_value_2" is not an rvalue
   ...
   22/05/03 08:57:29 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode
   ...
   2 00:00:00.000000000
   Time taken: 0.089 seconds, Fetched 1 row(s)
   ```
   The error is not fatal (unless `spark.sql.codegen.fallback` is set to `false`), but it clutters the log and can slow the query, because the expression is then evaluated in interpreted mode.
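   To make the failure mode concrete, here is a toy Scala model of the buggy and fixed code-generation patterns. All names (`ToyContext`, `ToyOperand`, `ToyDivide`, and the overflow check) are hypothetical simplifications. They are not Spark's `CodegenContext`, `nullSafeCodeGen`, or the actual `DivideYMInterval` code. The only behavior carried over from the description above is that each `genCode` call on such an operand allocates fresh variable names.

   ```scala
   // Hypothetical toy model; none of these names are Spark APIs.

   // Hands out unique variable names, like a codegen context's fresh-name counter.
   final class ToyContext {
     private var counter = 0
     def freshName(prefix: String): String = { counter += 1; s"${prefix}_$counter" }
   }

   // Generated declaration code plus the name of the variable holding the value.
   final case class ToyExprCode(code: String, value: String)

   // An operand that allocates a new variable every time code is generated for it.
   final case class ToyOperand(javaExpr: String) {
     def genCode(ctx: ToyContext): ToyExprCode = {
       val v = ctx.freshName("value")
       ToyExprCode(s"int $v = $javaExpr;", v)
     }
   }

   object ToyDivide {
     // Simplified analogue of nullSafeCodeGen: generates each operand once, emits
     // its declaration, and passes the declared names to the body callback.
     def nullSafeCodeGen(ctx: ToyContext, l: ToyOperand, r: ToyOperand)(
         body: (String, String) => String): String = {
       val lc = l.genCode(ctx)
       val rc = r.genCode(ctx)
       Seq(lc.code, rc.code, body(lc.value, rc.value)).mkString("\n")
     }

     // Illustrative code that a divide expression might build around the operands.
     private def overflowCheck(m: String, n: String): String =
       s"if ($m == ${Int.MinValue} && $n == -1) throw new ArithmeticException();"

     // Buggy pattern: the check uses names from separate genCode calls whose
     // declaration code is never emitted.
     def buggy(ctx: ToyContext, months: ToyOperand, num: ToyOperand): String = {
       val check = overflowCheck(months.genCode(ctx).value, num.genCode(ctx).value)
       nullSafeCodeGen(ctx, months, num)((m, n) => s"$check\nint result = $m / $n;")
     }

     // Fixed pattern: everything uses the names supplied by the callback.
     def fixed(ctx: ToyContext, months: ToyOperand, num: ToyOperand): String =
       nullSafeCodeGen(ctx, months, num)((m, n) => s"${overflowCheck(m, n)}\nint result = $m / $n;")

     // Variables of the form value_N that are read but never declared.
     def undeclared(code: String): Set[String] = {
       val declared = """int (\w+) =""".r.findAllMatchIn(code).map(_.group(1)).toSet
       """value_\d+""".r.findAllIn(code).toSet -- declared
     }
   }
   ```

   Running `ToyDivide.undeclared` on the output of `ToyDivide.buggy(new ToyContext, ToyOperand("10"), ToyOperand("2 + 3"))` yields `Set(value_1, value_2)`, analogous to the `project_value_2 is not an rvalue` error above. The same check on the output of `fixed` yields an empty set.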
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit tests. The unit tests run with `spark.sql.codegen.fallback` set to `false`, so the new tests fail without the fix.
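   The following toy model of the "try generated code, otherwise interpret" pattern shows why the new tests catch the bug only when fallback is disabled. Everything here (`ToyCompileException`, `Outcome`, `FallbackDemo`) is hypothetical and does not reflect how Spark implements fallback. The `fallbackEnabled` flag merely stands in for the `spark.sql.codegen.fallback` setting mentioned above.

   ```scala
   // Hypothetical toy types; none of these are Spark classes.
   final class ToyCompileException(msg: String) extends RuntimeException(msg)

   final case class Outcome(result: Int, warnings: List[String])

   object FallbackDemo {
     // Stand-in for compiling the generated code before the fix: it always fails.
     def compileBroken(): () => Int =
       throw new ToyCompileException("Expression \"value_2\" is not an rvalue")

     // Stand-in for interpreted evaluation of the same expression, 10 / (2 + 3).
     def interpret(): Int = 10 / (2 + 3)

     // With fallback enabled, the compile error becomes a warning and the
     // interpreted result is returned. With fallback disabled, the error propagates.
     def evaluate(fallbackEnabled: Boolean): Outcome =
       try Outcome(compileBroken()(), Nil)
       catch {
         case e: ToyCompileException if fallbackEnabled =>
           Outcome(interpret(), List(s"codegen error, falling back to interpreter: ${e.getMessage}"))
       }
   }
   ```

   With `fallbackEnabled = true`, `evaluate` returns a result of 2 plus one warning. This mirrors the correct but noisy query results above. With `fallbackEnabled = false`, the `ToyCompileException` escapes, so a test fails loudly instead.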
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a diff in pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #36442:
URL: https://github.com/apache/spark/pull/36442#discussion_r864620743


##########
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala:
##########
@@ -2988,4 +2988,16 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession {
     checkAnswer(uncDf.filter($"src".ilike("ѐёђѻώề")), Seq("ЀЁЂѺΏỀ").toDF())
     // scalastyle:on
   }
+
+  test("SPARK-39093: divide period by integral expression") {
+    val df = Seq(((Period.ofDays(10)), 2)).toDF("pd", "num")
+    checkAnswer(df.select($"pd" / ($"num" + 3)),
+      Seq((Period.ofDays(2))).toDF)
+  }
+
+  test("SPARK-39093: divide duration by integral expression") {
+    val df = Seq(((Duration.ofDays(10)), 2)).toDF("dd", "num")
+    checkAnswer(df.select($"dd" / ($"num" + 3)),
+      Seq((Duration.ofDays(2))).toDF)
+  }

Review Comment:
   @singhpk234 There is a method `checkEvaluation` for this.
   Simply having this end-to-end test is also fine with me.



[GitHub] [spark] gengliangwang closed pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

Posted by GitBox <gi...@apache.org>.
gengliangwang closed pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral
URL: https://github.com/apache/spark/pull/36442


[GitHub] [spark] singhpk234 commented on a diff in pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #36442:
URL: https://github.com/apache/spark/pull/36442#discussion_r864555565


##########
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala:
##########
@@ -2988,4 +2988,16 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession {
     checkAnswer(uncDf.filter($"src".ilike("ѐёђѻώề")), Seq("ЀЁЂѺΏỀ").toDF())
     // scalastyle:on
   }
+
+  test("SPARK-39093: divide period by integral expression") {
+    val df = Seq(((Period.ofDays(10)), 2)).toDF("pd", "num")
+    checkAnswer(df.select($"pd" / ($"num" + 3)),
+      Seq((Period.ofDays(2))).toDF)
+  }
+
+  test("SPARK-39093: divide duration by integral expression") {
+    val df = Seq(((Duration.ofDays(10)), 2)).toDF("dd", "num")
+    checkAnswer(df.select($"dd" / ($"num" + 3)),
+      Seq((Duration.ofDays(2))).toDF)
+  }

Review Comment:
   [question] Should we also run this test with the following SQL confs:
   (i) WHOLESTAGE_CODEGEN_ENABLED (true), CODEGEN_FALLBACK (false)
   (ii) WHOLESTAGE_CODEGEN_ENABLED (false)
   
   By default, even when codegen fails, the query is retried in interpreted mode (as you already called out), so `checkAnswer` would not catch the failure. The same applies with WSCG off: the expression may pass in codegen mode but fail in interpreted mode, and we might not catch that either. Your thoughts?



[GitHub] [spark] singhpk234 commented on a diff in pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #36442:
URL: https://github.com/apache/spark/pull/36442#discussion_r864956601


##########
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala:
##########
@@ -2988,4 +2988,16 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession {
     checkAnswer(uncDf.filter($"src".ilike("ѐёђѻώề")), Seq("ЀЁЂѺΏỀ").toDF())
     // scalastyle:on
   }
+
+  test("SPARK-39093: divide period by integral expression") {
+    val df = Seq(((Period.ofDays(10)), 2)).toDF("pd", "num")
+    checkAnswer(df.select($"pd" / ($"num" + 3)),
+      Seq((Period.ofDays(2))).toDF)
+  }
+
+  test("SPARK-39093: divide duration by integral expression") {
+    val df = Seq(((Duration.ofDays(10)), 2)).toDF("dd", "num")
+    checkAnswer(df.select($"dd" / ($"num" + 3)),
+      Seq((Duration.ofDays(2))).toDF)
+  }

Review Comment:
   Missed that, thanks @bersprockets!



[GitHub] [spark] bersprockets commented on a diff in pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

Posted by GitBox <gi...@apache.org>.
bersprockets commented on code in PR #36442:
URL: https://github.com/apache/spark/pull/36442#discussion_r864952175


##########
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala:
##########
@@ -2988,4 +2988,16 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession {
     checkAnswer(uncDf.filter($"src".ilike("ѐёђѻώề")), Seq("ЀЁЂѺΏỀ").toDF())
     // scalastyle:on
   }
+
+  test("SPARK-39093: divide period by integral expression") {
+    val df = Seq(((Period.ofDays(10)), 2)).toDF("pd", "num")
+    checkAnswer(df.select($"pd" / ($"num" + 3)),
+      Seq((Period.ofDays(2))).toDF)
+  }
+
+  test("SPARK-39093: divide duration by integral expression") {
+    val df = Seq(((Duration.ofDays(10)), 2)).toDF("dd", "num")
+    checkAnswer(df.select($"dd" / ($"num" + 3)),
+      Seq((Duration.ofDays(2))).toDF)
+  }

Review Comment:
   > will be retried in interpreted mode and hence checkAnswer would not be able to catch this
   
   I just wanted to mention that `CODEGEN_FALLBACK` is set to `false` in the shared test session that `checkAnswer` runs in, so these two tests do fail without the fix.
   
   https://github.com/apache/spark/blob/834841ef5dab150f249d4171fddb474251beecac/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala#L70



[GitHub] [spark] gengliangwang commented on pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #36442:
URL: https://github.com/apache/spark/pull/36442#issuecomment-1117097917

   Merging to master/3.3 to unblock Spark 3.3 RC1
   @MaxGekk could you include this one in the RC1?
   @bersprockets @singhpk234 feel free to create a follow-up for enhancing the tests.


[GitHub] [spark] HyukjinKwon commented on pull request #36442: [SPARK-39093][SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36442:
URL: https://github.com/apache/spark/pull/36442#issuecomment-1116788539

   cc @gengliangwang FYI

