You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/17 18:34:38 UTC

[GitHub] [spark] sunchao opened a new pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

sunchao opened a new pull request #29792:
URL: https://github.com/apache/spark/pull/29792


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   In SPARK-24994 we implemented unwrapping cast for integral types. This extends it to support numeric types such as float/double/decimal, so that filters involving casts can be better pushed down to data sources.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   For queries such as:
   ```sql
   SELECT * FROM tbl WHERE short_col < 100.5
   ```
   The predicate `short_col < 100.5` can't be pushed down to data sources because it involves casts. This eliminates the cast so these queries can run more efficiently.
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   
   Unit tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r493753795



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -79,13 +106,65 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) < v, falseIfNotNull(f))
   }
 
-  test("unwrap casts when literal is within range (min, max)") {
-    assertEquivalent(castInt(f) > 300, f > 300.toShort)
-    assertEquivalent(castInt(f) >= 500, f >= 500.toShort)
-    assertEquivalent(castInt(f) === 32766, f === 32766.toShort)
-    assertEquivalent(castInt(f) <=> 32766, f <=> 32766.toShort)
-    assertEquivalent(castInt(f) <= -6000, f <= -6000.toShort)
-    assertEquivalent(castInt(f) < -32767, f < -32767.toShort)
+  test("unwrap casts when literal is within range (min, max) or fromType has no range") {
+    Seq(300, 500, 32766, -6000, -32767).foreach(v => {
+      assertEquivalent(castInt(f) > v, f > v.toShort)
+      assertEquivalent(castInt(f) >= v, f >= v.toShort)
+      assertEquivalent(castInt(f) === v, f === v.toShort)
+      assertEquivalent(castInt(f) <=> v, f <=> v.toShort)
+      assertEquivalent(castInt(f) <= v, f <= v.toShort)
+      assertEquivalent(castInt(f) < v, f < v.toShort)
+    })
+
+    Seq(3.14.toFloat.toDouble, -1000.0.toFloat.toDouble,
+      20.0.toFloat.toDouble, -2.414.toFloat.toDouble,
+      Float.MinValue.toDouble, Float.MaxValue.toDouble, Float.PositiveInfinity.toDouble
+    ).foreach(v => {
+      assertEquivalent(castDouble(f2) > v, f2 > v.toFloat)
+      assertEquivalent(castDouble(f2) >= v, f2 >= v.toFloat)
+      assertEquivalent(castDouble(f2) === v, f2 === v.toFloat)
+      assertEquivalent(castDouble(f2) <=> v, f2 <=> v.toFloat)
+      assertEquivalent(castDouble(f2) <= v, f2 <= v.toFloat)
+      assertEquivalent(castDouble(f2) < v, f2 < v.toFloat)
+    })
+
+    Seq(decimal2(100.20), decimal2(-200.50)).foreach(v => {
+      assertEquivalent(castDecimal2(f3) > v, f3 > decimal(v))
+      assertEquivalent(castDecimal2(f3) >= v, f3 >= decimal(v))
+      assertEquivalent(castDecimal2(f3) === v, f3 === decimal(v))
+      assertEquivalent(castDecimal2(f3) <=> v, f3 <=> decimal(v))
+      assertEquivalent(castDecimal2(f3) <= v, f3 <= decimal(v))
+      assertEquivalent(castDecimal2(f3) < v, f3 < decimal(v))
+    })
+  }
+
+  test("unwrap cast when literal is within range (min, max) AND has round up or down") {
+    // Cases for rounding down
+    var doubleValue = 100.6
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) === doubleValue, falseIfNotNull(f))
+    assertEquivalent(castDouble(f) <=> doubleValue, false)
+    assertEquivalent(castDouble(f) <= doubleValue, f <= doubleValue.toShort)
+    assertEquivalent(castDouble(f) < doubleValue, f <= doubleValue.toShort)
+
+    // Cases for rounding up: 3.14 will be rounded to 3.14000010... after casting to float

Review comment:
       @cloud-fan Sorry i was wrong in the above comment (somehow I was thinking casting from double to short there). 
   
   Yes, it appears that casting from double to float can be either rounding up or down, depending on value:
   
   ```scala
   scala> val x = 0.39999999
   x: Double = 0.39999999
   
   scala> val y = x.toFloat
   y: Float = 0.39999998
   
   scala > val x = 0.49999999
   y: Double = 0.49999999
   
   scala> val y = x.toFloat
   y: Float = 0.5
   ```
   Also the test here is incorrect. casting 3.14 of double to float doesn't change anything. I'll update it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #29792:
URL: https://github.com/apache/spark/pull/29792


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707351927






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702263668


   **[Test build #129311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129311/testReport)** for PR 29792 at commit [`3340de4`](https://github.com/apache/spark/commit/3340de447c551ca2c7d0e9a470b99a02a87913b1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r493216123



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -67,6 +77,23 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) <=> v.toInt, f <=> v)
     assertEquivalent(castInt(f) <= v.toInt, f === v)
     assertEquivalent(castInt(f) < v.toInt, falseIfNotNull(f))
+
+    val d = Float.NegativeInfinity
+    assertEquivalent(castDouble(f2) > d.toDouble, f2 =!= d)

Review comment:
       is casting double to float rounding up or rounding down?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
dbtsai commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496884740



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we
+    // optimize by moving the cast to the literal side.
+
+    val newValue = Cast(Literal(value), fromType).eval()
+    if (newValue == null) {
+      // This means the cast failed, for instance, due to the value is not representable in the
+      // narrower type. In this case we simply return the original expression.
+      return exp
+    }
+    val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()

Review comment:
       Is it defined as part of IEEE Standard for Floating-Point Arithmetic (IEEE 754)?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r502238470



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {

Review comment:
       Could you add these end-2-end tests in `SQLQueryTestSuite` instead of making a new suite?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705960760


   @cloud-fan added an e2e test suite. Please take another look, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r503130781



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {

Review comment:
       I think it's fine to follow ReplaceNullWithFalseInPredicateEndToEndSuite here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702431932






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496913161



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we
+    // optimize by moving the cast to the literal side.
+
+    val newValue = Cast(Literal(value), fromType).eval()
+    if (newValue == null) {
+      // This means the cast failed, for instance, due to the value is not representable in the
+      // narrower type. In this case we simply return the original expression.
+      return exp
+    }
+    val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()

Review comment:
       Yes I think both the binary format as well as rounding rules are specified in IEEE 754. There are a few rounding rules and I think the default one is "rounding to half even".




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r502238470



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {

Review comment:
       Could you add these end-2-end tests in `SQLQueryTestSuite` instead of making a new suite?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -35,18 +35,32 @@ import org.apache.spark.sql.types._
  * to be optimized away later and pushed down to data sources.
  *
  * Currently this only handles cases where:
- *   1). `fromType` (of `fromExp`) and `toType` are of integral types (i.e., byte, short, int and
- *     long)
+ *   1). `fromType` (of `fromExp`) and `toType` are of numeric types (i.e., short, int, float,
+ *     decimal, etc)
  *   2). `fromType` can be safely coerced to `toType` without precision loss (e.g., short to int,
  *     int to long, but not long to int)
  *
  * If the above conditions are satisfied, the rule checks to see if the literal `value` is within
  * range `(min, max)`, where `min` and `max` are the minimum and maximum value of `fromType`,
- * respectively. If this is true then it means we can safely cast `value` to `fromType` and thus
+ * respectively. If this is true then it means we may safely cast `value` to `fromType` and thus
  * able to move the cast to the literal side. That is:
  *
  *   `cast(fromExp, toType) op value` ==> `fromExp op cast(value, fromType)`
  *
+ * Note there are some exceptions to the above: if casting from `value` to `fromType` causes
+ * rounding up or down, the above conversion will no longer be valid. Instead, the rule does the
+ * following:
+ *
+ * if casting `value` to `fromType` causes rounding up:
+ *  - `cast(fromExp, toType) > value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) >= value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) === value` ==> if(isnull(fromExp), null, false)
+ *  - `cast(fromExp, toType) <=> value` ==> false (if `fromExp` is deterministic)
+ *  - `cast(fromExp, toType) <= value` ==> `fromExp < cast(value, fromType)`
+ *  - `cast(fromExp, toType) < value` ==> `fromExp < cast(value, fromType)`
+ *
+ *  Similarly for the case when casting `value` to `fromType` causes rounding down.

Review comment:
       nit: wrong indent.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -200,25 +248,27 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
   /**
    * Check if the input `fromExp` can be safely cast to `toType` without any loss of precision,
    * i.e., the conversion is injective. Note this only handles the case when both sides are of
-   * integral type.
+   * numeric type.
    */
   private def canImplicitlyCast(
       fromExp: Expression,
       toType: DataType,
       literalType: DataType): Boolean = {
     toType.sameType(literalType) &&
       !fromExp.foldable &&
-      fromExp.dataType.isInstanceOf[IntegralType] &&
-      toType.isInstanceOf[IntegralType] &&
+      fromExp.dataType.isInstanceOf[NumericType] &&
+      toType.isInstanceOf[NumericType] &&
       Cast.canUpCast(fromExp.dataType, toType)
   }
 
-  private def getRange(dt: DataType): (Any, Any) = dt match {
-    case ByteType => (Byte.MinValue, Byte.MaxValue)
-    case ShortType => (Short.MinValue, Short.MaxValue)
-    case IntegerType => (Int.MinValue, Int.MaxValue)
-    case LongType => (Long.MinValue, Long.MaxValue)
-    case other => throw new IllegalArgumentException(s"Unsupported type: ${other.catalogString}")
+  private def getRange(dt: DataType): Option[(Any, Any)] = dt match {
+    case ByteType => Some((Byte.MinValue, Byte.MaxValue))
+    case ShortType => Some((Short.MinValue, Short.MaxValue))
+    case IntegerType => Some((Int.MinValue, Int.MaxValue))
+    case LongType => Some((Long.MinValue, Long.MaxValue))
+    case FloatType => Some((Float.NegativeInfinity, Float.NaN))
+    case DoubleType => Some((Double.NegativeInfinity, Double.NaN))

Review comment:
       Looks it does not have any test for this code path, so could you add some tests for it. (NOTE: I think `byte`, `int`, and `long` are not tested in `UnwrapCastInBinaryComparisonSuite`, too)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-704398959


   > Can we have an end-to-end test suite for it?
   
   Thanks - this is a good suggestion. Will add that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702300557


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33926/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r502400334



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -200,25 +248,27 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
   /**
    * Check if the input `fromExp` can be safely cast to `toType` without any loss of precision,
    * i.e., the conversion is injective. Note this only handles the case when both sides are of
-   * integral type.
+   * numeric type.
    */
   private def canImplicitlyCast(
       fromExp: Expression,
       toType: DataType,
       literalType: DataType): Boolean = {
     toType.sameType(literalType) &&
       !fromExp.foldable &&
-      fromExp.dataType.isInstanceOf[IntegralType] &&
-      toType.isInstanceOf[IntegralType] &&
+      fromExp.dataType.isInstanceOf[NumericType] &&
+      toType.isInstanceOf[NumericType] &&
       Cast.canUpCast(fromExp.dataType, toType)
   }
 
-  private def getRange(dt: DataType): (Any, Any) = dt match {
-    case ByteType => (Byte.MinValue, Byte.MaxValue)
-    case ShortType => (Short.MinValue, Short.MaxValue)
-    case IntegerType => (Int.MinValue, Int.MaxValue)
-    case LongType => (Long.MinValue, Long.MaxValue)
-    case other => throw new IllegalArgumentException(s"Unsupported type: ${other.catalogString}")
+  private def getRange(dt: DataType): Option[(Any, Any)] = dt match {
+    case ByteType => Some((Byte.MinValue, Byte.MaxValue))
+    case ShortType => Some((Short.MinValue, Short.MaxValue))
+    case IntegerType => Some((Int.MinValue, Int.MaxValue))
+    case LongType => Some((Long.MinValue, Long.MaxValue))
+    case FloatType => Some((Float.NegativeInfinity, Float.NaN))
+    case DoubleType => Some((Double.NegativeInfinity, Double.NaN))

Review comment:
       Looks it does not have any test for this code path, so could you add some tests for it. (NOTE: I think `byte`, `int`, and `long` are not tested in `UnwrapCastInBinaryComparisonSuite`, too)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702287744


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33926/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-694569792






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r502557093



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {

Review comment:
       Yea I can add the test there instead - I was just following the existing [ReplaceNullWithFalseInPredicateEndToEndSuite](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/ReplaceNullWithFalseInPredicateEndToEndSuite.scala) though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r495008728



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -79,13 +106,65 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) < v, falseIfNotNull(f))
   }
 
-  test("unwrap casts when literal is within range (min, max)") {
-    assertEquivalent(castInt(f) > 300, f > 300.toShort)
-    assertEquivalent(castInt(f) >= 500, f >= 500.toShort)
-    assertEquivalent(castInt(f) === 32766, f === 32766.toShort)
-    assertEquivalent(castInt(f) <=> 32766, f <=> 32766.toShort)
-    assertEquivalent(castInt(f) <= -6000, f <= -6000.toShort)
-    assertEquivalent(castInt(f) < -32767, f < -32767.toShort)
+  test("unwrap casts when literal is within range (min, max) or fromType has no range") {
+    Seq(300, 500, 32766, -6000, -32767).foreach(v => {
+      assertEquivalent(castInt(f) > v, f > v.toShort)
+      assertEquivalent(castInt(f) >= v, f >= v.toShort)
+      assertEquivalent(castInt(f) === v, f === v.toShort)
+      assertEquivalent(castInt(f) <=> v, f <=> v.toShort)
+      assertEquivalent(castInt(f) <= v, f <= v.toShort)
+      assertEquivalent(castInt(f) < v, f < v.toShort)
+    })
+
+    Seq(3.14.toFloat.toDouble, -1000.0.toFloat.toDouble,
+      20.0.toFloat.toDouble, -2.414.toFloat.toDouble,
+      Float.MinValue.toDouble, Float.MaxValue.toDouble, Float.PositiveInfinity.toDouble
+    ).foreach(v => {
+      assertEquivalent(castDouble(f2) > v, f2 > v.toFloat)
+      assertEquivalent(castDouble(f2) >= v, f2 >= v.toFloat)
+      assertEquivalent(castDouble(f2) === v, f2 === v.toFloat)
+      assertEquivalent(castDouble(f2) <=> v, f2 <=> v.toFloat)
+      assertEquivalent(castDouble(f2) <= v, f2 <= v.toFloat)
+      assertEquivalent(castDouble(f2) < v, f2 < v.toFloat)
+    })
+
+    Seq(decimal2(100.20), decimal2(-200.50)).foreach(v => {
+      assertEquivalent(castDecimal2(f3) > v, f3 > decimal(v))
+      assertEquivalent(castDecimal2(f3) >= v, f3 >= decimal(v))
+      assertEquivalent(castDecimal2(f3) === v, f3 === decimal(v))
+      assertEquivalent(castDecimal2(f3) <=> v, f3 <=> decimal(v))
+      assertEquivalent(castDecimal2(f3) <= v, f3 <= decimal(v))
+      assertEquivalent(castDecimal2(f3) < v, f3 < decimal(v))
+    })
+  }
+
+  test("unwrap cast when literal is within range (min, max) AND has round up or down") {
+    // Cases for rounding down
+    var doubleValue = 100.6
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) === doubleValue, falseIfNotNull(f))
+    assertEquivalent(castDouble(f) <=> doubleValue, false)
+    assertEquivalent(castDouble(f) <= doubleValue, f <= doubleValue.toShort)
+    assertEquivalent(castDouble(f) < doubleValue, f <= doubleValue.toShort)
+
+    // Cases for rounding up: 3.14 will be rounded to 3.14000010... after casting to float

Review comment:
       This is an important point. Can we explain how to know it's rounding up or down in the PR description?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r503133967



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  test("cases when literal is max") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MaxValue, Float.NaN), (3, null, null))
+        .toDF("c1", "c2", "c3").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      var lit = Short.MaxValue.toInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Row(1))
+
+      checkAnswer(df.select("c1").where(s"c3 > double('nan')"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c3 >= double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 == double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 <=> double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 != double('nan')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 <= double('nan')"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c3 < double('nan')"), Row(1))
+
+      lit = positiveInt

Review comment:
       this doesn't match the test case name `cases when literal is max`
   
   We can put it in a new test `cases when literal exceeds max`

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  test("cases when literal is max") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MaxValue, Float.NaN), (3, null, null))
+        .toDF("c1", "c2", "c3").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      var lit = Short.MaxValue.toInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Row(1))
+
+      checkAnswer(df.select("c1").where(s"c3 > double('nan')"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c3 >= double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 == double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 <=> double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 != double('nan')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 <= double('nan')"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c3 < double('nan')"), Row(1))
+
+      lit = positiveInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Row(1) :: Row(2) :: Nil)
+    }
+  }
+
+  test("cases when literal is min") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MinValue, Float.NegativeInfinity),
+        (3, null, null))
+        .toDF("c1", "c2", "c3").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      var lit = Short.MinValue.toInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Seq.empty)
+
+      checkAnswer(df.select("c1").where(s"c3 > double('-inf')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 >= double('-inf')"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c3 == double('-inf')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 <=> double('-inf')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 != double('-inf')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 <= double('-inf')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 < double('-inf')"), Seq.empty)
+
+      lit = negativeInt

Review comment:
       ditto




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r502558580



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -200,25 +248,27 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
   /**
    * Check if the input `fromExp` can be safely cast to `toType` without any loss of precision,
    * i.e., the conversion is injective. Note this only handles the case when both sides are of
-   * integral type.
+   * numeric type.
    */
   private def canImplicitlyCast(
       fromExp: Expression,
       toType: DataType,
       literalType: DataType): Boolean = {
     toType.sameType(literalType) &&
       !fromExp.foldable &&
-      fromExp.dataType.isInstanceOf[IntegralType] &&
-      toType.isInstanceOf[IntegralType] &&
+      fromExp.dataType.isInstanceOf[NumericType] &&
+      toType.isInstanceOf[NumericType] &&
       Cast.canUpCast(fromExp.dataType, toType)
   }
 
-  private def getRange(dt: DataType): (Any, Any) = dt match {
-    case ByteType => (Byte.MinValue, Byte.MaxValue)
-    case ShortType => (Short.MinValue, Short.MaxValue)
-    case IntegerType => (Int.MinValue, Int.MaxValue)
-    case LongType => (Long.MinValue, Long.MaxValue)
-    case other => throw new IllegalArgumentException(s"Unsupported type: ${other.catalogString}")
+  private def getRange(dt: DataType): Option[(Any, Any)] = dt match {
+    case ByteType => Some((Byte.MinValue, Byte.MaxValue))
+    case ShortType => Some((Short.MinValue, Short.MaxValue))
+    case IntegerType => Some((Int.MinValue, Int.MaxValue))
+    case LongType => Some((Long.MinValue, Long.MaxValue))
+    case FloatType => Some((Float.NegativeInfinity, Float.NaN))
+    case DoubleType => Some((Double.NegativeInfinity, Double.NaN))

Review comment:
       Will add a test case (although I think it will be pretty trivial). I only added tests for `short` in the previous PR because the handling for other integral types is exactly the same.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r503135640



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  test("cases when literal is max") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MaxValue, Float.NaN), (3, null, null))
+        .toDF("c1", "c2", "c3").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      var lit = Short.MaxValue.toInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Row(1))
+
+      checkAnswer(df.select("c1").where(s"c3 > double('nan')"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c3 >= double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 == double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 <=> double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 != double('nan')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 <= double('nan')"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c3 < double('nan')"), Row(1))
+
+      lit = positiveInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Row(1) :: Row(2) :: Nil)
+    }
+  }
+
+  test("cases when literal is min") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MinValue, Float.NegativeInfinity),
+        (3, null, null))
+        .toDF("c1", "c2", "c3").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      var lit = Short.MinValue.toInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Row(1))

Review comment:
       can we put the `where` before `select`? It's weird to see we select only `c1` and then filter on `c2`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705906525






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705902263


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34172/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705906525






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702300586






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702300586






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707326402


   **[Test build #129704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129704/testReport)** for PR 29792 at commit [`76b9f73`](https://github.com/apache/spark/commit/76b9f73862380dda40b332b56464278bfb0b88d3).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707713096


   thanks, merging to master!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707351907


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34310/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai edited a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
dbtsai edited a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-697132376


   cc @viirya @dongjoon-hyun @cloud-fan @gatorsmile @HyukjinKwon @huaxingao 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r493753795



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -79,13 +106,65 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) < v, falseIfNotNull(f))
   }
 
-  test("unwrap casts when literal is within range (min, max)") {
-    assertEquivalent(castInt(f) > 300, f > 300.toShort)
-    assertEquivalent(castInt(f) >= 500, f >= 500.toShort)
-    assertEquivalent(castInt(f) === 32766, f === 32766.toShort)
-    assertEquivalent(castInt(f) <=> 32766, f <=> 32766.toShort)
-    assertEquivalent(castInt(f) <= -6000, f <= -6000.toShort)
-    assertEquivalent(castInt(f) < -32767, f < -32767.toShort)
+  test("unwrap casts when literal is within range (min, max) or fromType has no range") {
+    Seq(300, 500, 32766, -6000, -32767).foreach(v => {
+      assertEquivalent(castInt(f) > v, f > v.toShort)
+      assertEquivalent(castInt(f) >= v, f >= v.toShort)
+      assertEquivalent(castInt(f) === v, f === v.toShort)
+      assertEquivalent(castInt(f) <=> v, f <=> v.toShort)
+      assertEquivalent(castInt(f) <= v, f <= v.toShort)
+      assertEquivalent(castInt(f) < v, f < v.toShort)
+    })
+
+    Seq(3.14.toFloat.toDouble, -1000.0.toFloat.toDouble,
+      20.0.toFloat.toDouble, -2.414.toFloat.toDouble,
+      Float.MinValue.toDouble, Float.MaxValue.toDouble, Float.PositiveInfinity.toDouble
+    ).foreach(v => {
+      assertEquivalent(castDouble(f2) > v, f2 > v.toFloat)
+      assertEquivalent(castDouble(f2) >= v, f2 >= v.toFloat)
+      assertEquivalent(castDouble(f2) === v, f2 === v.toFloat)
+      assertEquivalent(castDouble(f2) <=> v, f2 <=> v.toFloat)
+      assertEquivalent(castDouble(f2) <= v, f2 <= v.toFloat)
+      assertEquivalent(castDouble(f2) < v, f2 < v.toFloat)
+    })
+
+    Seq(decimal2(100.20), decimal2(-200.50)).foreach(v => {
+      assertEquivalent(castDecimal2(f3) > v, f3 > decimal(v))
+      assertEquivalent(castDecimal2(f3) >= v, f3 >= decimal(v))
+      assertEquivalent(castDecimal2(f3) === v, f3 === decimal(v))
+      assertEquivalent(castDecimal2(f3) <=> v, f3 <=> decimal(v))
+      assertEquivalent(castDecimal2(f3) <= v, f3 <= decimal(v))
+      assertEquivalent(castDecimal2(f3) < v, f3 < decimal(v))
+    })
+  }
+
+  test("unwrap cast when literal is within range (min, max) AND has round up or down") {
+    // Cases for rounding down
+    var doubleValue = 100.6
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) === doubleValue, falseIfNotNull(f))
+    assertEquivalent(castDouble(f) <=> doubleValue, false)
+    assertEquivalent(castDouble(f) <= doubleValue, f <= doubleValue.toShort)
+    assertEquivalent(castDouble(f) < doubleValue, f <= doubleValue.toShort)
+
+    // Cases for rounding up: 3.14 will be rounded to 3.14000010... after casting to float

Review comment:
       @cloud-fan Sorry i was wrong in the above comment (somehow I was thinking casting from double to short there). 
   
   Yes, it appears that casting from double to float can be either rounding up or down, depending on value:
   
   ```scala
   scala> val x = 0.39999999
   x: Double = 0.39999999
   
   scala> val y = x.toFloat
   y: Float = 0.39999998
   
   scala > val x = 0.49999999
   y: Double = 0.49999999
   
   scala> val y = x.toFloat
   y: Float = 0.5
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707448379


   Addressed comments. Please take another look @cloud-fan @maropu . Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705960336






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705906525






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707434001


   **[Test build #129704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129704/testReport)** for PR 29792 at commit [`76b9f73`](https://github.com/apache/spark/commit/76b9f73862380dda40b332b56464278bfb0b88d3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r500247652



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +128,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we
+    // optimize by moving the cast to the literal side.
+
+    val newValue = Cast(Literal(value), fromType).eval()
+    if (newValue == null) {
+      // This means the cast failed, for instance, due to the value is not representable in the
+      // narrower type. In this case we simply return the original expression.

Review comment:
       I see, it's for decimal only. It's better to make the comment more explicit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496547966



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we
+    // optimize by moving the cast to the literal side.
+
+    val newValue = Cast(Literal(value), fromType).eval()
+    if (newValue == null) {
+      // This means the cast failed, for instance, due to the value is not representable in the
+      // narrower type. In this case we simply return the original expression.
+      return exp
+    }
+    val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()

Review comment:
       The case I'm worried about is `cast(float_col as double) cmp double_lit`. It's not straightforward to me that a `double -> float -> double` roundtrip can tell rounding up or down. is it because `float -> double` can only be rounding up?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dbtsai commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
dbtsai commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-697132376


   cc @viirya @dongjoon-hyun @cloud-fan @gatorsmile 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r502238574



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -35,18 +35,32 @@ import org.apache.spark.sql.types._
  * to be optimized away later and pushed down to data sources.
  *
  * Currently this only handles cases where:
- *   1). `fromType` (of `fromExp`) and `toType` are of integral types (i.e., byte, short, int and
- *     long)
+ *   1). `fromType` (of `fromExp`) and `toType` are of numeric types (i.e., short, int, float,
+ *     decimal, etc)
  *   2). `fromType` can be safely coerced to `toType` without precision loss (e.g., short to int,
  *     int to long, but not long to int)
  *
  * If the above conditions are satisfied, the rule checks to see if the literal `value` is within
  * range `(min, max)`, where `min` and `max` are the minimum and maximum value of `fromType`,
- * respectively. If this is true then it means we can safely cast `value` to `fromType` and thus
+ * respectively. If this is true then it means we may safely cast `value` to `fromType` and thus
  * able to move the cast to the literal side. That is:
  *
  *   `cast(fromExp, toType) op value` ==> `fromExp op cast(value, fromType)`
  *
+ * Note there are some exceptions to the above: if casting from `value` to `fromType` causes
+ * rounding up or down, the above conversion will no longer be valid. Instead, the rule does the
+ * following:
+ *
+ * if casting `value` to `fromType` causes rounding up:
+ *  - `cast(fromExp, toType) > value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) >= value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) === value` ==> if(isnull(fromExp), null, false)
+ *  - `cast(fromExp, toType) <=> value` ==> false (if `fromExp` is deterministic)
+ *  - `cast(fromExp, toType) <= value` ==> `fromExp < cast(value, fromType)`
+ *  - `cast(fromExp, toType) < value` ==> `fromExp < cast(value, fromType)`
+ *
+ *  Similarly for the case when casting `value` to `fromType` causes rounding down.

Review comment:
       nit: wrong indent.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705959731


   **[Test build #129566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129566/testReport)** for PR 29792 at commit [`94942bf`](https://github.com/apache/spark/commit/94942bf2042a5c179ce9eda7fb90493636905078).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession `


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-704248672


   The patch LGTM. Can we have an end-to-end test suite for it? The current tests prove that the optimized expression tree is what we expect, it's better to have high-level tests to prove that, after optimization, the query still returns corrected result.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705891447


   **[Test build #129566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129566/testReport)** for PR 29792 at commit [`94942bf`](https://github.com/apache/spark/commit/94942bf2042a5c179ce9eda7fb90493636905078).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-694569792






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702263668


   **[Test build #129311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129311/testReport)** for PR 29792 at commit [`3340de4`](https://github.com/apache/spark/commit/3340de447c551ca2c7d0e9a470b99a02a87913b1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496523301



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -35,18 +35,34 @@ import org.apache.spark.sql.types._
  * to be optimized away later and pushed down to data sources.
  *
  * Currently this only handles cases where:
- *   1). `fromType` (of `fromExp`) and `toType` are of integral types (i.e., byte, short, int and
- *     long)
+ *   1). `fromType` (of `fromExp`) and `toType` are of numeric types (i.e., short, int, float,
+ *     decimal, etc)
  *   2). `fromType` can be safely coerced to `toType` without precision loss (e.g., short to int,
  *     int to long, but not long to int)
  *
  * If the above conditions are satisfied, the rule checks to see if the literal `value` is within
  * range `(min, max)`, where `min` and `max` are the minimum and maximum value of `fromType`,
- * respectively. If this is true then it means we can safely cast `value` to `fromType` and thus
+ * respectively. If this is true then it means we may safely cast `value` to `fromType` and thus
  * able to move the cast to the literal side. That is:
  *
  *   `cast(fromExp, toType) op value` ==> `fromExp op cast(value, fromType)`
  *
+ * Note there are some exceptions to the above: if casting from `value` to `fromType` causes
+ * rounding up or down, the above conversion will no longer be valid. Instead, the rule does the
+ * following:
+ *
+ * if casting `value` to `fromType` causes rounding up:
+ *  - `cast(fromExp, toType) > value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) >= value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) === value` ==> if(isnull(fromExp), null, false)
+ *  - `cast(fromExp, toType) <=> value` ==> false (if `fromExp` is deterministic)
+ *  - `cast(fromExp, toType) <=> value` ==> `cast(fromExp, toType) <=> value` (if `fromExp` is

Review comment:
       We can remove this because the rule does nothing for it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496536630



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we

Review comment:
       why it's safe to skip range check for decimal type?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r500436125



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +128,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we
+    // optimize by moving the cast to the literal side.
+
+    val newValue = Cast(Literal(value), fromType).eval()
+    if (newValue == null) {
+      // This means the cast failed, for instance, due to the value is not representable in the
+      // narrower type. In this case we simply return the original expression.

Review comment:
       yup will do - there is also a test case covering this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r500138374



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +128,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we
+    // optimize by moving the cast to the literal side.
+
+    val newValue = Cast(Literal(value), fromType).eval()
+    if (newValue == null) {
+      // This means the cast failed, for instance, due to the value is not representable in the
+      // narrower type. In this case we simply return the original expression.

Review comment:
       can you give a real example here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r500243158



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we

Review comment:
       is it only useful to `NaN`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r493726234



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -67,6 +77,23 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) <=> v.toInt, f <=> v)
     assertEquivalent(castInt(f) <= v.toInt, f === v)
     assertEquivalent(castInt(f) < v.toInt, falseIfNotNull(f))
+
+    val d = Float.NegativeInfinity
+    assertEquivalent(castDouble(f2) > d.toDouble, f2 =!= d)

Review comment:
       it is rounding down, see below for a test on this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496529759



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -200,25 +252,27 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
   /**
    * Check if the input `fromExp` can be safely cast to `toType` without any loss of precision,
    * i.e., the conversion is injective. Note this only handles the case when both sides are of
-   * integral type.
+   * numeric type.
    */
   private def canImplicitlyCast(
       fromExp: Expression,
       toType: DataType,
       literalType: DataType): Boolean = {
     toType.sameType(literalType) &&
       !fromExp.foldable &&
-      fromExp.dataType.isInstanceOf[IntegralType] &&
-      toType.isInstanceOf[IntegralType] &&
+      fromExp.dataType.isInstanceOf[NumericType] &&
+      toType.isInstanceOf[NumericType] &&
       Cast.canUpCast(fromExp.dataType, toType)
   }
 
-  private def getRange(dt: DataType): (Any, Any) = dt match {
-    case ByteType => (Byte.MinValue, Byte.MaxValue)
-    case ShortType => (Short.MinValue, Short.MaxValue)
-    case IntegerType => (Int.MinValue, Int.MaxValue)
-    case LongType => (Long.MinValue, Long.MaxValue)
-    case other => throw new IllegalArgumentException(s"Unsupported type: ${other.catalogString}")
+  private def getRange(dt: DataType): Option[(Any, Any)] = dt match {
+    case ByteType => Some((Byte.MinValue, Byte.MaxValue))
+    case ShortType => Some((Short.MinValue, Short.MaxValue))
+    case IntegerType => Some((Int.MinValue, Int.MaxValue))
+    case LongType => Some((Long.MinValue, Long.MaxValue))
+    case FloatType => Some((Float.NegativeInfinity, Float.NaN))

Review comment:
       why the upper bound is not `PositiveInfinity`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705906525






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-694424931


   **[Test build #128838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128838/testReport)** for PR 29792 at commit [`65833e7`](https://github.com/apache/spark/commit/65833e7d38a98495ec898bae2ec33adef7054aa5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707351927






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705891447






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707326402


   **[Test build #129704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129704/testReport)** for PR 29792 at commit [`76b9f73`](https://github.com/apache/spark/commit/76b9f73862380dda40b332b56464278bfb0b88d3).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r503134929



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  test("cases when literal is max") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MaxValue, Float.NaN), (3, null, null))
+        .toDF("c1", "c2", "c3").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      var lit = Short.MaxValue.toInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Row(1))
+
+      checkAnswer(df.select("c1").where(s"c3 > double('nan')"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c3 >= double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 == double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 <=> double('nan')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 != double('nan')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 <= double('nan')"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c3 < double('nan')"), Row(1))
+
+      lit = positiveInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Row(1) :: Row(2) :: Nil)
+    }
+  }
+
+  test("cases when literal is min") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MinValue, Float.NegativeInfinity),
+        (3, null, null))
+        .toDF("c1", "c2", "c3").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      var lit = Short.MinValue.toInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1))
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Row(2))
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Seq.empty)
+
+      checkAnswer(df.select("c1").where(s"c3 > double('-inf')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 >= double('-inf')"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c3 == double('-inf')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 <=> double('-inf')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 != double('-inf')"), Row(1))
+      checkAnswer(df.select("c1").where(s"c3 <= double('-inf')"), Row(2))
+      checkAnswer(df.select("c1").where(s"c3 < double('-inf')"), Seq.empty)
+
+      lit = negativeInt
+      checkAnswer(df.select("c1").where(s"c2 > $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 >= $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 == $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 <=> $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 != $lit"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where(s"c2 <= $lit"), Seq.empty)
+      checkAnswer(df.select("c1").where(s"c2 < $lit"), Seq.empty)
+    }
+  }
+
+  test("cases when literal is within range (min, max)") {
+    val t = "test_table"
+    withTable(t) {
+      Seq((1, 300.toShort), (2, 500.toShort)).toDF("c1", "c2").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      checkAnswer(df.select("c1").where("c2 < 200"), Seq.empty)
+      checkAnswer(df.select("c1").where("c2 < 400"), Row(1) :: Nil)
+      checkAnswer(df.select("c1").where("c2 < 600"), Row(1) :: Row(2) :: Nil)
+
+      checkAnswer(df.select("c1").where("c2 <= 100"), Seq.empty)
+      checkAnswer(df.select("c1").where("c2 <= 300"), Row(1) :: Nil)
+      checkAnswer(df.select("c1").where("c2 <= 500"), Row(1) :: Row(2) :: Nil)
+
+      checkAnswer(df.select("c1").where("c2 == 100"), Seq.empty)
+      checkAnswer(df.select("c1").where("c2 == 300"), Row(1) :: Nil)
+      checkAnswer(df.select("c1").where("c2 == 500"), Row(2) :: Nil)
+
+      checkAnswer(df.select("c1").where("c2 <=> 100"), Seq.empty)
+      checkAnswer(df.select("c1").where("c2 <=> 300"), Row(1) :: Nil)
+      checkAnswer(df.select("c1").where("c2 <=> 500"), Row(2) :: Nil)
+      checkAnswer(df.select("c1").where("c2 <=> null"), Seq.empty)
+
+      checkAnswer(df.select("c1").where("c2 >= 200"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where("c2 >= 400"), Row(2) :: Nil)
+      checkAnswer(df.select("c1").where("c2 >= 600"), Seq.empty)
+
+      checkAnswer(df.select("c1").where("c2 > 100"), Row(1) :: Row(2) :: Nil)
+      checkAnswer(df.select("c1").where("c2 > 300"), Row(2) :: Nil)
+      checkAnswer(df.select("c1").where("c2 > 500"), Seq.empty)
+    }
+  }
+
+  test("cases when literal is within range (min, max) and rounding for ") {

Review comment:
       `rounding for `: uncompleted sentence?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-694425595






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r495279771



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -79,13 +106,65 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) < v, falseIfNotNull(f))
   }
 
-  test("unwrap casts when literal is within range (min, max)") {
-    assertEquivalent(castInt(f) > 300, f > 300.toShort)
-    assertEquivalent(castInt(f) >= 500, f >= 500.toShort)
-    assertEquivalent(castInt(f) === 32766, f === 32766.toShort)
-    assertEquivalent(castInt(f) <=> 32766, f <=> 32766.toShort)
-    assertEquivalent(castInt(f) <= -6000, f <= -6000.toShort)
-    assertEquivalent(castInt(f) < -32767, f < -32767.toShort)
+  test("unwrap casts when literal is within range (min, max) or fromType has no range") {
+    Seq(300, 500, 32766, -6000, -32767).foreach(v => {
+      assertEquivalent(castInt(f) > v, f > v.toShort)
+      assertEquivalent(castInt(f) >= v, f >= v.toShort)
+      assertEquivalent(castInt(f) === v, f === v.toShort)
+      assertEquivalent(castInt(f) <=> v, f <=> v.toShort)
+      assertEquivalent(castInt(f) <= v, f <= v.toShort)
+      assertEquivalent(castInt(f) < v, f < v.toShort)
+    })
+
+    Seq(3.14.toFloat.toDouble, -1000.0.toFloat.toDouble,
+      20.0.toFloat.toDouble, -2.414.toFloat.toDouble,
+      Float.MinValue.toDouble, Float.MaxValue.toDouble, Float.PositiveInfinity.toDouble
+    ).foreach(v => {
+      assertEquivalent(castDouble(f2) > v, f2 > v.toFloat)
+      assertEquivalent(castDouble(f2) >= v, f2 >= v.toFloat)
+      assertEquivalent(castDouble(f2) === v, f2 === v.toFloat)
+      assertEquivalent(castDouble(f2) <=> v, f2 <=> v.toFloat)
+      assertEquivalent(castDouble(f2) <= v, f2 <= v.toFloat)
+      assertEquivalent(castDouble(f2) < v, f2 < v.toFloat)
+    })
+
+    Seq(decimal2(100.20), decimal2(-200.50)).foreach(v => {
+      assertEquivalent(castDecimal2(f3) > v, f3 > decimal(v))
+      assertEquivalent(castDecimal2(f3) >= v, f3 >= decimal(v))
+      assertEquivalent(castDecimal2(f3) === v, f3 === decimal(v))
+      assertEquivalent(castDecimal2(f3) <=> v, f3 <=> decimal(v))
+      assertEquivalent(castDecimal2(f3) <= v, f3 <= decimal(v))
+      assertEquivalent(castDecimal2(f3) < v, f3 < decimal(v))
+    })
+  }
+
+  test("unwrap cast when literal is within range (min, max) AND has round up or down") {
+    // Cases for rounding down
+    var doubleValue = 100.6
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) === doubleValue, falseIfNotNull(f))
+    assertEquivalent(castDouble(f) <=> doubleValue, false)
+    assertEquivalent(castDouble(f) <= doubleValue, f <= doubleValue.toShort)
+    assertEquivalent(castDouble(f) < doubleValue, f <= doubleValue.toShort)
+
+    // Cases for rounding up: 3.14 will be rounded to 3.14000010... after casting to float

Review comment:
       Yup, will do. This is a good point.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707345254


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34310/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707434624






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r493731636



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -79,13 +106,65 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) < v, falseIfNotNull(f))
   }
 
-  test("unwrap casts when literal is within range (min, max)") {
-    assertEquivalent(castInt(f) > 300, f > 300.toShort)
-    assertEquivalent(castInt(f) >= 500, f >= 500.toShort)
-    assertEquivalent(castInt(f) === 32766, f === 32766.toShort)
-    assertEquivalent(castInt(f) <=> 32766, f <=> 32766.toShort)
-    assertEquivalent(castInt(f) <= -6000, f <= -6000.toShort)
-    assertEquivalent(castInt(f) < -32767, f < -32767.toShort)
+  test("unwrap casts when literal is within range (min, max) or fromType has no range") {
+    Seq(300, 500, 32766, -6000, -32767).foreach(v => {
+      assertEquivalent(castInt(f) > v, f > v.toShort)
+      assertEquivalent(castInt(f) >= v, f >= v.toShort)
+      assertEquivalent(castInt(f) === v, f === v.toShort)
+      assertEquivalent(castInt(f) <=> v, f <=> v.toShort)
+      assertEquivalent(castInt(f) <= v, f <= v.toShort)
+      assertEquivalent(castInt(f) < v, f < v.toShort)
+    })
+
+    Seq(3.14.toFloat.toDouble, -1000.0.toFloat.toDouble,
+      20.0.toFloat.toDouble, -2.414.toFloat.toDouble,
+      Float.MinValue.toDouble, Float.MaxValue.toDouble, Float.PositiveInfinity.toDouble
+    ).foreach(v => {
+      assertEquivalent(castDouble(f2) > v, f2 > v.toFloat)
+      assertEquivalent(castDouble(f2) >= v, f2 >= v.toFloat)
+      assertEquivalent(castDouble(f2) === v, f2 === v.toFloat)
+      assertEquivalent(castDouble(f2) <=> v, f2 <=> v.toFloat)
+      assertEquivalent(castDouble(f2) <= v, f2 <= v.toFloat)
+      assertEquivalent(castDouble(f2) < v, f2 < v.toFloat)
+    })
+
+    Seq(decimal2(100.20), decimal2(-200.50)).foreach(v => {
+      assertEquivalent(castDecimal2(f3) > v, f3 > decimal(v))
+      assertEquivalent(castDecimal2(f3) >= v, f3 >= decimal(v))
+      assertEquivalent(castDecimal2(f3) === v, f3 === decimal(v))
+      assertEquivalent(castDecimal2(f3) <=> v, f3 <=> decimal(v))
+      assertEquivalent(castDecimal2(f3) <= v, f3 <= decimal(v))
+      assertEquivalent(castDecimal2(f3) < v, f3 < decimal(v))
+    })
+  }
+
+  test("unwrap cast when literal is within range (min, max) AND has round up or down") {
+    // Cases for rounding down
+    var doubleValue = 100.6
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) === doubleValue, falseIfNotNull(f))
+    assertEquivalent(castDouble(f) <=> doubleValue, false)
+    assertEquivalent(castDouble(f) <= doubleValue, f <= doubleValue.toShort)
+    assertEquivalent(castDouble(f) < doubleValue, f <= doubleValue.toShort)
+
+    // Cases for rounding up: 3.14 will be rounded to 3.14000010... after casting to float

Review comment:
       so casting double to float can be either rounding up or down, depend on the value?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-707434624






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r500137824



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we

Review comment:
       makes sense. Can you give an example of how the range of float/double helps the optimization?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702431064


   **[Test build #129311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129311/testReport)** for PR 29792 at commit [`3340de4`](https://github.com/apache/spark/commit/3340de447c551ca2c7d0e9a470b99a02a87913b1).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r493753795



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -79,13 +106,65 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) < v, falseIfNotNull(f))
   }
 
-  test("unwrap casts when literal is within range (min, max)") {
-    assertEquivalent(castInt(f) > 300, f > 300.toShort)
-    assertEquivalent(castInt(f) >= 500, f >= 500.toShort)
-    assertEquivalent(castInt(f) === 32766, f === 32766.toShort)
-    assertEquivalent(castInt(f) <=> 32766, f <=> 32766.toShort)
-    assertEquivalent(castInt(f) <= -6000, f <= -6000.toShort)
-    assertEquivalent(castInt(f) < -32767, f < -32767.toShort)
+  test("unwrap casts when literal is within range (min, max) or fromType has no range") {
+    Seq(300, 500, 32766, -6000, -32767).foreach(v => {
+      assertEquivalent(castInt(f) > v, f > v.toShort)
+      assertEquivalent(castInt(f) >= v, f >= v.toShort)
+      assertEquivalent(castInt(f) === v, f === v.toShort)
+      assertEquivalent(castInt(f) <=> v, f <=> v.toShort)
+      assertEquivalent(castInt(f) <= v, f <= v.toShort)
+      assertEquivalent(castInt(f) < v, f < v.toShort)
+    })
+
+    Seq(3.14.toFloat.toDouble, -1000.0.toFloat.toDouble,
+      20.0.toFloat.toDouble, -2.414.toFloat.toDouble,
+      Float.MinValue.toDouble, Float.MaxValue.toDouble, Float.PositiveInfinity.toDouble
+    ).foreach(v => {
+      assertEquivalent(castDouble(f2) > v, f2 > v.toFloat)
+      assertEquivalent(castDouble(f2) >= v, f2 >= v.toFloat)
+      assertEquivalent(castDouble(f2) === v, f2 === v.toFloat)
+      assertEquivalent(castDouble(f2) <=> v, f2 <=> v.toFloat)
+      assertEquivalent(castDouble(f2) <= v, f2 <= v.toFloat)
+      assertEquivalent(castDouble(f2) < v, f2 < v.toFloat)
+    })
+
+    Seq(decimal2(100.20), decimal2(-200.50)).foreach(v => {
+      assertEquivalent(castDecimal2(f3) > v, f3 > decimal(v))
+      assertEquivalent(castDecimal2(f3) >= v, f3 >= decimal(v))
+      assertEquivalent(castDecimal2(f3) === v, f3 === decimal(v))
+      assertEquivalent(castDecimal2(f3) <=> v, f3 <=> decimal(v))
+      assertEquivalent(castDecimal2(f3) <= v, f3 <= decimal(v))
+      assertEquivalent(castDecimal2(f3) < v, f3 < decimal(v))
+    })
+  }
+
+  test("unwrap cast when literal is within range (min, max) AND has round up or down") {
+    // Cases for rounding down
+    var doubleValue = 100.6
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) === doubleValue, falseIfNotNull(f))
+    assertEquivalent(castDouble(f) <=> doubleValue, false)
+    assertEquivalent(castDouble(f) <= doubleValue, f <= doubleValue.toShort)
+    assertEquivalent(castDouble(f) < doubleValue, f <= doubleValue.toShort)
+
+    // Cases for rounding up: 3.14 will be rounded to 3.14000010... after casting to float

Review comment:
       @cloud-fan Sorry i was wrong in the above comment (somehow I was thinking casting from double to short there). 
   
   Yes, it appears that casting from double to float can be either rounding up or down, depending on value:
   
   ```scala
   scala> val x = 0.39999999
   x: Double = 0.39999999
   
   scala> val y = x.toFloat
   y: Float = 0.39999998
   
   scala > val x = 0.49999999
   y: Double = 0.49999999
   
   scala> val y = x.toFloat
   y: Float = 0.5
   ```
   To clarify, here the round up is after casting roundtrip: double 3.14 -> float 3.14 -> double 3.14000010...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705960336






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r503132659



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.expressions.IntegralLiteralTestUtils.{negativeInt, positiveInt}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.Decimal
+
+class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  test("cases when literal is max") {
+    val t = "test_table"
+    withTable(t) {
+      Seq[(Integer, java.lang.Short, java.lang.Float)](
+        (1, 100.toShort, 3.14.toFloat), (2, Short.MaxValue, Float.NaN), (3, null, null))

Review comment:
       can we test `Float.PositiveInfinity` as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496275562



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala
##########
@@ -79,13 +106,65 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f) < v, falseIfNotNull(f))
   }
 
-  test("unwrap casts when literal is within range (min, max)") {
-    assertEquivalent(castInt(f) > 300, f > 300.toShort)
-    assertEquivalent(castInt(f) >= 500, f >= 500.toShort)
-    assertEquivalent(castInt(f) === 32766, f === 32766.toShort)
-    assertEquivalent(castInt(f) <=> 32766, f <=> 32766.toShort)
-    assertEquivalent(castInt(f) <= -6000, f <= -6000.toShort)
-    assertEquivalent(castInt(f) < -32767, f < -32767.toShort)
+  test("unwrap casts when literal is within range (min, max) or fromType has no range") {
+    Seq(300, 500, 32766, -6000, -32767).foreach(v => {
+      assertEquivalent(castInt(f) > v, f > v.toShort)
+      assertEquivalent(castInt(f) >= v, f >= v.toShort)
+      assertEquivalent(castInt(f) === v, f === v.toShort)
+      assertEquivalent(castInt(f) <=> v, f <=> v.toShort)
+      assertEquivalent(castInt(f) <= v, f <= v.toShort)
+      assertEquivalent(castInt(f) < v, f < v.toShort)
+    })
+
+    Seq(3.14.toFloat.toDouble, -1000.0.toFloat.toDouble,
+      20.0.toFloat.toDouble, -2.414.toFloat.toDouble,
+      Float.MinValue.toDouble, Float.MaxValue.toDouble, Float.PositiveInfinity.toDouble
+    ).foreach(v => {
+      assertEquivalent(castDouble(f2) > v, f2 > v.toFloat)
+      assertEquivalent(castDouble(f2) >= v, f2 >= v.toFloat)
+      assertEquivalent(castDouble(f2) === v, f2 === v.toFloat)
+      assertEquivalent(castDouble(f2) <=> v, f2 <=> v.toFloat)
+      assertEquivalent(castDouble(f2) <= v, f2 <= v.toFloat)
+      assertEquivalent(castDouble(f2) < v, f2 < v.toFloat)
+    })
+
+    Seq(decimal2(100.20), decimal2(-200.50)).foreach(v => {
+      assertEquivalent(castDecimal2(f3) > v, f3 > decimal(v))
+      assertEquivalent(castDecimal2(f3) >= v, f3 >= decimal(v))
+      assertEquivalent(castDecimal2(f3) === v, f3 === decimal(v))
+      assertEquivalent(castDecimal2(f3) <=> v, f3 <=> decimal(v))
+      assertEquivalent(castDecimal2(f3) <= v, f3 <= decimal(v))
+      assertEquivalent(castDecimal2(f3) < v, f3 < decimal(v))
+    })
+  }
+
+  test("unwrap cast when literal is within range (min, max) AND has round up or down") {
+    // Cases for rounding down
+    var doubleValue = 100.6
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) > doubleValue, f > doubleValue.toShort)
+    assertEquivalent(castDouble(f) === doubleValue, falseIfNotNull(f))
+    assertEquivalent(castDouble(f) <=> doubleValue, false)
+    assertEquivalent(castDouble(f) <= doubleValue, f <= doubleValue.toShort)
+    assertEquivalent(castDouble(f) < doubleValue, f <= doubleValue.toShort)
+
+    // Cases for rounding up: 3.14 will be rounded to 3.14000010... after casting to float

Review comment:
       @cloud-fan updated the description. Please take another look. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-702431932






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496883121



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we
+    // optimize by moving the cast to the literal side.
+
+    val newValue = Cast(Literal(value), fromType).eval()
+    if (newValue == null) {
+      // This means the cast failed, for instance, due to the value is not representable in the
+      // narrower type. In this case we simply return the original expression.
+      return exp
+    }
+    val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()

Review comment:
       So double to float can result to either rounding up or down. For instance, by casting 3.14 in double to float, even though the value is still 3.14, the binary representation is rounded up:
   
   3.14 in double:
   ```
   0 10000000000 1001 0001 1110 1011 1000 0101 0001 1110 1011 1000 0101 0001 1111
   ```
   
   3.14 in float
   ```
   0 10000000 1001 0001 1110 1011 1000 011
   ```
   Here the sign bit and exponent bits (11 and 8 bits respectively for double and float) are the same for both float and double. However, in the fraction part, the last is rounded up to 1.
   
   After casting back to double, there won't be any rounding up or down - the remaining digits are simply padded with 0:
   ```
   0 10000000000 1001 0001 1110 1011 1000 0110 0000000000000000000000000000
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705891447


   **[Test build #129566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129566/testReport)** for PR 29792 at commit [`94942bf`](https://github.com/apache/spark/commit/94942bf2042a5c179ce9eda7fb90493636905078).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-703753898


   ping @cloud-fan - addressed your comments, could you take another look at this? thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496924894



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -200,25 +252,27 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
   /**
    * Check if the input `fromExp` can be safely cast to `toType` without any loss of precision,
    * i.e., the conversion is injective. Note this only handles the case when both sides are of
-   * integral type.
+   * numeric type.
    */
   private def canImplicitlyCast(
       fromExp: Expression,
       toType: DataType,
       literalType: DataType): Boolean = {
     toType.sameType(literalType) &&
       !fromExp.foldable &&
-      fromExp.dataType.isInstanceOf[IntegralType] &&
-      toType.isInstanceOf[IntegralType] &&
+      fromExp.dataType.isInstanceOf[NumericType] &&
+      toType.isInstanceOf[NumericType] &&
       Cast.canUpCast(fromExp.dataType, toType)
   }
 
-  private def getRange(dt: DataType): (Any, Any) = dt match {
-    case ByteType => (Byte.MinValue, Byte.MaxValue)
-    case ShortType => (Short.MinValue, Short.MaxValue)
-    case IntegerType => (Int.MinValue, Int.MaxValue)
-    case LongType => (Long.MinValue, Long.MaxValue)
-    case other => throw new IllegalArgumentException(s"Unsupported type: ${other.catalogString}")
+  private def getRange(dt: DataType): Option[(Any, Any)] = dt match {
+    case ByteType => Some((Byte.MinValue, Byte.MaxValue))
+    case ShortType => Some((Short.MinValue, Short.MaxValue))
+    case IntegerType => Some((Int.MinValue, Int.MaxValue))
+    case LongType => Some((Long.MinValue, Long.MaxValue))
+    case FloatType => Some((Float.NegativeInfinity, Float.NaN))

Review comment:
       This is because `PositiveInfinity` is considered to be < `NaN` in Spark. If we treat it as the upper bound, rules handling the upper bounds will not be valid. For instance the following expr:
   ```sql
   cast(e as double) > double('+inf')
   ```
   would be converted to
   ```sql
   e === double('+inf')
   ```
   which won't be correct if `e` evaluates to `double('NaN')`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-694424931


   **[Test build #128838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128838/testReport)** for PR 29792 at commit [`65833e7`](https://github.com/apache/spark/commit/65833e7d38a98495ec898bae2ec33adef7054aa5).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705906512


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34172/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r500137824



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we

Review comment:
       makes sense.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-694568961


   **[Test build #128838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128838/testReport)** for PR 29792 at commit [`65833e7`](https://github.com/apache/spark/commit/65833e7d38a98495ec898bae2ec33adef7054aa5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29792: [WIP][SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-694425595






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496883941



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -35,18 +35,34 @@ import org.apache.spark.sql.types._
  * to be optimized away later and pushed down to data sources.
  *
  * Currently this only handles cases where:
- *   1). `fromType` (of `fromExp`) and `toType` are of integral types (i.e., byte, short, int and
- *     long)
+ *   1). `fromType` (of `fromExp`) and `toType` are of numeric types (i.e., short, int, float,
+ *     decimal, etc)
  *   2). `fromType` can be safely coerced to `toType` without precision loss (e.g., short to int,
  *     int to long, but not long to int)
  *
  * If the above conditions are satisfied, the rule checks to see if the literal `value` is within
  * range `(min, max)`, where `min` and `max` are the minimum and maximum value of `fromType`,
- * respectively. If this is true then it means we can safely cast `value` to `fromType` and thus
+ * respectively. If this is true then it means we may safely cast `value` to `fromType` and thus
  * able to move the cast to the literal side. That is:
  *
  *   `cast(fromExp, toType) op value` ==> `fromExp op cast(value, fromType)`
  *
+ * Note there are some exceptions to the above: if casting from `value` to `fromType` causes
+ * rounding up or down, the above conversion will no longer be valid. Instead, the rule does the
+ * following:
+ *
+ * if casting `value` to `fromType` causes rounding up:
+ *  - `cast(fromExp, toType) > value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) >= value` ==> `fromExp >= cast(value, fromType)`
+ *  - `cast(fromExp, toType) === value` ==> if(isnull(fromExp), null, false)
+ *  - `cast(fromExp, toType) <=> value` ==> false (if `fromExp` is deterministic)
+ *  - `cast(fromExp, toType) <=> value` ==> `cast(fromExp, toType) <=> value` (if `fromExp` is

Review comment:
       Thanks, will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #29792:
URL: https://github.com/apache/spark/pull/29792#discussion_r496926616



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##########
@@ -116,82 +132,118 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
    * optimizes the expression by moving the cast to the literal side. Otherwise if result is not
    * true, this replaces the input binary comparison `exp` with simpler expressions.
    */
-  private def simplifyIntegralComparison(
+  private def simplifyNumericComparison(
       exp: BinaryComparison,
       fromExp: Expression,
-      toType: IntegralType,
+      toType: NumericType,
       value: Any): Expression = {
 
     val fromType = fromExp.dataType
-    val (min, max) = getRange(fromType)
-    val (minInToType, maxInToType) = {
-      (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minCmp = ordering.compare(value, minInToType)
-    val maxCmp = ordering.compare(value, maxInToType)
+    val range = getRange(fromType)
 
-    if (maxCmp > 0) {
-      exp match {
-        case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThan(_, _) | LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        // make sure the expression is evaluated if it is non-deterministic
-        case EqualNullSafe(_, _) if exp.deterministic =>
-          FalseLiteral
-        case _ => exp
+    if (range.isDefined) {
+      val (min, max) = range.get
+      val (minInToType, maxInToType) = {
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
       }
-    } else if (maxCmp == 0) {
-      exp match {
-        case GreaterThan(_, _) =>
-          falseIfNotNull(fromExp)
-        case LessThanOrEqual(_, _) =>
-          trueIfNotNull(fromExp)
-        case LessThan(_, _) =>
-          Not(EqualTo(fromExp, Literal(max, fromType)))
-        case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
-          EqualTo(fromExp, Literal(max, fromType))
-        case EqualNullSafe(_, _) =>
-          EqualNullSafe(fromExp, Literal(max, fromType))
-        case _ => exp
+      val minCmp = ordering.compare(value, minInToType)
+      val maxCmp = ordering.compare(value, maxInToType)
+
+      if (maxCmp >= 0 || minCmp <= 0) {
+        return if (maxCmp > 0) {
+          exp match {
+            case EqualTo(_, _) | GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else if (maxCmp == 0) {
+          exp match {
+            case GreaterThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case LessThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(max, fromType)))
+            case GreaterThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(max, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(max, fromType))
+            case _ => exp
+          }
+        } else if (minCmp < 0) {
+          exp match {
+            case GreaterThan(_, _) | GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case LessThan(_, _) | LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              falseIfNotNull(fromExp)
+            // make sure the expression is evaluated if it is non-deterministic
+            case EqualNullSafe(_, _) if exp.deterministic =>
+              FalseLiteral
+            case _ => exp
+          }
+        } else { // minCmp == 0
+          exp match {
+            case LessThan(_, _) =>
+              falseIfNotNull(fromExp)
+            case GreaterThanOrEqual(_, _) =>
+              trueIfNotNull(fromExp)
+            case GreaterThan(_, _) =>
+              Not(EqualTo(fromExp, Literal(min, fromType)))
+            case LessThanOrEqual(_, _) | EqualTo(_, _) =>
+              EqualTo(fromExp, Literal(min, fromType))
+            case EqualNullSafe(_, _) =>
+              EqualNullSafe(fromExp, Literal(min, fromType))
+            case _ => exp
+          }
+        }
       }
-    } else if (minCmp < 0) {
+    }
+
+    // When we reach to this point, it means either there is no min/max for the `fromType` (e.g.,
+    // decimal type), or that the literal `value` is within range `(min, max)`. For these, we

Review comment:
       It is safe since knowing min/max for a type just gives us more opportunity for optimizations. I skipped decimal type here because (it seems) there is no min/max defined in the `DecimalType`, unlike other numeric types.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705960760


   @cloud-fan added an e2e test suite. Please take another look, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29792: [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29792:
URL: https://github.com/apache/spark/pull/29792#issuecomment-705891447


   **[Test build #129566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129566/testReport)** for PR 29792 at commit [`94942bf`](https://github.com/apache/spark/commit/94942bf2042a5c179ce9eda7fb90493636905078).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org