You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/08 12:51:25 UTC

[GitHub] [spark] cfmcgrady opened a new pull request, #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

cfmcgrady opened a new pull request, #37439:
URL: https://github.com/apache/spark/pull/37439

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   This PR aims to fix the case
   
   ```scala
   sql("create table t1(a decimal(3, 0)) using parquet")
   sql("insert into t1 values(100), (10), (1)")
   sql("select * from t1 where a in(100000, 1.00)").show
   ```
   
   ```
   java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
   	at org.apache.spark.sql.errors.QueryExecutionErrors$.structuralIntegrityIsBrokenAfterApplyingRuleError(QueryExecutionErrors.scala:1325)
   ```
   
   1. the rule `UnwrapCastInBinaryComparison` transforms the expression `In` to Equals
   
   ```
   CAST(a as decimal(12,2)) IN (100000.00,1.00)
   
   
   OR(
      CAST(a as decimal(12,2)) = 100000.00,
      CAST(a as decimal(12,2)) = 1.00
   )
   ```
   
   
   2. using `UnwrapCastInBinaryComparison.unwrapCast()` to optimize each `EqualTo`
   
   ```
   // Expression1
   CAST(a as decimal(12,2)) = 100000.00 => CAST(a as decimal(12,2)) = 100000.00
   
   // Expression2
   CAST(a as decimal(12,2)) = 1.00      => a = 1
   ```
   
   3. return the new unwrapped cast expression `In`
   
   ```
   a IN (100000.00, 1.00)
   ```
   
   
   Before this PR:
   
   the method `UnwrapCastInBinaryComparison.unwrapCast()` returns the original expression when downcasting to a decimal type fails (the `Expression1`),returns the original expression if the downcast to the decimal type succeeds (the `Expression2`), the two expressions have different data type which would break the structural integrity
   
   ```
   a IN (100000.00, 1.00)
          |           |
       decimal(12, 2) |
                  decimal(3, 0)
   ```
   
   
   After this PR:
   
   the PR transform the downcasting failed expression to `falseIfNotNull(fromExp)`
   ```
   
   ((isnull(a) AND null) OR a IN (1.00)
   ```
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No, only bug fix.
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   Unit test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
wangyum commented on PR #37439:
URL: https://github.com/apache/spark/pull/37439#issuecomment-1221301948

   cc @cloud-fan @sunchao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952366465


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,7 +155,16 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {

Review Comment:
   since `unwrapCast` always returns a non-null value, it seems that these changes don't make the code clearer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r957967773


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,82 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var containsCannotCastLiteral = false
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            containsCannotCastLiteral = true
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              containsCannotCastLiteral = true
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                containsCannotCastLiteral = true
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && !containsCannotCastLiteral) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (!containsCannotCastLiteral) {
+        Option(unwrapExpr)
+      } else {
+        // the list contains a literal that cannot be cast to `fromExp.dataType`
+        Option(Or(falseIfNotNull(fromExp), unwrapExpr))

Review Comment:
   Do you mean we can simply return `unwrapExpr`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952931639


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -156,29 +157,36 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
           unwrapCast(EqualTo(in.value, lit)) match {
-            case EqualTo(_, unwrapLit: Literal) => canCastList += unwrapLit
-            case e @ And(IsNull(_), Literal(null, BooleanType)) => cannotCastList += e
+            // the function `unwrapCast` returns None means the literal can not cast to fromType,
+            // for instance: (the boundreference is of type DECIMAL(5,2))
+            //     CAST(boundreference() AS DECIMAL(10,4)) = 123456.1234BD
+            // Due to `cast(lit, fromExp.dataType) == null` we simply return
+            // `falseIfNotNull(fromExp)`.
+            case None | Some(And(IsNull(_), Literal(null, BooleanType))) =>
+              cannotCastList += falseIfNotNull(fromExp)
+            case Some(EqualTo(_, unwrapLit: Literal)) =>
+              canCastList += unwrapLit
             case _ => throw new IllegalStateException("Illegal unwrap cast result found.")
           }
         case _ => throw new IllegalStateException("Illegal value found in in.list.")
       }
 
       // return original expression when in.list contains only null values.

Review Comment:
   need to update the comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37439:
URL: https://github.com/apache/spark/pull/37439#issuecomment-1208589418

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r955755997


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,85 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   A. the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //   for instance:
+    //       `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   B. the literal value is out of fromType range, for instance:
+    //       `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   C. the literal value is rounded up/down after casting to `fromType`, for instance:
+    //       `(cast(input[1, float, true] as double) = 3.14)`
+    //   note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList, cannotCastList) =
+      (ArrayBuffer[Literal](), ArrayBuffer[Literal](), ArrayBuffer[Expression]())
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            cannotCastList += falseIfNotNull(fromExp)

Review Comment:
   since we only add `falseIfNotNull(fromExp)` to `cannotCastList`, shall we simply have a counter for how many elements in the IN list is not castable?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed
URL: https://github.com/apache/spark/pull/37439


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r958060241


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,82 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var containsCannotCastLiteral = false
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            containsCannotCastLiteral = true
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              containsCannotCastLiteral = true
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                containsCannotCastLiteral = true
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && !containsCannotCastLiteral) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (!containsCannotCastLiteral) {
+        Option(unwrapExpr)
+      } else {
+        // the list contains a literal that cannot be cast to `fromExp.dataType`
+        Option(Or(falseIfNotNull(fromExp), unwrapExpr))

Review Comment:
   return `unwrapExpr` SGTM.
   
   ---
   
   > when the literal is out of fromType range, the EqualTo will be optimized to `falseIfNotNull(fromType)`
   >
   > ```
   > cast(input[0, smallint, true] as int) = 32768  => isnull(input[0, smallint, true]) AND null
   > ```
   
   btw, just for learning, why not return false here? cc @sunchao 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r956251486


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,83 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var cannotCastCounter = 0
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            cannotCastCounter += 1
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              cannotCastCounter += 1
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                cannotCastCounter += 1
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && cannotCastCounter == 0) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (cannotCastCounter == 0) {

Review Comment:
   updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r958617662


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -316,55 +316,25 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     //     note that 3.14 will be rounded to 3.14000010... after casting to float
 
     val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
-    var containsCannotCastLiteral = false
     val fromType = fromExp.dataType
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minMaxInToType = getRange(fromType).map {
-      case (min, max) =>
-        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
 
     list.foreach {
       case lit @ Literal(null, _) => nullList += lit
       case NonNullLiteral(value, _) =>
-        val minMaxCmp = minMaxInToType.map {
-          case (minInToType, maxInToType) =>
-            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
-        }
-        minMaxCmp match {
-          // the literal value is out of fromType range
-          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
-            containsCannotCastLiteral = true
-          case _ =>
-            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
-            if (newValue == null) {
-              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
-              // fromType
-              containsCannotCastLiteral = true
-            } else {
-              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
-              val cmp = ordering.compare(value, valueRoundTrip)
-              if (cmp == 0) {
-                canCastList += Literal(newValue, fromType)
-              } else {
-                // the literal value is rounded up/down after casting to `fromType`
-                containsCannotCastLiteral = true
-              }
-            }
+        val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()

Review Comment:
   does it mean we can always use `Cast` to detect "castable" and the range-related check is not needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r959127060


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -316,55 +316,25 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     //     note that 3.14 will be rounded to 3.14000010... after casting to float
 
     val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
-    var containsCannotCastLiteral = false
     val fromType = fromExp.dataType
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minMaxInToType = getRange(fromType).map {
-      case (min, max) =>
-        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
 
     list.foreach {
       case lit @ Literal(null, _) => nullList += lit
       case NonNullLiteral(value, _) =>
-        val minMaxCmp = minMaxInToType.map {
-          case (minInToType, maxInToType) =>
-            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
-        }
-        minMaxCmp match {
-          // the literal value is out of fromType range
-          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
-            containsCannotCastLiteral = true
-          case _ =>
-            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
-            if (newValue == null) {
-              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
-              // fromType
-              containsCannotCastLiteral = true
-            } else {
-              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
-              val cmp = ordering.compare(value, valueRoundTrip)
-              if (cmp == 0) {
-                canCastList += Literal(newValue, fromType)
-              } else {
-                // the literal value is rounded up/down after casting to `fromType`
-                containsCannotCastLiteral = true
-              }
-            }
+        val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()

Review Comment:
   yup, it's better to make the code consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r953397903


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -156,29 +157,36 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
           unwrapCast(EqualTo(in.value, lit)) match {

Review Comment:
   now I start to think that creating `EqualTo` and reusing `unwrapCast` may not be a good idea here. Is it possible to pull the code of downcasting a value to a new function, and call that function here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r955837238


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,83 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var cannotCastCounter = 0
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            cannotCastCounter += 1
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              cannotCastCounter += 1
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                cannotCastCounter += 1
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && cannotCastCounter == 0) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (cannotCastCounter == 0) {
+        Option(unwrapExpr)
+      } else {
+        // since can not cast literals are all transformed to the same `falseIfNotNull(fromExp)`,

Review Comment:
   need to update the comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r953283999


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -156,29 +157,36 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
           unwrapCast(EqualTo(in.value, lit)) match {
-            case EqualTo(_, unwrapLit: Literal) => canCastList += unwrapLit
-            case e @ And(IsNull(_), Literal(null, BooleanType)) => cannotCastList += e
+            // the function `unwrapCast` returns None means the literal can not cast to fromType,
+            // for instance: (the boundreference is of type DECIMAL(5,2))
+            //     CAST(boundreference() AS DECIMAL(10,4)) = 123456.1234BD
+            // Due to `cast(lit, fromExp.dataType) == null` we simply return
+            // `falseIfNotNull(fromExp)`.
+            case None | Some(And(IsNull(_), Literal(null, BooleanType))) =>

Review Comment:
   As the comment shown in L149 ~ L153
   
   > There are 3 kinds of literals in the in.list:
   > 1. null literals
   > 2. The literals that can cast to fromExp.dataType
   > 3. The literals that cannot cast to fromExp.dataType
   
   for 3, we have tow cases.
   
   - A. the literal cannot cast to fromExp.dataType, and there is no min/max for the `fromType`, then `unwrapCast` returns None
   
   for instance:
   
   ```
   cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234
   ```
   
   
   - B. the literal `value` is out of `fromType` range `(min, max)`, then `unwrapCast` returns `falseIfNotNull(fromExp)`
   
   for instance:
   
   ```
   cast(input[0, smallint, true] as bigint) = 2147483647
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r959099576


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -316,55 +316,25 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     //     note that 3.14 will be rounded to 3.14000010... after casting to float
 
     val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
-    var containsCannotCastLiteral = false
     val fromType = fromExp.dataType
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minMaxInToType = getRange(fromType).map {
-      case (min, max) =>
-        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
 
     list.foreach {
       case lit @ Literal(null, _) => nullList += lit
       case NonNullLiteral(value, _) =>
-        val minMaxCmp = minMaxInToType.map {
-          case (minInToType, maxInToType) =>
-            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
-        }
-        minMaxCmp match {
-          // the literal value is out of fromType range
-          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
-            containsCannotCastLiteral = true
-          case _ =>
-            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
-            if (newValue == null) {
-              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
-              // fromType
-              containsCannotCastLiteral = true
-            } else {
-              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
-              val cmp = ordering.compare(value, valueRoundTrip)
-              if (cmp == 0) {
-                canCastList += Literal(newValue, fromType)
-              } else {
-                // the literal value is rounded up/down after casting to `fromType`
-                containsCannotCastLiteral = true
-              }
-            }
+        val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()

Review Comment:
   Yes, I think it's enough.
   
   1. we rely on equality of cast values when there is no min/max for the `fromType` (e.g. decimal type), or that the literal `value` is within range `(min, max)`
   
   https://github.com/apache/spark/blob/c2d4c4ed4c5d08bb1ee8e91207aef5eccfc50470/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala#L305-L346
   
   2. if the literal value is out of `fromExp.dataType` range, the `newValue` cannot converted back to `toType` without precision loss.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r959137157


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -316,55 +316,25 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     //     note that 3.14 will be rounded to 3.14000010... after casting to float
 
     val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
-    var containsCannotCastLiteral = false
     val fromType = fromExp.dataType
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minMaxInToType = getRange(fromType).map {
-      case (min, max) =>
-        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
 
     list.foreach {
       case lit @ Literal(null, _) => nullList += lit
       case NonNullLiteral(value, _) =>
-        val minMaxCmp = minMaxInToType.map {
-          case (minInToType, maxInToType) =>
-            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
-        }
-        minMaxCmp match {
-          // the literal value is out of fromType range
-          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
-            containsCannotCastLiteral = true
-          case _ =>
-            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
-            if (newValue == null) {
-              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
-              // fromType
-              containsCannotCastLiteral = true
-            } else {
-              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
-              val cmp = ordering.compare(value, valueRoundTrip)
-              if (cmp == 0) {
-                canCastList += Literal(newValue, fromType)
-              } else {
-                // the literal value is rounded up/down after casting to `fromType`
-                containsCannotCastLiteral = true
-              }
-            }
+        val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()

Review Comment:
   
   `BinaryComparison` is quite different. let's consider the `LessThan` operation.
   
   ```
   LessThan(Cast(fromExp, toType), Literal(value, toType))
   ```
   
   let's say the literal value is out of `fromType` range
   
   1). value > max(fromType) => trueIfNotNull(fromExp)
   
   ```
   cast(input[0, smallint, true] as int) < 32768   =>   isnotnull(input[0, smallint, true]) OR null
   ```
   
   2). value < min(fromType) => falseIfNotNull(fromExp)
   
   ```
   cast(input[0, smallint, true] as int) < -32769  =>   isnull(input[0, smallint, true]) AND null
   ```
   
   we can only remove the range-check related code for `EqualTo` operation.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r951122003


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,7 +155,16 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {

Review Comment:
   shall we make `unwrapCast` return Option so that the code can be clearer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952642489


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,8 +155,19 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {
-            case EqualTo(_, unwrapLit: Literal) => canCastList += unwrapLit
+          val originalEqualTo = EqualTo(in.value, lit)
+          unwrapCast(originalEqualTo) match {
+            case equalTo @ EqualTo(_, unwrapLit: Literal) =>
+              // the function `unwrapCast` may returns original expression when the literal can not
+              // cast to fromType, for instance: (the boundreference is of type DECIMAL(5,2))

Review Comment:
   Can we make `unwrapCast` return None if the literal can not be casted to fromType?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r956920490


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,82 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var containsCannotCastLiteral = false
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            containsCannotCastLiteral = true
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              containsCannotCastLiteral = true
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                containsCannotCastLiteral = true
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && !containsCannotCastLiteral) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (!containsCannotCastLiteral) {
+        Option(unwrapExpr)
+      } else {
+        // the list contains a literal that cannot be cast to `fromExp.dataType`
+        Option(Or(falseIfNotNull(fromExp), unwrapExpr))

Review Comment:
   After reading it again, I'm wondering if this is really the optimal version. If `fromExpr` is not null, then the final expr is `Or(false, unwrapExpr)`, which is the same as `unwrapExpr`. If `fromExpr` is null, then the final expr is `Or(null, unwrapExpr)`, which is again the same as `unwrapExpr` because In/InSet returns null if the input value is null.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #37439:
URL: https://github.com/apache/spark/pull/37439#issuecomment-1228167059

   looks much clearer now!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r955632072


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -156,29 +157,36 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
           unwrapCast(EqualTo(in.value, lit)) match {
-            case EqualTo(_, unwrapLit: Literal) => canCastList += unwrapLit
-            case e @ And(IsNull(_), Literal(null, BooleanType)) => cannotCastList += e
+            // the function `unwrapCast` returns None means the literal can not cast to fromType,
+            // for instance: (the boundreference is of type DECIMAL(5,2))
+            //     CAST(boundreference() AS DECIMAL(10,4)) = 123456.1234BD
+            // Due to `cast(lit, fromExp.dataType) == null` we simply return
+            // `falseIfNotNull(fromExp)`.
+            case None | Some(And(IsNull(_), Literal(null, BooleanType))) =>

Review Comment:
   Actually, I missed another case before.
   - C. the literal value is rounded up/down after casting to `fromType`, for instance:
   ```
   cast(input[1, float, true] as double) = 3.14
   ```
    note that 3.14 will be rounded to 3.14000010... after casting to float
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r959116606


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -316,55 +316,25 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     //     note that 3.14 will be rounded to 3.14000010... after casting to float
 
     val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
-    var containsCannotCastLiteral = false
     val fromType = fromExp.dataType
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minMaxInToType = getRange(fromType).map {
-      case (min, max) =>
-        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
 
     list.foreach {
       case lit @ Literal(null, _) => nullList += lit
       case NonNullLiteral(value, _) =>
-        val minMaxCmp = minMaxInToType.map {
-          case (minInToType, maxInToType) =>
-            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
-        }
-        minMaxCmp match {
-          // the literal value is out of fromType range
-          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
-            containsCannotCastLiteral = true
-          case _ =>
-            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
-            if (newValue == null) {
-              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
-              // fromType
-              containsCannotCastLiteral = true
-            } else {
-              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
-              val cmp = ordering.compare(value, valueRoundTrip)
-              if (cmp == 0) {
-                canCastList += Literal(newValue, fromType)
-              } else {
-                // the literal value is rounded up/down after casting to `fromType`
-                containsCannotCastLiteral = true
-              }
-            }
+        val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()

Review Comment:
   then shall we remove the range-check related code completely?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952326271


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,7 +155,16 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {

Review Comment:
   I mean something like
   ```
   val newList = list.map(lit => unwrapCast(EqualTo(in.value, lit)))
   if (newList.forall(_.isDefined)) {
     ...
   } else {
     exp
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
sunchao commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r950718579


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,7 +155,16 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {
+          val originalEqualTo = EqualTo(in.value, lit)
+          unwrapCast(originalEqualTo) match {
+            // the function `unwrapCast` may returns original expression when the literal can not
+            // cast to fromType, for instance: (the boundreference is of type DECIMAL(5,2))
+            //     CAST(boundreference() AS DECIMAL(10,4)) = 123456.1234BD
+            // Due to `cast(lit, fromExp.dataType) == null` we simply return
+            // `falseIfNotNull(fromExp)`.
+            case equalTo: EqualTo if equalTo == originalEqualTo &&
+              Cast(lit, fromExp.dataType).eval() == null =>

Review Comment:
   will there be any cases where `Cast(lit, fromExp.dataType).eval()` is not equal to `null`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r955790596


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,85 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   A. the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //   for instance:
+    //       `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   B. the literal value is out of fromType range, for instance:
+    //       `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   C. the literal value is rounded up/down after casting to `fromType`, for instance:
+    //       `(cast(input[1, float, true] as double) = 3.14)`
+    //   note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList, cannotCastList) =
+      (ArrayBuffer[Literal](), ArrayBuffer[Literal](), ArrayBuffer[Expression]())
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            cannotCastList += falseIfNotNull(fromExp)

Review Comment:
   updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952326271


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,7 +155,16 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {

Review Comment:
   I mean something like
   ```
   unwrapCast(originalEqualTo) match {
     case Some(...) ...
     case _ => cannotCastList += ...
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #37439:
URL: https://github.com/apache/spark/pull/37439#issuecomment-1232481968

   thanks, merging to master/3.3!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r959122258


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -316,55 +316,25 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     //     note that 3.14 will be rounded to 3.14000010... after casting to float
 
     val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
-    var containsCannotCastLiteral = false
     val fromType = fromExp.dataType
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minMaxInToType = getRange(fromType).map {
-      case (min, max) =>
-        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
 
     list.foreach {
       case lit @ Literal(null, _) => nullList += lit
       case NonNullLiteral(value, _) =>
-        val minMaxCmp = minMaxInToType.map {
-          case (minInToType, maxInToType) =>
-            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
-        }
-        minMaxCmp match {
-          // the literal value is out of fromType range
-          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
-            containsCannotCastLiteral = true
-          case _ =>
-            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
-            if (newValue == null) {
-              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
-              // fromType
-              containsCannotCastLiteral = true
-            } else {
-              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
-              val cmp = ordering.compare(value, valueRoundTrip)
-              if (cmp == 0) {
-                canCastList += Literal(newValue, fromType)
-              } else {
-                // the literal value is rounded up/down after casting to `fromType`
-                containsCannotCastLiteral = true
-              }
-            }
+        val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()

Review Comment:
   do you mean to remove the range-check related code of unwrapping BinaryComparison?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r955633485


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -156,29 +157,36 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
           unwrapCast(EqualTo(in.value, lit)) match {

Review Comment:
   added a new function `simplifyIn`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r955837469


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,83 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var cannotCastCounter = 0
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            cannotCastCounter += 1
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              cannotCastCounter += 1
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                cannotCastCounter += 1
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && cannotCastCounter == 0) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (cannotCastCounter == 0) {

Review Comment:
   seems it can be a bool instead of a counter?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952214651


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,7 +155,16 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {

Review Comment:
   moved to branch `case equalTo @ EqualTo(_, unwrapLit: Literal)`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952211694


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -155,7 +155,16 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       list.foreach {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
-          unwrapCast(EqualTo(in.value, lit)) match {
+          val originalEqualTo = EqualTo(in.value, lit)
+          unwrapCast(originalEqualTo) match {
+            // the function `unwrapCast` may returns original expression when the literal can not
+            // cast to fromType, for instance: (the boundreference is of type DECIMAL(5,2))
+            //     CAST(boundreference() AS DECIMAL(10,4)) = 123456.1234BD
+            // Due to `cast(lit, fromExp.dataType) == null` we simply return
+            // `falseIfNotNull(fromExp)`.
+            case equalTo: EqualTo if equalTo == originalEqualTo &&
+              Cast(lit, fromExp.dataType).eval() == null =>

Review Comment:
   if `Cast(lit, fromExp.dataType).eval()` is not equal to `null`, then the return expression of `unwrapCast` should be either `EqualTo(fromExp, lit)` or `falseIfNotNull(fromExp)` which was handled by L168-L169.
   
   https://github.com/apache/spark/blob/5a71a7f7b5c1762677ddbfe39a7c35d645c25e94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala#L305-L346
   
   > we may have a return value from L322, L331 or L340.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r953283999


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -156,29 +157,36 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
           unwrapCast(EqualTo(in.value, lit)) match {
-            case EqualTo(_, unwrapLit: Literal) => canCastList += unwrapLit
-            case e @ And(IsNull(_), Literal(null, BooleanType)) => cannotCastList += e
+            // the function `unwrapCast` returns None means the literal can not cast to fromType,
+            // for instance: (the boundreference is of type DECIMAL(5,2))
+            //     CAST(boundreference() AS DECIMAL(10,4)) = 123456.1234BD
+            // Due to `cast(lit, fromExp.dataType) == null` we simply return
+            // `falseIfNotNull(fromExp)`.
+            case None | Some(And(IsNull(_), Literal(null, BooleanType))) =>

Review Comment:
   As the comment shown in L149 ~ L153
   
   > There are 3 kinds of literals in the in.list:
   > 1. null literals
   > 2. The literals that can cast to fromExp.dataType
   > 3. The literals that cannot cast to fromExp.dataType
   
   for 3, we have two cases.
   
   - A. the literal cannot cast to fromExp.dataType, and there is no min/max for the `fromType`, then `unwrapCast` returns None
   
   for instance:
   
   ```
   cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234
   ```
   
   
   - B. the literal `value` is out of `fromType` range `(min, max)`, then `unwrapCast` returns `falseIfNotNull(fromExp)`
   
   for instance:
   
   ```
   cast(input[0, smallint, true] as bigint) = 2147483647
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r952932976


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -156,29 +157,36 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
         case lit @ Literal(null, _) => nullList += lit
         case lit @ NonNullLiteral(_, _) =>
           unwrapCast(EqualTo(in.value, lit)) match {
-            case EqualTo(_, unwrapLit: Literal) => canCastList += unwrapLit
-            case e @ And(IsNull(_), Literal(null, BooleanType)) => cannotCastList += e
+            // the function `unwrapCast` returns None means the literal can not cast to fromType,
+            // for instance: (the boundreference is of type DECIMAL(5,2))
+            //     CAST(boundreference() AS DECIMAL(10,4)) = 123456.1234BD
+            // Due to `cast(lit, fromExp.dataType) == null` we simply return
+            // `falseIfNotNull(fromExp)`.
+            case None | Some(And(IsNull(_), Literal(null, BooleanType))) =>

Review Comment:
   what's wrong with `And(IsNull(_), Literal(null, BooleanType))`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r959159660


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -316,55 +316,25 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     //     note that 3.14 will be rounded to 3.14000010... after casting to float
 
     val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
-    var containsCannotCastLiteral = false
     val fromType = fromExp.dataType
     val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
-    val minMaxInToType = getRange(fromType).map {
-      case (min, max) =>
-        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
-    }
 
     list.foreach {
       case lit @ Literal(null, _) => nullList += lit
       case NonNullLiteral(value, _) =>
-        val minMaxCmp = minMaxInToType.map {
-          case (minInToType, maxInToType) =>
-            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
-        }
-        minMaxCmp match {
-          // the literal value is out of fromType range
-          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
-            containsCannotCastLiteral = true
-          case _ =>
-            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
-            if (newValue == null) {
-              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
-              // fromType
-              containsCannotCastLiteral = true
-            } else {
-              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
-              val cmp = ordering.compare(value, valueRoundTrip)
-              if (cmp == 0) {
-                canCastList += Literal(newValue, fromType)
-              } else {
-                // the literal value is rounded up/down after casting to `fromType`
-                containsCannotCastLiteral = true
-              }
-            }
+        val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()

Review Comment:
   ah I see, thanks for the explanation!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r958049352


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,82 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var containsCannotCastLiteral = false
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            containsCannotCastLiteral = true
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              containsCannotCastLiteral = true
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                containsCannotCastLiteral = true
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && !containsCannotCastLiteral) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (!containsCannotCastLiteral) {
+        Option(unwrapExpr)
+      } else {
+        // the list contains a literal that cannot be cast to `fromExp.dataType`
+        Option(Or(falseIfNotNull(fromExp), unwrapExpr))

Review Comment:
   Yea, why not return a simpler expression if the result is the same?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r957967041


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,82 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var containsCannotCastLiteral = false
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            containsCannotCastLiteral = true
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              containsCannotCastLiteral = true
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                containsCannotCastLiteral = true
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && !containsCannotCastLiteral) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (!containsCannotCastLiteral) {
+        Option(unwrapExpr)
+      } else {
+        // the list contains a literal that cannot be cast to `fromExp.dataType`
+        Option(Or(falseIfNotNull(fromExp), unwrapExpr))

Review Comment:
   1. this rule is similar to optimizing `EqualTo(Cast(fromExp, toType), Literal(value, toType))`, when the literal is out of fromType range, the EqualTo will be optimized to `falseIfNotNull(fromType)`
   
   ```
   cast(input[0, smallint, true] as int) = 32768  => isnull(input[0, smallint, true]) AND null
   ```
   
   2. the `And` expression will be optimized by `ReplaceNullWithFalseInPredicate/BooleanSimplification` then pushed down into data sources in the next optimizer
   
   https://github.com/apache/spark/blob/b8f694f643186e91c31ce4420c8567c3e3ecad4e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala#L41-L43
   
   
   ```
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison ===
    GlobalLimit 21                                                         GlobalLimit 21
    +- LocalLimit 21                                                       +- LocalLimit 21
       +- Project [cast(a#13 as string) AS a#17]                              +- Project [cast(a#13 as string) AS a#17]
   !      +- Filter cast(a#13 as decimal(12,2)) IN (100000.00,1.00,2.00)         +- Filter ((isnull(a#13) AND null) OR a#13 IN (1,2))
             +- Relation spark_catalog.default.t1[a#13] parquet                     +- Relation spark_catalog.default.t1[a#13] parquet
              
   11:13:15.807 WARN org.apache.spark.sql.catalyst.rules.PlanChangeLogger: 
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.ReplaceNullWithFalseInPredicate ===
    GlobalLimit 21                                                GlobalLimit 21
    +- LocalLimit 21                                              +- LocalLimit 21
       +- Project [cast(a#13 as string) AS a#17]                     +- Project [cast(a#13 as string) AS a#17]
   !      +- Filter ((isnull(a#13) AND null) OR a#13 IN (1,2))          +- Filter ((isnull(a#13) AND false) OR a#13 IN (1,2))
             +- Relation spark_catalog.default.t1[a#13] parquet            +- Relation spark_catalog.default.t1[a#13] parquet
   
   11:13:15.820 WARN org.apache.spark.sql.catalyst.rules.PlanChangeLogger: 
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
    GlobalLimit 21                                                GlobalLimit 21
    +- LocalLimit 21                                              +- LocalLimit 21
       +- Project [cast(a#13 as string) AS a#17]                     +- Project [cast(a#13 as string) AS a#17]
   !      +- Filter ((isnull(a#13) AND false) OR a#13 IN (1,2))         +- Filter a#13 IN (1,2)
             +- Relation spark_catalog.default.t1[a#13] parquet            +- Relation spark_catalog.default.t1[a#13] parquet
   ```
   
   > a is of `decimal(3, 0)` type 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cfmcgrady commented on a diff in pull request #37439: [SPARK-39896][SQL] UnwrapCastInBinaryComparison should work when the literal of In/InSet downcast failed

Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on code in PR #37439:
URL: https://github.com/apache/spark/pull/37439#discussion_r958060241


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##########
@@ -346,6 +293,82 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }
 
+  private def simplifyIn[IN <: Expression](
+      fromExp: Expression,
+      toType: NumericType,
+      list: Seq[Expression],
+      buildExpr: (ArrayBuffer[Literal], ArrayBuffer[Literal]) => IN): Option[Expression] = {
+
+    // There are 3 kinds of literals in the list:
+    // 1. null literals
+    // 2. The literals that can cast to fromExp.dataType
+    // 3. The literals that cannot cast to fromExp.dataType
+    // Note that:
+    // - null literals is special as we can cast null literals to any data type
+    // - for 3, we have three cases
+    //   1). the literal cannot cast to fromExp.dataType, and there is no min/max for the fromType,
+    //     for instance:
+    //         `cast(input[2, decimal(5,2), true] as decimal(10,4)) = 123456.1234`
+    //   2). the literal value is out of fromType range, for instance:
+    //         `cast(input[0, smallint, true] as bigint) = 2147483647`
+    //   3). the literal value is rounded up/down after casting to `fromType`, for instance:
+    //         `cast(input[1, float, true] as double) = 3.14`
+    //     note that 3.14 will be rounded to 3.14000010... after casting to float
+
+    val (nullList, canCastList) = (ArrayBuffer[Literal](), ArrayBuffer[Literal]())
+    var containsCannotCastLiteral = false
+    val fromType = fromExp.dataType
+    val ordering = toType.ordering.asInstanceOf[Ordering[Any]]
+    val minMaxInToType = getRange(fromType).map {
+      case (min, max) =>
+        (Cast(Literal(min), toType).eval(), Cast(Literal(max), toType).eval())
+    }
+
+    list.foreach {
+      case lit @ Literal(null, _) => nullList += lit
+      case NonNullLiteral(value, _) =>
+        val minMaxCmp = minMaxInToType.map {
+          case (minInToType, maxInToType) =>
+            (ordering.compare(value, minInToType), ordering.compare(value, maxInToType))
+        }
+        minMaxCmp match {
+          // the literal value is out of fromType range
+          case Some((minCmp, maxCmp)) if maxCmp > 0 || minCmp < 0 =>
+            containsCannotCastLiteral = true
+          case _ =>
+            val newValue = Cast(Literal(value), fromType, ansiEnabled = false).eval()
+            if (newValue == null) {
+              // the literal cannot cast to fromExp.dataType, and there is no min/max for the
+              // fromType
+              containsCannotCastLiteral = true
+            } else {
+              val valueRoundTrip = Cast(Literal(newValue, fromType), toType).eval()
+              val cmp = ordering.compare(value, valueRoundTrip)
+              if (cmp == 0) {
+                canCastList += Literal(newValue, fromType)
+              } else {
+                // the literal value is rounded up/down after casting to `fromType`
+                containsCannotCastLiteral = true
+              }
+            }
+        }
+    }
+
+    // return None when list contains only null values.
+    if (canCastList.isEmpty && !containsCannotCastLiteral) {
+      None
+    } else {
+      val unwrapExpr = buildExpr(nullList, canCastList)
+      if (!containsCannotCastLiteral) {
+        Option(unwrapExpr)
+      } else {
+        // the list contains a literal that cannot be cast to `fromExp.dataType`
+        Option(Or(falseIfNotNull(fromExp), unwrapExpr))

Review Comment:
   return `unwrapExpr` SGTM.
   
   ---
   
   > when the literal is out of fromType range, the EqualTo will be optimized to `falseIfNotNull(fromType)`
   >
   > ```
   > cast(input[0, smallint, true] as int) = 32768  => isnull(input[0, smallint, true]) AND null
   > ```
   
   ~~btw, just for learning, why not return false here? cc @sunchao~~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org