You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "Jefffrey (via GitHub)" <gi...@apache.org> on 2023/04/01 01:40:25 UTC

[GitHub] [arrow-datafusion] Jefffrey opened a new pull request, #5820: fix: Coerce case expression and when to common type

Jefffrey opened a new pull request, #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #5538
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Given following:
   
   ```sql
   SELECT
     CASE a
       WHEN b THEN 0
       WHEN c THEN 1
       ELSE 2
     END;
   ```
   
   Should attempt to coerce `a`, `b` and `c` to common type.
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Ensure `case` and `when` expressions are coerced to common type, similar to how currently `then` and `else` expressions are coerced to common type.
   
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   Added to sqllogictest
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5820: fix: Enhance case expression type coercion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1155364008


##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -195,8 +195,12 @@ impl CaseExpr {
                 _ => when_value,
             };
             let when_value = when_value.into_array(batch.num_rows());
-            let when_value = as_boolean_array(&when_value)
-                .expect("WHEN expression did not return a BooleanArray");
+            let when_value = as_boolean_array(&when_value).map_err(|e| {
+                DataFusionError::Context(

Review Comment:
   🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5820: fix: Enhance case expression type coercion

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1155883766


##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -195,8 +195,12 @@ impl CaseExpr {
                 _ => when_value,
             };
             let when_value = when_value.into_array(batch.num_rows());
-            let when_value = as_boolean_array(&when_value)
-                .expect("WHEN expression did not return a BooleanArray");
+            let when_value = as_boolean_array(&when_value).map_err(|e| {
+                DataFusionError::Context(

Review Comment:
   Yeah figured better to wrap in Context to hopefully provide more informative error message



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Jefffrey commented on pull request #5820: fix: Coerce case expression and when to common type

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#issuecomment-1493150374

   I've found another bug related to case expression casting, will raise issue and also fix as part of this PR since will be affecting similar code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5820: fix: Enhance case expression type coercion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1156127545


##########
datafusion/optimizer/src/type_coercion.rs:
##########
@@ -638,19 +606,130 @@ fn coerce_agg_exprs_for_signature(
         .collect::<Result<Vec<_>>>()
 }
 
+fn coerce_case_expression(case: Case, schema: &DFSchemaRef) -> Result<Case> {
+    // Given expressions like:
+    //
+    // CASE a1
+    //   WHEN a2 THEN b1
+    //   WHEN a3 THEN b2
+    //   ELSE b3
+    // END
+    //
+    // or:
+    //
+    // CASE
+    //   WHEN x1 THEN b1
+    //   WHEN x2 THEN b2
+    //   ELSE b3
+    // END
+    //
+    // Then all aN (a1, a2, a3) must be converted to a common data type in the first example
+    // (case-when expression coercion)
+    //
+    // All xN (x1, x2) must be converted to a boolean data type in the second example
+    // (when-boolean expression coercion)
+    //
+    // And all bN (b1, b2, b3) must be converted to a common data type in both examples
+    // (then-else expression coercion)
+    //
+    // If any fail to find and cast to a common/specific data type, will return error
+    //
+    // Note that case-when and when-boolean expression coercions are mutually exclusive
+    // Only one or the other can occur for a case expression, whilst then-else expression coercion will always occur
+
+    // prepare types
+    let case_type = case
+        .expr
+        .as_ref()
+        .map(|expr| expr.get_type(&schema))
+        .transpose()?;
+    let then_types = case
+        .when_then_expr
+        .iter()
+        .map(|(_when, then)| then.get_type(&schema))
+        .collect::<Result<Vec<_>>>()?;
+    let else_type = case
+        .else_expr
+        .as_ref()
+        .map(|expr| expr.get_type(&schema))
+        .transpose()?;
+
+    // find common coercible types

Review Comment:
   I found this code really easy to read. Thank you @Jefffrey 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5820: fix: Enhance case expression type coercion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1155363938


##########
datafusion/core/tests/sqllogictests/test_files/select.slt:
##########
@@ -214,3 +214,9 @@ select * from (select 1 a union all select 2) b order by a limit null;
 query I
 select * from (select 1 a union all select 2) b order by a limit 0;
 ----
+
+# select case when type coercion
+query I
+select CASE 10.5 WHEN 0 THEN 1 ELSE 10 END as col;
+----
+10

Review Comment:
   Yes, you can use the `SET` command to change config values. For example https://github.com/apache/arrow-datafusion/blob/7b90474adf6c1afbb4a5988a5677834b07347321/datafusion/core/tests/sqllogictests/test_files/information_schema.slt#L30



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5820: fix: Coerce case expression and when to common type

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1155095820


##########
datafusion/core/tests/sqllogictests/test_files/select.slt:
##########
@@ -214,3 +214,9 @@ select * from (select 1 a union all select 2) b order by a limit null;
 query I
 select * from (select 1 a union all select 2) b order by a limit 0;
 ----
+
+# select case when type coercion
+query I
+select CASE 10.5 WHEN 0 THEN 1 ELSE 10 END as col;
+----
+10

Review Comment:
   Is there a way in sqllogictests to change the config of DataFusion? Since for negative cases to produce error, would need to set `datafusion.optimizer.skip_failed_rules` to false, unless you mean for the negative case to fail as in the original issue?
   
   Otherwise can add the negative case as a unit test in actual code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5820: fix: Coerce case expression and when to common type

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1155092312


##########
datafusion/optimizer/src/type_coercion.rs:
##########
@@ -330,8 +330,35 @@ impl TreeNodeRewriter for TypeCoercionRewriter {
                 }
             }
             Expr::Case(case) => {
-                // all the result of then and else should be convert to a common data type,
-                // if they can be coercible to a common data type, return error.
+                // Given expressions like:
+                //
+                // CASE a1
+                //   WHEN a2 THEN b1
+                //   WHEN a3 THEN b2
+                //   ELSE b3
+                // END
+                //
+                // or:
+                //
+                // CASE
+                //   WHEN x1 THEN b1
+                //   WHEN x2 THEN b2
+                //   ELSE b3
+                // END
+                //
+                // Then all aN (a1, a2, a3) must be converted to a common data type in the first example

Review Comment:
   ❤️ 



##########
datafusion/core/tests/sqllogictests/test_files/select.slt:
##########
@@ -214,3 +214,9 @@ select * from (select 1 a union all select 2) b order by a limit null;
 query I
 select * from (select 1 a union all select 2) b order by a limit 0;
 ----
+
+# select case when type coercion
+query I
+select CASE 10.5 WHEN 0 THEN 1 ELSE 10 END as col;
+----
+10

Review Comment:
   Could you add some additional tests?
   
   1. Alternate Syntax:
   ```
   select CASE 
     WHEN 10 = 5 THEN 1 
     WHEN 'true' THEN 1 
     ELSE 10 
   END as col;
   ```
   
   
   Also negative cases like
   
   ```sql
   select CASE 10.5 WHEN 'not a number' THEN 1 ELSE 10 END as col;
   ```
   
   ```sql
   select CASE 
     WHEN 10 = 5 THEN 1 
     WHEN 'not boolean' THEN 1 
     ELSE 10 
   END as col;
   ```
   
   ```sql
   select CASE 
     WHEN 10 = 5 THEN 1 
     WHEN 5 THEN 'not a number'
     ELSE 10 
   END as col;
   ```
   
   ```sql
   select CASE 
     WHEN 10 = 5 THEN 1 
     WHEN 5 THEN 0
     ELSE 'goo' 
   END as col;
   ```



##########
datafusion/optimizer/src/type_coercion.rs:
##########
@@ -330,8 +330,35 @@ impl TreeNodeRewriter for TypeCoercionRewriter {
                 }
             }
             Expr::Case(case) => {
-                // all the result of then and else should be convert to a common data type,
-                // if they can be coercible to a common data type, return error.
+                // Given expressions like:
+                //
+                // CASE a1
+                //   WHEN a2 THEN b1
+                //   WHEN a3 THEN b2
+                //   ELSE b3
+                // END
+                //
+                // or:
+                //
+                // CASE
+                //   WHEN x1 THEN b1
+                //   WHEN x2 THEN b2
+                //   ELSE b3
+                // END
+                //
+                // Then all aN (a1, a2, a3) must be converted to a common data type in the first example
+                // (case-when expression coercion)
+                //
+                // And all bN (b1, b2, b3) must be converted to a common data type in both examples
+                // (then-else expression coercion)
+                //
+                // If either fail to find a common data type, will return error
+
+                // prepare types
+                let case_type = match &case.expr {
+                    None => Ok(None),
+                    Some(expr) => expr.get_type(&self.schema).map(Some),
+                }?;

Review Comment:
   You can write this more concisely / functionally using `transpose` like this if you want
   
   ```suggestion
                   // prepare types
                   let case_type = case.expr.as_ref()
                       .map(|expr| expr.get_type(&self.schema))
                       .transpose()?;
   ```
   
   The same basic transformation applies to the other `match` statements below -- I don't think it is a big deal I just figured I would point it out to you



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5820: fix: Coerce case expression and when to common type

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1155097167


##########
datafusion/optimizer/src/type_coercion.rs:
##########
@@ -330,8 +330,35 @@ impl TreeNodeRewriter for TypeCoercionRewriter {
                 }
             }
             Expr::Case(case) => {
-                // all the result of then and else should be convert to a common data type,
-                // if they can be coercible to a common data type, return error.
+                // Given expressions like:
+                //
+                // CASE a1
+                //   WHEN a2 THEN b1
+                //   WHEN a3 THEN b2
+                //   ELSE b3
+                // END
+                //
+                // or:
+                //
+                // CASE
+                //   WHEN x1 THEN b1
+                //   WHEN x2 THEN b2
+                //   ELSE b3
+                // END
+                //
+                // Then all aN (a1, a2, a3) must be converted to a common data type in the first example
+                // (case-when expression coercion)
+                //
+                // And all bN (b1, b2, b3) must be converted to a common data type in both examples
+                // (then-else expression coercion)
+                //
+                // If either fail to find a common data type, will return error
+
+                // prepare types
+                let case_type = match &case.expr {
+                    None => Ok(None),
+                    Some(expr) => expr.get_type(&self.schema).map(Some),
+                }?;

Review Comment:
   Thanks that does look cleaner, will apply it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Jefffrey commented on a diff in pull request #5820: fix: Enhance case expression type coercion

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on code in PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820#discussion_r1155883225


##########
datafusion/core/tests/sqllogictests/test_files/select.slt:
##########
@@ -214,3 +214,9 @@ select * from (select 1 a union all select 2) b order by a limit null;
 query I
 select * from (select 1 a union all select 2) b order by a limit 0;
 ----
+
+# select case when type coercion
+query I
+select CASE 10.5 WHEN 0 THEN 1 ELSE 10 END as col;
+----
+10

Review Comment:
   Testing with sqllogictests turned out to be kinda finicky, as this PR is meant to enhance the type coercion, but not actually testing the actual casting itself (since some of those negative cases would pass the type coercion but fail at execution time due to actual contents of the data). So I figured it'd be easier to add more tests as unit tests for proper testing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #5820: fix: Enhance case expression type coercion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5820:
URL: https://github.com/apache/arrow-datafusion/pull/5820


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org