You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/31 19:44:57 UTC

[GitHub] [arrow-datafusion] tustvold opened a new pull request #1719: Convert boolean case expressions to boolean logic

tustvold opened a new pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719


   # Which issue does this PR close?
   
   Part of #1693.
   
    # Rationale for this change
   
   Converting boolean case expressions to boolean logic allows all the existing optimisations, pruning logic, etc... to work with them.
   
   # What changes are included in this PR?
   
   This PR rewrites boolean case expressions into boolean logic
   
   # Are there any user-facing changes?
   
   Boolean case expressions will be rewritten
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796051399



##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -442,6 +442,8 @@ impl<'a> ExprRewriter for Simplifier<'a> {
         use Expr::*;
         use Operator::{And, Divide, Eq, Multiply, NotEq, Or};
 
+        let is_boolean_expr = self.is_boolean_type(&expr);

Review comment:
       I recommend putting this into the case statement to avoid evaluating it on each `Expr` (otherwise I think it ends up being `O(N^2)` as `is_boolean_type` recursively checks all inputs




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796477249



##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -1291,6 +1337,11 @@ mod tests {
 
     #[test]
     fn simplify_expr_case_when_then_else() {
+        // CASE WHERE c2 != false THEN "ok" == "not_ok" ELSE c2 == true

Review comment:
       Sorry -- I was musing -- no specific changes suggested




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on a change in pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796010059



##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -1291,6 +1337,11 @@ mod tests {
 
     #[test]
     fn simplify_expr_case_when_then_else() {
+        // CASE WHERE c2 != false THEN "ok" == "not_ok" ELSE c2 == true

Review comment:
       Unfortunately the optimizer is not especially brilliant at optimizing boolean expressions - I've created #1716 to track this.
   
   I'm not sure if we want to constrain the rewrite until the boolean expression optimizer is able to more effectively reduce down what it produces...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796985146



##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -1291,6 +1337,11 @@ mod tests {
 
     #[test]
     fn simplify_expr_case_when_then_else() {
+        // CASE WHERE c2 != false THEN "ok" == "not_ok" ELSE c2 == true

Review comment:
       I implemented something related here: https://github.com/influxdata/influxdb_iox/pull/3557/files#r796984073 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on a change in pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
tustvold commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796447830



##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -1291,6 +1337,11 @@ mod tests {
 
     #[test]
     fn simplify_expr_case_when_then_else() {
+        // CASE WHERE c2 != false THEN "ok" == "not_ok" ELSE c2 == true

Review comment:
       I'm not sure I follow, are you suggesting I change something or just musing :sweat_smile: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796067705



##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -662,6 +664,48 @@ impl<'a> ExprRewriter for Simplifier<'a> {
                 _ => unreachable!(),
             },
 
+            //
+            // Rules for Case
+            //
+
+            // CASE
+            //   WHEN X THEN A
+            //   WHEN Y THEN B
+            //   ...
+            //   ELSE Q
+            // END
+            //
+            // ---> (X AND A) OR (Y AND B AND NOT X) OR ... (NOT (X OR Y) AND Q)
+            Case {
+                expr: None,
+                when_then_expr,
+                else_expr,
+            } if is_boolean_expr => {
+                // The disjunction of all the when predicates encountered so far

Review comment:
       Given the complexity of expression this could potentially produce `O(n!)` maybe in terms of the number of `when` exprs (I think?), what do you think about adding a heuristic that only does this rewrite when there are a 'small' number of cases -- perhaps 2 or 3?
   
   This would prevent some sort of pathological explosion and would handle the usecase of simple mapping `CASE` statements




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796066392



##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -1291,6 +1337,11 @@ mod tests {
 
     #[test]
     fn simplify_expr_case_when_then_else() {
+        // CASE WHERE c2 != false THEN "ok" == "not_ok" ELSE c2 == true

Review comment:
       🤔  I do wonder given the nullability concerns, if in addition to "simplify" which seeks to retain the same semantics, there could be something like:
   
   ```rust
   /// returns true if this expr *always* returns null or false
   /// (aka would filter out all rows from a query)
   fn is_null_or_false(expr: &Expr) -> bool {
     ...
   }
   ```

##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -662,6 +664,48 @@ impl<'a> ExprRewriter for Simplifier<'a> {
                 _ => unreachable!(),
             },
 
+            //
+            // Rules for Case
+            //
+
+            // CASE
+            //   WHEN X THEN A
+            //   WHEN Y THEN B
+            //   ...
+            //   ELSE Q
+            // END
+            //
+            // ---> (X AND A) OR (Y AND B AND NOT X) OR ... (NOT (X OR Y) AND Q)

Review comment:
       ```suggestion
               // ---> (X AND A) OR (Y AND B AND NOT X) OR ... (NOT (X OR Y) AND Q)
               // 
               // Note: the rationale for this rewrite is that the expr can then be further simplified 
               // using the existing rules for AND/OR
   ```

##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -662,6 +664,48 @@ impl<'a> ExprRewriter for Simplifier<'a> {
                 _ => unreachable!(),
             },
 
+            //
+            // Rules for Case
+            //
+
+            // CASE
+            //   WHEN X THEN A
+            //   WHEN Y THEN B
+            //   ...
+            //   ELSE Q
+            // END
+            //
+            // ---> (X AND A) OR (Y AND B AND NOT X) OR ... (NOT (X OR Y) AND Q)
+            Case {
+                expr: None,
+                when_then_expr,
+                else_expr,
+            } if is_boolean_expr => {
+                // The disjunction of all the when predicates encountered so far

Review comment:
       Given the complexity of expression this could potentially produce `O(n!)` maybe in terms of the number of `when` exprs, what do you think about adding a heuristic that only does this rewrite when there are a 'small' number of cases -- perhaps 2 or 3?
   
   This would prevent some sort of pathological explosion and would handle the usecase of simple mapping `CASE` statements




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #1719: Convert boolean case expressions to boolean logic

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org