You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/24 23:38:27 UTC

[GitHub] [arrow] houqp opened a new pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

houqp opened a new pull request #9309:
URL: https://github.com/apache/arrow/pull/9309


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766489488


   clippy is failing due to unrelated issue, should i fix it in the same PR or create a new dedicated one to address the issue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp removed a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp removed a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766489488


   clippy is failing due to unrelated issue, can i fix it in the same PR or should I create a new dedicated one to address the issue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-779445880


   @alamb rebased PR on latest master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766461000


   https://issues.apache.org/jira/browse/ARROW-11366


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r575724007



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       OK, I have updated the code to apply the folding optimization to all plan nodes and expressions wherever applicable. We currently doesn't do the `CAST` rewrite at the moment, so that needs to be handled in a separate ticket/PR to get full support for applying binary operators on expressions with different types. This PR now assumes automatic casting will be added in the future.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r576445235



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;
+            }
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| {
+                            Ok((
+                                Box::new(optimize_expr(when, schemas)?),
+                                Box::new(optimize_expr(then, schemas)?),
+                            ))
+                        })
+                        .collect::<Result<_>>()?,
+                    else_expr: match else_expr {
+                        Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                        None => None,
+                    },
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()

Review comment:
       This is based on the doc string we have in `logical_plan/expr.rs`:
   
   ```
   The second form uses a base expression and then a series of "when" clauses that match on a literal value.
   ```
   
   However, general SQL and spark SQL does support arbitrary expressions in when conditions, so I think it's best for us to match that behavior.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766489488


   clippy is failing due to unrelated issue, should i fix it in the same PR or create a new dedicated one to address the issue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r564293129



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {

Review comment:
       I think we could make this optimizer a bit more generic (not necessary in this PR) to split recursion / pattern match
   
   This is the more "optimizer framework" I had in mind in the comment on the roadmap
   @alamb @jorgecarleitao @vertexclique .
   
   A common strategy (used by Spark) for rule / replacement based  to have a loop that just does something like this:
   
   ```rust
   let changed = False;
   
   while !changed {
       (logical_plan, changed)  = apply_optimizations(rules, logical_plan);
   }
   ```
   
   A rule could be something like (returning `Some` on replaced output) and doesn't need the boilerplate of recursion for every rule.
   
   ```rust
   // Optimizer can work both on expr and logical plans, default returns `None`
   impl OptimizerRule for BooleanComparison {
      fn optimize_expr(&mut self, plan: &Expr) -> Option<Expr> {
          match e {
                Expr::BinaryExpr { left, op, right } => {
               let left = optimize_expr(left);
               let right = optimize_expr(right);
               match op {
                   Operator::Eq => match (&left, &right) {
                       (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
                           Some(true) => Some(right),
                           Some(false) | None => Some(Expr::Not(Box::new(right))),
                       },
                       (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
                           Some(true) => Some(left),
                           Some(false) | None => Some(Expr::Not(Box::new(left))),
                       },
                       _ => None,
                   },
                   Operator::NotEq => match (&left, &right) {
                       (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
                           Some(false) | None => Some(right),
                           Some(true) => Some(Expr::Not(Box::new(right))),
                       },
                       (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
                           Some(false) | None => left,
                           Some(true) => Some(Expr::Not(Box::new(left))),
                       },
                       _ => None,
                   },
                   _ => None
               }
           }
          }
      }
   }
   
   ```
       




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-780688437


   Thanks again @houqp  -- really nice job


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r577103996



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,671 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        // We need to pass down the all schemas within the plan tree to `optimize_expr` in order to
+        // to evaluate expression types. For example, a projection plan's schema will only include
+        // projected columns. With just the projected schema, it's not possible to infer types for
+        // expressions that references non-projected columns within the same project plan or its
+        // children plans.
+
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        if let Ok(DataType::Boolean) = expr.get_type(schema) {
+            return true;
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the expression tree.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            // recurse into CASE WHEN condition expressions

Review comment:
       👍 

##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,671 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        // We need to pass down the all schemas within the plan tree to `optimize_expr` in order to
+        // to evaluate expression types. For example, a projection plan's schema will only include
+        // projected columns. With just the projected schema, it's not possible to infer types for
+        // expressions that references non-projected columns within the same project plan or its
+        // children plans.
+
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        if let Ok(DataType::Boolean) = expr.get_type(schema) {
+            return true;
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the expression tree.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            // recurse into CASE WHEN condition expressions
+            Expr::Case {
+                expr: match expr {
+                    Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                    None => None,
+                },
+                when_then_expr: when_then_expr
+                    .iter()
+                    .map(|(when, then)| {
+                        Ok((
+                            Box::new(optimize_expr(when, schemas)?),
+                            Box::new(optimize_expr(then, schemas)?),
+                        ))
+                    })
+                    .collect::<Result<_>>()?,
+                else_expr: match else_expr {
+                    Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                    None => None,
+                },
+            }
+        }
+        Expr::Alias(expr, name) => {
+            Expr::Alias(Box::new(optimize_expr(expr, schemas)?), name.clone())
+        }
+        Expr::Negative(expr) => Expr::Negative(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => Expr::InList {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            list: list
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            negated: *negated,
+        },
+        Expr::IsNotNull(expr) => Expr::IsNotNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::IsNull(expr) => Expr::IsNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::Cast { expr, data_type } => Expr::Cast {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            data_type: data_type.clone(),
+        },
+        Expr::Between {
+            expr,
+            negated,
+            low,
+            high,
+        } => Expr::Between {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            negated: *negated,
+            low: Box::new(optimize_expr(low, schemas)?),
+            high: Box::new(optimize_expr(high, schemas)?),
+        },
+        Expr::ScalarFunction { fun, args } => Expr::ScalarFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::ScalarUDF { fun, args } => Expr::ScalarUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::AggregateFunction {
+            fun,
+            args,
+            distinct,
+        } => Expr::AggregateFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            distinct: *distinct,
+        },
+        Expr::AggregateUDF { fun, args } => Expr::AggregateUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::Sort {
+            expr,
+            asc,
+            nulls_first,
+        } => Expr::Sort {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            asc: *asc,
+            nulls_first: *nulls_first,
+        },
+        Expr::Column { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Literal { .. }
+        | Expr::Wildcard => e.clone(),
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{
+        col, lit, max, min, DFField, DFSchema, LogicalPlanBuilder,
+    };
+
+    use arrow::datatypes::*;
+
+    fn test_table_scan() -> Result<LogicalPlan> {
+        let schema = Schema::new(vec![
+            Field::new("a", DataType::Boolean, false),
+            Field::new("b", DataType::Boolean, false),
+            Field::new("c", DataType::Boolean, false),
+            Field::new("d", DataType::UInt32, false),
+        ]);
+        LogicalPlanBuilder::scan_empty("test", &schema, None)?.build()
+    }
+
+    fn expr_test_schema() -> DFSchemaRef {
+        Arc::new(
+            DFSchema::new(vec![
+                DFField::new(None, "c1", DataType::Utf8, true),
+                DFField::new(None, "c2", DataType::Boolean, true),
+            ])
+            .unwrap(),
+        )
+    }
+
+    #[test]
+    fn optimize_expr_not_not() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(
+            optimize_expr(&col("c2").not().not().not(), &[&schema])?,
+            col("c2").not(),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_null_comparision() -> Result<()> {
+        let schema = expr_test_schema();
+
+        // x = null is always null
+        assert_eq!(

Review comment:
       These tests are so much nicer to read ❤️ 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp edited a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766489488


   clippy is failing due to unrelated issue, can i fix it in the same PR or should I create a new dedicated one to address the issue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r576424982



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;
+            }
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| {
+                            Ok((
+                                Box::new(optimize_expr(when, schemas)?),
+                                Box::new(optimize_expr(then, schemas)?),
+                            ))
+                        })
+                        .collect::<Result<_>>()?,
+                    else_expr: match else_expr {
+                        Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                        None => None,
+                    },
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()
+            }
+        }
+        Expr::Alias(expr, name) => {
+            Expr::Alias(Box::new(optimize_expr(expr, schemas)?), name.clone())
+        }
+        Expr::Negative(expr) => Expr::Negative(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => Expr::InList {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            list: list
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            negated: *negated,
+        },
+        Expr::IsNotNull(expr) => Expr::IsNotNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::IsNull(expr) => Expr::IsNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::Cast { expr, data_type } => Expr::Cast {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            data_type: data_type.clone(),
+        },
+        Expr::Between {
+            expr,
+            negated,
+            low,
+            high,
+        } => Expr::Between {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            negated: *negated,
+            low: Box::new(optimize_expr(low, schemas)?),
+            high: Box::new(optimize_expr(high, schemas)?),
+        },
+        Expr::ScalarFunction { fun, args } => Expr::ScalarFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::ScalarUDF { fun, args } => Expr::ScalarUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::AggregateFunction {
+            fun,
+            args,
+            distinct,
+        } => Expr::AggregateFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            distinct: *distinct,
+        },
+        Expr::AggregateUDF { fun, args } => Expr::AggregateUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::Sort {
+            expr,
+            asc,
+            nulls_first,
+        } => Expr::Sort {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            asc: *asc,
+            nulls_first: *nulls_first,
+        },
+        Expr::Column { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Literal { .. }
+        | Expr::Wildcard => e.clone(),
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{
+        col, lit, max, min, DFField, DFSchema, LogicalPlanBuilder,
+    };
+
+    use arrow::datatypes::*;
+
+    fn test_table_scan() -> Result<LogicalPlan> {
+        let schema = Schema::new(vec![
+            Field::new("a", DataType::Boolean, false),
+            Field::new("b", DataType::Boolean, false),
+            Field::new("c", DataType::Boolean, false),
+            Field::new("d", DataType::UInt32, false),
+        ]);
+        LogicalPlanBuilder::scan_empty("test", &schema, None)?.build()
+    }
+
+    fn expr_test_schema() -> DFSchemaRef {
+        Arc::new(
+            DFSchema::new(vec![
+                DFField::new(None, "c1", DataType::Utf8, true),
+                DFField::new(None, "c2", DataType::Boolean, true),
+            ])
+            .unwrap(),
+        )
+    }
+
+    #[test]
+    fn optimize_expr_not_not() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(
+            optimize_expr(
+                &Expr::Not(Box::new(Expr::Not(Box::new(Expr::Not(Box::new(col(

Review comment:
       for the future, you can write this using `Expr::not` like `col("c2").not().not().not()` 

##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;
+            }
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| {
+                            Ok((
+                                Box::new(optimize_expr(when, schemas)?),
+                                Box::new(optimize_expr(then, schemas)?),
+                            ))
+                        })
+                        .collect::<Result<_>>()?,
+                    else_expr: match else_expr {
+                        Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                        None => None,
+                    },
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()
+            }
+        }
+        Expr::Alias(expr, name) => {
+            Expr::Alias(Box::new(optimize_expr(expr, schemas)?), name.clone())
+        }
+        Expr::Negative(expr) => Expr::Negative(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => Expr::InList {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            list: list
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            negated: *negated,
+        },
+        Expr::IsNotNull(expr) => Expr::IsNotNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::IsNull(expr) => Expr::IsNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::Cast { expr, data_type } => Expr::Cast {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            data_type: data_type.clone(),
+        },
+        Expr::Between {
+            expr,
+            negated,
+            low,
+            high,
+        } => Expr::Between {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            negated: *negated,
+            low: Box::new(optimize_expr(low, schemas)?),
+            high: Box::new(optimize_expr(high, schemas)?),
+        },
+        Expr::ScalarFunction { fun, args } => Expr::ScalarFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::ScalarUDF { fun, args } => Expr::ScalarUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::AggregateFunction {
+            fun,
+            args,
+            distinct,
+        } => Expr::AggregateFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            distinct: *distinct,
+        },
+        Expr::AggregateUDF { fun, args } => Expr::AggregateUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::Sort {
+            expr,
+            asc,
+            nulls_first,
+        } => Expr::Sort {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            asc: *asc,
+            nulls_first: *nulls_first,
+        },
+        Expr::Column { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Literal { .. }
+        | Expr::Wildcard => e.clone(),
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{
+        col, lit, max, min, DFField, DFSchema, LogicalPlanBuilder,
+    };
+
+    use arrow::datatypes::*;
+
+    fn test_table_scan() -> Result<LogicalPlan> {
+        let schema = Schema::new(vec![
+            Field::new("a", DataType::Boolean, false),
+            Field::new("b", DataType::Boolean, false),
+            Field::new("c", DataType::Boolean, false),
+            Field::new("d", DataType::UInt32, false),
+        ]);
+        LogicalPlanBuilder::scan_empty("test", &schema, None)?.build()
+    }
+
+    fn expr_test_schema() -> DFSchemaRef {
+        Arc::new(
+            DFSchema::new(vec![
+                DFField::new(None, "c1", DataType::Utf8, true),
+                DFField::new(None, "c2", DataType::Boolean, true),
+            ])
+            .unwrap(),
+        )
+    }
+
+    #[test]
+    fn optimize_expr_not_not() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(
+            optimize_expr(
+                &Expr::Not(Box::new(Expr::Not(Box::new(Expr::Not(Box::new(col(
+                    "c2"
+                ))))))),
+                &[&schema],
+            )?,
+            col("c2").not(),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_null_comparision() -> Result<()> {
+        let schema = expr_test_schema();
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::Eq,
+                    right: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                    op: Operator::NotEq,
+                    right: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::NotEq,
+                    right: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                    op: Operator::Eq,
+                    right: Box::new(col("c2")),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_eq() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(col("c2").get_type(&schema)?, DataType::Boolean);
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::Eq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            lit(true),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::Eq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            lit(false),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::Eq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            col("c2"),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::Eq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            col("c2").not(),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_eq_skip_nonboolean_type() -> Result<()> {
+        let schema = expr_test_schema();
+
+        // when one of the operand is not of boolean type, folding the other boolean constant will
+        // change return type of expression to non-boolean.
+        assert_eq!(col("c1").get_type(&schema)?, DataType::Utf8);

Review comment:
       👍 

##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;
+            }
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| {
+                            Ok((
+                                Box::new(optimize_expr(when, schemas)?),
+                                Box::new(optimize_expr(then, schemas)?),
+                            ))
+                        })
+                        .collect::<Result<_>>()?,
+                    else_expr: match else_expr {
+                        Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                        None => None,
+                    },
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()

Review comment:
       In general SQL I can't remember if this is always the case (that the WHEN/THEN clause need to be literals) but I think this handling is fine for now

##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;
+            }
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| {
+                            Ok((
+                                Box::new(optimize_expr(when, schemas)?),
+                                Box::new(optimize_expr(then, schemas)?),
+                            ))
+                        })
+                        .collect::<Result<_>>()?,
+                    else_expr: match else_expr {
+                        Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                        None => None,
+                    },
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()
+            }
+        }
+        Expr::Alias(expr, name) => {
+            Expr::Alias(Box::new(optimize_expr(expr, schemas)?), name.clone())
+        }
+        Expr::Negative(expr) => Expr::Negative(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => Expr::InList {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            list: list
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            negated: *negated,
+        },
+        Expr::IsNotNull(expr) => Expr::IsNotNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::IsNull(expr) => Expr::IsNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::Cast { expr, data_type } => Expr::Cast {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            data_type: data_type.clone(),
+        },
+        Expr::Between {
+            expr,
+            negated,
+            low,
+            high,
+        } => Expr::Between {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            negated: *negated,
+            low: Box::new(optimize_expr(low, schemas)?),
+            high: Box::new(optimize_expr(high, schemas)?),
+        },
+        Expr::ScalarFunction { fun, args } => Expr::ScalarFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::ScalarUDF { fun, args } => Expr::ScalarUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::AggregateFunction {
+            fun,
+            args,
+            distinct,
+        } => Expr::AggregateFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            distinct: *distinct,
+        },
+        Expr::AggregateUDF { fun, args } => Expr::AggregateUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::Sort {
+            expr,
+            asc,
+            nulls_first,
+        } => Expr::Sort {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            asc: *asc,
+            nulls_first: *nulls_first,
+        },
+        Expr::Column { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Literal { .. }
+        | Expr::Wildcard => e.clone(),
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{
+        col, lit, max, min, DFField, DFSchema, LogicalPlanBuilder,
+    };
+
+    use arrow::datatypes::*;
+
+    fn test_table_scan() -> Result<LogicalPlan> {
+        let schema = Schema::new(vec![
+            Field::new("a", DataType::Boolean, false),
+            Field::new("b", DataType::Boolean, false),
+            Field::new("c", DataType::Boolean, false),
+            Field::new("d", DataType::UInt32, false),
+        ]);
+        LogicalPlanBuilder::scan_empty("test", &schema, None)?.build()
+    }
+
+    fn expr_test_schema() -> DFSchemaRef {
+        Arc::new(
+            DFSchema::new(vec![
+                DFField::new(None, "c1", DataType::Utf8, true),
+                DFField::new(None, "c2", DataType::Boolean, true),
+            ])
+            .unwrap(),
+        )
+    }
+
+    #[test]
+    fn optimize_expr_not_not() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(
+            optimize_expr(
+                &Expr::Not(Box::new(Expr::Not(Box::new(Expr::Not(Box::new(col(
+                    "c2"
+                ))))))),
+                &[&schema],
+            )?,
+            col("c2").not(),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_null_comparision() -> Result<()> {
+        let schema = expr_test_schema();
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::Eq,
+                    right: Box::new(Expr::Literal(ScalarValue::Boolean(None))),

Review comment:
       I am personally a fan of `col(lit(true)).eq(lit(ScalarValue::Boolean(None))))` type expression

##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;
+            }
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| {
+                            Ok((
+                                Box::new(optimize_expr(when, schemas)?),
+                                Box::new(optimize_expr(then, schemas)?),
+                            ))
+                        })
+                        .collect::<Result<_>>()?,
+                    else_expr: match else_expr {
+                        Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                        None => None,
+                    },
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()
+            }
+        }
+        Expr::Alias(expr, name) => {
+            Expr::Alias(Box::new(optimize_expr(expr, schemas)?), name.clone())
+        }
+        Expr::Negative(expr) => Expr::Negative(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => Expr::InList {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            list: list
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            negated: *negated,
+        },
+        Expr::IsNotNull(expr) => Expr::IsNotNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::IsNull(expr) => Expr::IsNull(Box::new(optimize_expr(expr, schemas)?)),
+        Expr::Cast { expr, data_type } => Expr::Cast {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            data_type: data_type.clone(),
+        },
+        Expr::Between {
+            expr,
+            negated,
+            low,
+            high,
+        } => Expr::Between {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            negated: *negated,
+            low: Box::new(optimize_expr(low, schemas)?),
+            high: Box::new(optimize_expr(high, schemas)?),
+        },
+        Expr::ScalarFunction { fun, args } => Expr::ScalarFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::ScalarUDF { fun, args } => Expr::ScalarUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::AggregateFunction {
+            fun,
+            args,
+            distinct,
+        } => Expr::AggregateFunction {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+            distinct: *distinct,
+        },
+        Expr::AggregateUDF { fun, args } => Expr::AggregateUDF {
+            fun: fun.clone(),
+            args: args
+                .iter()
+                .map(|e| optimize_expr(e, schemas))
+                .collect::<Result<_>>()?,
+        },
+        Expr::Sort {
+            expr,
+            asc,
+            nulls_first,
+        } => Expr::Sort {
+            expr: Box::new(optimize_expr(expr, schemas)?),
+            asc: *asc,
+            nulls_first: *nulls_first,
+        },
+        Expr::Column { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Literal { .. }
+        | Expr::Wildcard => e.clone(),
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{
+        col, lit, max, min, DFField, DFSchema, LogicalPlanBuilder,
+    };
+
+    use arrow::datatypes::*;
+
+    fn test_table_scan() -> Result<LogicalPlan> {
+        let schema = Schema::new(vec![
+            Field::new("a", DataType::Boolean, false),
+            Field::new("b", DataType::Boolean, false),
+            Field::new("c", DataType::Boolean, false),
+            Field::new("d", DataType::UInt32, false),
+        ]);
+        LogicalPlanBuilder::scan_empty("test", &schema, None)?.build()
+    }
+
+    fn expr_test_schema() -> DFSchemaRef {
+        Arc::new(
+            DFSchema::new(vec![
+                DFField::new(None, "c1", DataType::Utf8, true),
+                DFField::new(None, "c2", DataType::Boolean, true),
+            ])
+            .unwrap(),
+        )
+    }
+
+    #[test]
+    fn optimize_expr_not_not() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(
+            optimize_expr(
+                &Expr::Not(Box::new(Expr::Not(Box::new(Expr::Not(Box::new(col(
+                    "c2"
+                ))))))),
+                &[&schema],
+            )?,
+            col("c2").not(),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_null_comparision() -> Result<()> {
+        let schema = expr_test_schema();
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::Eq,
+                    right: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                    op: Operator::NotEq,
+                    right: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::NotEq,
+                    right: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(Expr::Literal(ScalarValue::Boolean(None))),
+                    op: Operator::Eq,
+                    right: Box::new(col("c2")),
+                },
+                &[&schema],
+            )?,
+            Expr::Literal(ScalarValue::Boolean(None)),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_eq() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(col("c2").get_type(&schema)?, DataType::Boolean);
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::Eq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            lit(true),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::Eq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            lit(false),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::Eq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            col("c2"),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::Eq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            col("c2").not(),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_eq_skip_nonboolean_type() -> Result<()> {
+        let schema = expr_test_schema();
+
+        // when one of the operand is not of boolean type, folding the other boolean constant will
+        // change return type of expression to non-boolean.
+        assert_eq!(col("c1").get_type(&schema)?, DataType::Utf8);
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c1")),
+                    op: Operator::Eq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            Expr::BinaryExpr {
+                left: Box::new(col("c1")),
+                op: Operator::Eq,
+                right: Box::new(lit(true)),
+            },
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c1")),
+                    op: Operator::Eq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            Expr::BinaryExpr {
+                left: Box::new(col("c1")),
+                op: Operator::Eq,
+                right: Box::new(lit(false)),
+            },
+        );
+
+        // test constant operands
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(1)),
+                    op: Operator::Eq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            Expr::BinaryExpr {
+                left: Box::new(lit(1)),
+                op: Operator::Eq,
+                right: Box::new(lit(true)),
+            },
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit("a")),
+                    op: Operator::Eq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            Expr::BinaryExpr {
+                left: Box::new(lit("a")),
+                op: Operator::Eq,
+                right: Box::new(lit(false)),
+            },
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_not_eq() -> Result<()> {
+        let schema = expr_test_schema();
+        assert_eq!(col("c2").get_type(&schema)?, DataType::Boolean);
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            col("c2").not(),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c2")),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            col("c2"),
+        );
+
+        // test constant
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            lit(false),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(lit(true)),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            lit(true),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_not_eq_skip_nonboolean_type() -> Result<()> {
+        let schema = expr_test_schema();
+
+        // when one of the operand is not of boolean type, folding the other boolean constant will
+        // change return type of expression to non-boolean.
+        assert_eq!(col("c1").get_type(&schema)?, DataType::Utf8);
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c1")),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(true)),
+                },
+                &[&schema],
+            )?,
+            Expr::BinaryExpr {
+                left: Box::new(col("c1")),
+                op: Operator::NotEq,
+                right: Box::new(lit(true)),
+            },
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::BinaryExpr {
+                    left: Box::new(col("c1")),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(false)),
+                },
+                &[&schema],
+            )?,
+            Expr::BinaryExpr {
+                left: Box::new(col("c1")),
+                op: Operator::NotEq,
+                right: Box::new(lit(false)),
+            },
+        );
+
+        // test constants
+        assert_eq!(
+            optimize_expr(
+                &Expr::Not(Box::new(Expr::BinaryExpr {
+                    left: Box::new(lit(1)),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(true)),
+                })),
+                &[&schema],
+            )?,
+            Expr::Not(Box::new(Expr::BinaryExpr {
+                left: Box::new(lit(1)),
+                op: Operator::NotEq,
+                right: Box::new(lit(true)),
+            })),
+        );
+
+        assert_eq!(
+            optimize_expr(
+                &Expr::Not(Box::new(Expr::BinaryExpr {
+                    left: Box::new(lit("a")),
+                    op: Operator::NotEq,
+                    right: Box::new(lit(false)),
+                })),
+                &[&schema],
+            )?,
+            Expr::Not(Box::new(Expr::BinaryExpr {
+                left: Box::new(lit("a")),
+                op: Operator::NotEq,
+                right: Box::new(lit(false)),
+            })),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn optimize_expr_case_when_then_else() -> Result<()> {
+        let schema = expr_test_schema();
+
+        assert_eq!(
+            optimize_expr(
+                &Box::new(Expr::Case {
+                    expr: None,
+                    when_then_expr: vec![(
+                        Box::new(Expr::BinaryExpr {
+                            left: Box::new(col("c2")),
+                            op: Operator::NotEq,
+                            right: Box::new(lit(false)),
+                        }),
+                        Box::new(lit("ok").eq(lit(true))),
+                    )],
+                    else_expr: Some(Box::new(col("c2").eq(lit(true)))),
+                }),
+                &[&schema],
+            )?,
+            Expr::Case {
+                expr: None,
+                when_then_expr: vec![(
+                    Box::new(col("c2")),
+                    Box::new(lit("ok").eq(lit(true)))
+                )],
+                else_expr: Some(Box::new(col("c2"))),
+            }
+        );
+
+        Ok(())
+    }
+
+    fn assert_optimized_plan_eq(plan: &LogicalPlan, expected: &str) {
+        let rule = ConstantFolding::new();
+        let optimized_plan = rule.optimize(plan).expect("failed to optimize plan");
+        let formatted_plan = format!("{:?}", optimized_plan);
+        assert_eq!(formatted_plan, expected);
+    }
+
+    #[test]
+    fn optimize_plan_eq_expr() -> Result<()> {
+        let table_scan = test_table_scan()?;
+        let plan = LogicalPlanBuilder::from(&table_scan)
+            .filter(col("b").eq(lit(true)))?
+            .filter(col("c").eq(lit(false)))?
+            .project(&[col("a")])?
+            .build()?;
+
+        let expected = "\

Review comment:
       nice




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r575724229



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,620 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, plan.schema())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schema: &DFSchemaRef) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schema)?;
+            let right = optimize_expr(right, schema)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) == r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) | None => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) | None => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) != r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(false) | None => right,
+                            Some(true) => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(false) | None => left,
+                            Some(true) => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr, schema)?)),

Review comment:
       I have added this, I didn't add it in the first place because it will require a convergence loop to converge, but I guess we can work on that as a follow up PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r571687423



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,620 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, plan.schema())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schema: &DFSchemaRef) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schema)?;
+            let right = optimize_expr(right, schema)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) == r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) | None => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) | None => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) != r.unwrap_or(false),

Review comment:
       yeah, any binary operation, other than null safe equality, that involves null should return null, i will updated the other non-literal branches to return null literal as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r564440788



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       I suggest calling this module 'constant_folding.rs` and the optimization `ConstantFolding` -- while this particular PR has code to fold boolean constants, the idea is much more general (e.g you might imagine folding things like `A < (5+3)` into `A < 8` and that sort of thing).
   
   

##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()
+            }
+        }
+        Expr::Alias { .. }
+        | Expr::Negative { .. }
+        | Expr::Column { .. }
+        | Expr::InList { .. }
+        | Expr::IsNotNull { .. }
+        | Expr::IsNull { .. }
+        | Expr::Cast { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Between { .. }
+        | Expr::Literal { .. }
+        | Expr::ScalarFunction { .. }
+        | Expr::ScalarUDF { .. }
+        | Expr::AggregateFunction { .. }
+        | Expr::AggregateUDF { .. }
+        | Expr::Sort { .. }
+        | Expr::Wildcard => e.clone(),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{col, lit, LogicalPlanBuilder};
+    use crate::test::*;
+
+    fn assert_optimized_plan_eq(plan: &LogicalPlan, expected: &str) {
+        let mut rule = BooleanComparison::new();
+        let optimized_plan = rule.optimize(plan).expect("failed to optimize plan");
+        let formatted_plan = format!("{:?}", optimized_plan);
+        assert_eq!(formatted_plan, expected);
+    }
+
+    #[test]
+    fn simplify_eq_expr() -> Result<()> {

Review comment:
       I think adding a test for rewriting expressions in a non-filter plan would be valuable (e.g. make a join plan or something)

##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()
+            }
+        }
+        Expr::Alias { .. }
+        | Expr::Negative { .. }
+        | Expr::Column { .. }
+        | Expr::InList { .. }
+        | Expr::IsNotNull { .. }
+        | Expr::IsNull { .. }
+        | Expr::Cast { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Between { .. }
+        | Expr::Literal { .. }
+        | Expr::ScalarFunction { .. }
+        | Expr::ScalarUDF { .. }
+        | Expr::AggregateFunction { .. }
+        | Expr::AggregateUDF { .. }
+        | Expr::Sort { .. }
+        | Expr::Wildcard => e.clone(),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{col, lit, LogicalPlanBuilder};
+    use crate::test::*;
+
+    fn assert_optimized_plan_eq(plan: &LogicalPlan, expected: &str) {
+        let mut rule = BooleanComparison::new();
+        let optimized_plan = rule.optimize(plan).expect("failed to optimize plan");
+        let formatted_plan = format!("{:?}", optimized_plan);
+        assert_eq!(formatted_plan, expected);
+    }
+
+    #[test]
+    fn simplify_eq_expr() -> Result<()> {
+        let table_scan = test_table_scan()?;
+        let plan = LogicalPlanBuilder::from(&table_scan)
+            .filter(col("a").eq(lit(true)))?
+            .filter(col("b").eq(lit(false)))?
+            .project(vec![col("a")])?
+            .build()?;
+
+        let expected = "\
+        Projection: #a\
+        \n  Filter: NOT #b\
+        \n    Filter: #a\
+        \n      TableScan: test projection=None";
+
+        assert_optimized_plan_eq(&plan, expected);
+        Ok(())
+    }
+
+    #[test]
+    fn simplify_not_eq_expr() -> Result<()> {

Review comment:
       One suggestion on the tests (not needed in this PR) would be to separtate out the cases -- 
   1. one set in terms of `LogicalPlan` that makes sure rewrite was applied to all expressions in the plan node 
   2. Another set for just the logic in `optimize_expr` (aka make an `Expr`, rewrite it, and verify the rewrite was done correctly
   
   

##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {

Review comment:
       As you can probably guess given PRs like https://github.com/apache/arrow/pull/9278 my preference to avoid repeating the structure walking logic is via a Visitor. Perhaps after this PR is merged, I can take a shot at rewriting it using a general `ExprRewriter` type pattern. 

##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {

Review comment:
       @Dandandan  -- I think this PR as written is quite efficient and doesn't need a convergence loop as you suggest (which I think ends up potentially being quite inefficient if many rewrites are required) -- it already does a depth first traversal of the tree, simplifying on the way up. 
   
   I think convergence loops might be best used if we have several rules that can each potentially make changes that would unlock additional optimizations of the others
   
   For example, if you had two different optimization functions like `optimization_A` and `optimization_B` but parts of `optimization_A` wouldn't be applied unless you ran `optimization_B`. In that case a loop like the following would let you take full advantage of that
   
   ```
   while !changed {
     let exprA = optimization_A(expr);
     let exprB = optimization_B(exprA);
     changed = expr != exprB;
     expr = exprB
   }
   ```
   

##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       I think you could also apply `optimize_expr` to then and else_expr:
   
   ```suggestion
                           .map(|(when, then)| (Box::new(optimize_expr(when)), optimize_expr(then))
                           .collect(),
                       else_expr: optimize_expr(else_expr),
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r565033732



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()
+            }
+        }
+        Expr::Alias { .. }
+        | Expr::Negative { .. }
+        | Expr::Column { .. }
+        | Expr::InList { .. }
+        | Expr::IsNotNull { .. }
+        | Expr::IsNull { .. }
+        | Expr::Cast { .. }
+        | Expr::ScalarVariable { .. }
+        | Expr::Between { .. }
+        | Expr::Literal { .. }
+        | Expr::ScalarFunction { .. }
+        | Expr::ScalarUDF { .. }
+        | Expr::AggregateFunction { .. }
+        | Expr::AggregateUDF { .. }
+        | Expr::Sort { .. }
+        | Expr::Wildcard => e.clone(),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::logical_plan::{col, lit, LogicalPlanBuilder};
+    use crate::test::*;
+
+    fn assert_optimized_plan_eq(plan: &LogicalPlan, expected: &str) {
+        let mut rule = BooleanComparison::new();
+        let optimized_plan = rule.optimize(plan).expect("failed to optimize plan");
+        let formatted_plan = format!("{:?}", optimized_plan);
+        assert_eq!(formatted_plan, expected);
+    }
+
+    #[test]
+    fn simplify_eq_expr() -> Result<()> {
+        let table_scan = test_table_scan()?;
+        let plan = LogicalPlanBuilder::from(&table_scan)
+            .filter(col("a").eq(lit(true)))?
+            .filter(col("b").eq(lit(false)))?
+            .project(vec![col("a")])?
+            .build()?;
+
+        let expected = "\
+        Projection: #a\
+        \n  Filter: NOT #b\
+        \n    Filter: #a\
+        \n      TableScan: test projection=None";
+
+        assert_optimized_plan_eq(&plan, expected);
+        Ok(())
+    }
+
+    #[test]
+    fn simplify_not_eq_expr() -> Result<()> {

Review comment:
       good suggestion, i will add it in this PR




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766471716


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=h1) Report
   > Merging [#9309](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=desc) (325fa4d) into [master](https://codecov.io/gh/apache/arrow/commit/437c8c944acb3479b76804f041f5f8cbce309fa7?el=desc) (437c8c9) will **increase** coverage by `0.00%`.
   > The diff coverage is `81.30%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9309/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master    #9309    +/-   ##
   ========================================
     Coverage   81.88%   81.89%            
   ========================================
     Files         215      216     +1     
     Lines       52988    53108   +120     
   ========================================
   + Hits        43391    43492   +101     
   - Misses       9597     9616    +19     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ust/datafusion/src/optimizer/boolean\_comparison.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvYm9vbGVhbl9jb21wYXJpc29uLnJz) | `78.50% <78.50%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.36% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.39% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `49.41% <0.00%> (+0.39%)` | :arrow_up: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.23% <0.00%> (+0.47%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=footer). Last update [437c8c9...325fa4d](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp edited a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-778694710


   @Dandandan @alamb ready for another round of review. Given this PR has grown into 900 lines, I think it would be better to work on a new `ExprRewriter` refactor only PR to ease the review process. We will be able to get a better idea of how the new framework works by applying it to multiple optimization rules in the same patch set.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r576422331



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;

Review comment:
       You might be able to write this more concisely with something like 
   ```
   if let Ok(DataType::Boolean) = expr.get_type(schema) {
     return true
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766471716


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=h1) Report
   > Merging [#9309](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=desc) (5656232) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.00%`.
   > The diff coverage is `83.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9309/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master    #9309    +/-   ##
   ========================================
     Coverage   81.84%   81.85%            
   ========================================
     Files         215      216     +1     
     Lines       52949    53075   +126     
   ========================================
   + Hits        43336    43444   +108     
   - Misses       9613     9631    +18     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ust/datafusion/src/optimizer/boolean\_comparison.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvYm9vbGVhbl9jb21wYXJpc29uLnJz) | `81.41% <81.41%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.36% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.39% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.23% <0.00%> (+0.47%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=footer). Last update [67d0c2e...5656232](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r571578751



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,620 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, plan.schema())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schema: &DFSchemaRef) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schema)?;
+            let right = optimize_expr(right, schema)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) == r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) | None => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) | None => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) != r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(false) | None => right,
+                            Some(true) => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(false) | None => left,
+                            Some(true) => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr, schema)?)),

Review comment:
       `not(not(x)) -> x` might be a simple rule to add




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r565033597



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       OK, i think what I can do is to check for column type based on plan schema and skip optimization for `col = true` case when col type is not boolean. `col = false` is always safe to be optimized into `!col`. What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r576445235



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;
+            }
+        }
+    }
+
+    false
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schemas)?;
+            let right = optimize_expr(right, schemas)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l == r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) => Expr::Not(Box::new(right)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) => Expr::Not(Box::new(left)),
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => match (l, r) {
+                        (Some(l), Some(r)) => {
+                            Expr::Literal(ScalarValue::Boolean(Some(l != r)))
+                        }
+                        _ => Expr::Literal(ScalarValue::Boolean(None)),
+                    },
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if is_boolean_type(&right, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(right)),
+                            Some(false) => right,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if is_boolean_type(&left, schemas) =>
+                    {
+                        match b {
+                            Some(true) => Expr::Not(Box::new(left)),
+                            Some(false) => left,
+                            None => Expr::Literal(ScalarValue::Boolean(None)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: *op,
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => match &**expr {
+            Expr::Not(inner) => optimize_expr(&inner, schemas)?,
+            _ => Expr::Not(Box::new(optimize_expr(&expr, schemas)?)),
+        },
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| {
+                            Ok((
+                                Box::new(optimize_expr(when, schemas)?),
+                                Box::new(optimize_expr(then, schemas)?),
+                            ))
+                        })
+                        .collect::<Result<_>>()?,
+                    else_expr: match else_expr {
+                        Some(e) => Some(Box::new(optimize_expr(e, schemas)?)),
+                        None => None,
+                    },
+                }
+            } else {
+                // when base expression is specified, when_then_expr conditions are literal values
+                // so we can just skip this case
+                e.clone()

Review comment:
       This is based on the doc string we have in `logical_plan/expr.rs`:
   
   ```
   The second form uses a base expression and then a series of "when" clauses that match on a literal value.
   ``
   
   However, general SQL and spark SQL does support arbitrary expressions in when conditions, so I think it's best for us to match that behavior.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r565030032



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       I have been going back and forth on this one. So basically what I am trying to avoid is
   
   ```CASE
       WHEN true THEN (col1 = true)
       ELSE (col1 = false)
   END; 
   ```
   
   Being optimized into:
   
   ```CASE
       WHEN true THEN col1
       ELSE !col1
   END; 
   ```
   
   This is not a valid optimization when col1 column is not typed as boolean. Although we currently don't support comparison between boolean type and none boolean type in datafusion, i am expecting us to support it in the future. Am I overthinking this?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r571580774



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,620 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, plan.schema())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schema: &DFSchemaRef) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schema)?;
+            let right = optimize_expr(right, schema)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) == r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) | None => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) | None => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) != r.unwrap_or(false),

Review comment:
       It looks like that will result in `False`, I would expect `NULL::BOOL <> NULL::BOOL` to be `NULL`?
   Probably no one will write this, but it could be the result of another optimization pass?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r571580774



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,620 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, plan.schema())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schema: &DFSchemaRef) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schema)?;
+            let right = optimize_expr(right, schema)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) == r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) | None => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) | None => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) != r.unwrap_or(false),

Review comment:
       It looks like that will result in `False`, I would expect `NULL::BOOL OR NULL::BOOL` to be `NULL`?
   Probably no one will write this, but it could be the result of another optimization pass?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766471716


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=h1) Report
   > Merging [#9309](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=desc) (5656232) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.00%`.
   > The diff coverage is `83.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9309/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master    #9309    +/-   ##
   ========================================
     Coverage   81.84%   81.85%            
   ========================================
     Files         215      216     +1     
     Lines       52949    53075   +126     
   ========================================
   + Hits        43336    43444   +108     
   - Misses       9613     9631    +18     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ust/datafusion/src/optimizer/boolean\_comparison.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvYm9vbGVhbl9jb21wYXJpc29uLnJz) | `81.41% <81.41%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.36% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.39% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.23% <0.00%> (+0.47%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=footer). Last update [67d0c2e...5656232](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r564293129



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {

Review comment:
       I think we could make this optimizer a bit more generic (not necessary in this PR) to split recursion / pattern match
   
   This is the more "optimizer framework" I had in mind in the comment on the roadmap
   @alamb @jorgecarleitao @vertexclique .
   
   A common strategy (used by Spark) for rule / replacement based  to have a loop that just does something like this:
   
   ```rust
   let changed = False;
   
   while !changed {
       (logical_plan, changed)  = apply_optimizations(rules, logical_plan);
   }
   ```
   
   A rule could be something like (returning `Some` on replaced output) and doesn't need the boilerplate of recursion for every rule.
   
   ```rust
   // Optimizer can work both on expr and logical plans, default returns `None`
   impl OptimizerRule for BooleanComparison {
      fn optimize_expr(&mut self, plan: &Expr) -> Option<Expr> {
          match e {
                Expr::BinaryExpr { left, op, right } => {
               let left = optimize_expr(left);
               let right = optimize_expr(right);
               match op {
                   Operator::Eq => match (&left, &right) {
                       (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
                           Some(true) => Some(right),
                           Some(false) | None => Some(Expr::Not(Box::new(right))),
                       },
                       (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
                           Some(true) => Some(left),
                           Some(false) | None => Some(Expr::Not(Box::new(left))),
                       },
                       _ => None,
                   },
                   Operator::NotEq => match (&left, &right) {
                       (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
                           Some(false) | None => Some(right),
                           Some(true) => Some(Expr::Not(Box::new(right))),
                       },
                       (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
                           Some(false) | None => Some(left),
                           Some(true) => Some(Expr::Not(Box::new(left))),
                       },
                       _ => None,
                   },
                   _ => None
               }
           }
          }
      }
   }
   
   ```
       




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r565667987



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       That is a good point @houqp  -- I think it applies to all expressions (not just CASE statements). You are in essence worried about changing the type of an expression from boolean to something else which is a good thing to worry about!
   
   For columns that have types other than boolean, I would expect an expression like `col1 = true` to be eventually rewritten to `CAST(col1, boolean) = true` in which case the optimization to `CAST(col1, boolean)` is correct.
   
   Upon reading this code a bit more, I can't recall exactly when cast's (coercions) are inserted. 
   
   I think skipping the boolean rewrite when `col` is not a boolean is the right thing to do  -- and if that type of comparison can happen we need to update the rewrite rules for `BinaryExpr`, `NotExpr`, etc




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r576393334



##########
File path: dev/archery/archery/benchmark/compare.py
##########
@@ -25,11 +25,11 @@ def items_per_seconds_fmt(value):
     if value < 1000:
         return "{} items/sec".format(value)
     if value < 1000**2:
-        return "{:.3f}k items/sec".format(value / 1000)
+        return "{:.3f}K items/sec".format(value / 1000)

Review comment:
       was this an intended change in this PR? I don't know anything about the archery benchmark driver, so I think someone who does should probably review these changes if they are needed in this PR

##########
File path: rust/arrow/src/array/array.rs
##########
@@ -326,6 +327,170 @@ pub fn new_empty_array(data_type: &DataType) -> ArrayRef {
     let data = ArrayData::new_empty(data_type);
     make_array(Arc::new(data))
 }
+/// Creates a new array of `data_type` of length `length` filled entirely of `NULL` values
+pub fn new_null_array(data_type: &DataType, length: usize) -> ArrayRef {

Review comment:
       I am not sure why this code is in this PR (maybe the PR needs a rebase)? It was merged into master as part of https://github.com/apache/arrow/pull/9469 I think 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766471716


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=h1) Report
   > Merging [#9309](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=desc) (474e14a) into [master](https://codecov.io/gh/apache/arrow/commit/10f4ada2bbad0770e422e5d70071e991ab3f5f57?el=desc) (10f4ada) will **increase** coverage by `0.00%`.
   > The diff coverage is `81.30%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9309/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master    #9309    +/-   ##
   ========================================
     Coverage   81.84%   81.85%            
   ========================================
     Files         215      216     +1     
     Lines       52949    53069   +120     
   ========================================
   + Hits        43335    43437   +102     
   - Misses       9614     9632    +18     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ust/datafusion/src/optimizer/boolean\_comparison.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvYm9vbGVhbl9jb21wYXJpc29uLnJz) | `78.50% <78.50%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.36% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.39% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `49.41% <0.00%> (+0.39%)` | :arrow_up: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.23% <0.00%> (+0.47%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=footer). Last update [10f4ada...474e14a](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r565030032



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       I have been going back and forth on this one. So basically what I am trying to avoid is
   
   ```sql
   CASE
       WHEN true THEN (col1 = true)
       ELSE 1
   END; 
   ```
   
   Being optimized into:
   
   ```sql
   CASE
       WHEN true THEN col1
       ELSE 1
   END; 
   ```
   
   This is not a valid optimization when col1 column is not typed as boolean. Although we currently don't support comparison between boolean type and none boolean type in datafusion, i am expecting us to support it in the future. Am I overthinking this?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-779469678


   all feedback addressed, thanks for the suggestion @alamb , tests are a lot easier to read and maintain now compared to what I started with :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r563404985



##########
File path: rust/rustfmt.toml
##########
@@ -15,9 +15,10 @@
 # specific language governing permissions and limitations
 # under the License.
 
+edition = "2018"

Review comment:
       drive by config fix




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766461000


   https://issues.apache.org/jira/browse/ARROW-11366


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766471716


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=h1) Report
   > Merging [#9309](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=desc) (474e14a) into [master](https://codecov.io/gh/apache/arrow/commit/10f4ada2bbad0770e422e5d70071e991ab3f5f57?el=desc) (10f4ada) will **increase** coverage by `0.00%`.
   > The diff coverage is `81.30%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9309/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff            @@
   ##           master    #9309    +/-   ##
   ========================================
     Coverage   81.84%   81.85%            
   ========================================
     Files         215      216     +1     
     Lines       52949    53069   +120     
   ========================================
   + Hits        43335    43437   +102     
   - Misses       9614     9632    +18     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ust/datafusion/src/optimizer/boolean\_comparison.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvYm9vbGVhbl9jb21wYXJpc29uLnJz) | `78.50% <78.50%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.36% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.39% <100.00%> (+0.23%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `49.41% <0.00%> (+0.39%)` | :arrow_up: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.23% <0.00%> (+0.47%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=footer). Last update [10f4ada...474e14a](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-779454561


   let me push up another commit to incorporate the clean up suggestions


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r571577988



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,620 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` and `expr != false` to `expr` when `expr` is of boolean type

Review comment:
       ```suggestion
   /// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766471716


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=h1) Report
   > Merging [#9309](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=desc) (4318278) into [master](https://codecov.io/gh/apache/arrow/commit/88e9eb852755072b948393d99d08211c3e01ce38?el=desc) (88e9eb8) will **increase** coverage by `0.12%`.
   > The diff coverage is `95.21%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9309/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9309      +/-   ##
   ==========================================
   + Coverage   82.27%   82.40%   +0.12%     
   ==========================================
     Files         234      235       +1     
     Lines       54594    55094     +500     
   ==========================================
   + Hits        44919    45398     +479     
   - Misses       9675     9696      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `82.45% <82.14%> (-0.04%)` | :arrow_down: |
   | [rust/datafusion/src/optimizer/constant\_folding.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvY29uc3RhbnRfZm9sZGluZy5ycw==) | `95.86% <95.86%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `90.17% <100.00%> (+0.08%)` | :arrow_up: |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `83.23% <100.00%> (+0.07%)` | :arrow_up: |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `52.39% <0.00%> (+0.36%)` | :arrow_up: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/9309/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `80.66% <0.00%> (+0.47%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=footer). Last update [88e9eb8...4318278](https://codecov.io/gh/apache/arrow/pull/9309?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp edited a comment on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp edited a comment on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-766489488


   clippy is failing due to unrelated issue, can i fix it in the same PR or should I create a new dedicated one to address the issue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-778768852


   Thanks @houqp  -- I ran out of time today to review this PR, but I plan to review it tomorrow. I agree that a `ExprRewriter` refactor would be better done in a separate PR


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r564458819



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {

Review comment:
       :+1: Yes, agreed this ATM doesn't need a loop as there is no interaction yet with other rules. But I guess once you'll add them you will need it if you want to combine it with other rules. Using `expr != exprB` for `changed` might be a good way to start. The main thing I want to stress is that at some point we don't want the recursion itself in a rule, but in a more general "optimization framework".
   
   The optimization loop itself can be written in such a way it does both top down and bottom up replacements, applying the same rule while the optimization "generates" a replacement for the node, so it doesn't need multiple traversals for cases like you mention. The concept is here in polars
   https://github.com/ritchie46/polars/blob/master/polars/polars-lazy/src/logical_plan/optimizer/mod.rs#L69
   
   And sounds like a perfect candidate to try for `ExprRewriter` :+1: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r575724007



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       OK, I have updated the code to apply the folding optimization to all plan nodes and expressions wherever applicable. We currently doesn't do the `CAST` rewrite at the moment, so that needs to be handled in a separate ticket/PR to get full support for applying binary operators on expressions with different types. This PR now assumes automatic casting will be added in the future and skips optimization for expressions that are not in boolean type.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r566379027



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {

Review comment:
       yeah, totally, agree, i was really tempting to introduce an optimization framework while writing those tree traversal boilerplate code, but decided it's probably better be done as a separate refactoring so we can get a better feeling of how the new framework can affect all the existing rules. the way we are currently managing optimization rules is definitely too raw ;) 
   
   I will take a look at the Expression Visitor pattern as well since that's something already exists in the code base.

##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {

Review comment:
       yeah, totally, agree with both of you, i was really tempting to introduce an optimization framework while writing those tree traversal boilerplate code, but decided it's probably better be done as a separate refactoring so we can get a better feeling of how the new framework can affect all the existing rules. the way we are currently managing optimization rules is definitely too raw ;) 
   
   I will take a look at the Expression Visitor pattern as well since that's something already exists in the code base.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r576429023



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,834 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = true` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+/// * `!!expr` to `expr`
+/// * `expr = null` and `expr != null` to `null`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, &plan.all_schemas())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Join { .. } => {
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                let schemas = plan.all_schemas();
+                let expr = utils::expressions(plan)
+                    .iter()
+                    .map(|e| optimize_expr(e, &schemas))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+fn is_boolean_type(expr: &Expr, schemas: &[&DFSchemaRef]) -> bool {
+    for schema in schemas {
+        match expr.get_type(schema) {
+            Ok(dt) if dt == DataType::Boolean => {
+                return true;
+            }
+            _ => {
+                continue;

Review comment:
       Good call! I have also added more comments to explain why `all_schemas` method was needed in the optimizer code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r571580430



##########
File path: rust/datafusion/src/optimizer/constant_folding.rs
##########
@@ -0,0 +1,620 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use arrow::datatypes::DataType;
+
+use crate::error::Result;
+use crate::logical_plan::{DFSchemaRef, Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` and `expr != false` to `expr` when `expr` is of boolean type
+/// * `expr = false` and `expr != true` to `!expr` when `expr` is of boolean type
+/// * `true = true` and `false = false` to `true`
+/// * `false = true` and `true = false` to `false`
+pub struct ConstantFolding {}
+
+impl ConstantFolding {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for ConstantFolding {
+    fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate, plan.schema())?,
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "constant_folding"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr, schema: &DFSchemaRef) -> Result<Expr> {
+    Ok(match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left, schema)?;
+            let right = optimize_expr(right, schema)?;
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) == r.unwrap_or(false),
+                    ))),
+                    (Expr::Literal(ScalarValue::Boolean(b)), _)
+                        if right.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => right,
+                            Some(false) | None => Expr::Not(Box::new(right)),
+                        }
+                    }
+                    (_, Expr::Literal(ScalarValue::Boolean(b)))
+                        if left.get_type(schema)? == DataType::Boolean =>
+                    {
+                        match b {
+                            Some(true) => left,
+                            Some(false) | None => Expr::Not(Box::new(left)),
+                        }
+                    }
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (
+                        Expr::Literal(ScalarValue::Boolean(l)),
+                        Expr::Literal(ScalarValue::Boolean(r)),
+                    ) => Expr::Literal(ScalarValue::Boolean(Some(
+                        l.unwrap_or(false) != r.unwrap_or(false),

Review comment:
       What about if both operands are `NULL` ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#issuecomment-778694710


   @Dandandan @alamb ready for another round of review. Given this PR has grown into 900 lines, I think it would be better to work on a new `ExprRewriter` a refactor only PR to ease the review process. We will be able to get a better idea of how the new framework works by applying it to multiple optimization rules in the same patch set.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9309: ARROW-11366: [Datafusion] Implement constant folding for boolean literal expressions

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9309:
URL: https://github.com/apache/arrow/pull/9309


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r565030032



##########
File path: rust/datafusion/src/optimizer/boolean_comparison.rs
##########
@@ -0,0 +1,270 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Boolean comparision rule rewrites redudant comparison expression involing boolean literal into
+//! unary expression.
+
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{Expr, LogicalPlan, Operator};
+use crate::optimizer::optimizer::OptimizerRule;
+use crate::optimizer::utils;
+use crate::scalar::ScalarValue;
+
+/// Optimizer that simplifies comparison expressions involving boolean literals.
+///
+/// Recursively go through all expressionss and simplify the following cases:
+/// * `expr = ture` to `expr`
+/// * `expr = false` to `!expr`
+pub struct BooleanComparison {}
+
+impl BooleanComparison {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+impl OptimizerRule for BooleanComparison {
+    fn optimize(&mut self, plan: &LogicalPlan) -> Result<LogicalPlan> {
+        match plan {
+            LogicalPlan::Filter { predicate, input } => Ok(LogicalPlan::Filter {
+                predicate: optimize_expr(predicate),
+                input: Arc::new(self.optimize(input)?),
+            }),
+            // Rest: recurse into plan, apply optimization where possible
+            LogicalPlan::Projection { .. }
+            | LogicalPlan::Aggregate { .. }
+            | LogicalPlan::Limit { .. }
+            | LogicalPlan::Repartition { .. }
+            | LogicalPlan::CreateExternalTable { .. }
+            | LogicalPlan::Extension { .. }
+            | LogicalPlan::Sort { .. }
+            | LogicalPlan::Explain { .. }
+            | LogicalPlan::Join { .. } => {
+                let expr = utils::expressions(plan);
+
+                // apply the optimization to all inputs of the plan
+                let inputs = utils::inputs(plan);
+                let new_inputs = inputs
+                    .iter()
+                    .map(|plan| self.optimize(plan))
+                    .collect::<Result<Vec<_>>>()?;
+
+                utils::from_plan(plan, &expr, &new_inputs)
+            }
+            LogicalPlan::TableScan { .. } | LogicalPlan::EmptyRelation { .. } => {
+                Ok(plan.clone())
+            }
+        }
+    }
+
+    fn name(&self) -> &str {
+        "boolean_comparison"
+    }
+}
+
+/// Recursively transverses the logical plan.
+fn optimize_expr(e: &Expr) -> Expr {
+    match e {
+        Expr::BinaryExpr { left, op, right } => {
+            let left = optimize_expr(left);
+            let right = optimize_expr(right);
+            match op {
+                Operator::Eq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(true) => right,
+                        Some(false) | None => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(true) => left,
+                        Some(false) | None => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::Eq,
+                        right: Box::new(right),
+                    },
+                },
+                Operator::NotEq => match (&left, &right) {
+                    (Expr::Literal(ScalarValue::Boolean(b)), _) => match b {
+                        Some(false) | None => right,
+                        Some(true) => Expr::Not(Box::new(right)),
+                    },
+                    (_, Expr::Literal(ScalarValue::Boolean(b))) => match b {
+                        Some(false) | None => left,
+                        Some(true) => Expr::Not(Box::new(left)),
+                    },
+                    _ => Expr::BinaryExpr {
+                        left: Box::new(left),
+                        op: Operator::NotEq,
+                        right: Box::new(right),
+                    },
+                },
+                _ => Expr::BinaryExpr {
+                    left: Box::new(left),
+                    op: op.clone(),
+                    right: Box::new(right),
+                },
+            }
+        }
+        Expr::Not(expr) => Expr::Not(Box::new(optimize_expr(&expr))),
+        Expr::Case {
+            expr,
+            when_then_expr,
+            else_expr,
+        } => {
+            if expr.is_none() {
+                // recurse into CASE WHEN condition expressions
+                Expr::Case {
+                    expr: None,
+                    when_then_expr: when_then_expr
+                        .iter()
+                        .map(|(when, then)| (Box::new(optimize_expr(when)), then.clone()))
+                        .collect(),
+                    else_expr: else_expr.clone(),

Review comment:
       I have been going back and forth on this one. So basically what I am trying to avoid is
   
   ```sql
   CASE
       WHEN true THEN (col1 = true)
       ELSE (col1 = false)
   END; 
   ```
   
   Being optimized into:
   
   ```sql
   CASE
       WHEN true THEN col1
       ELSE !col1
   END; 
   ```
   
   This is not a valid optimization when col1 column is not typed as boolean. Although we currently don't support comparison between boolean type and none boolean type in datafusion, i am expecting us to support it in the future. Am I overthinking this?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r563404985



##########
File path: rust/rustfmt.toml
##########
@@ -15,9 +15,10 @@
 # specific language governing permissions and limitations
 # under the License.
 
+edition = "2018"

Review comment:
       drive by config fix




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #9309: ARROW-11366: [Datafusion] support boolean literal in comparison expression

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #9309:
URL: https://github.com/apache/arrow/pull/9309#discussion_r576419903



##########
File path: dev/archery/archery/benchmark/compare.py
##########
@@ -25,11 +25,11 @@ def items_per_seconds_fmt(value):
     if value < 1000:
         return "{} items/sec".format(value)
     if value < 1000**2:
-        return "{:.3f}k items/sec".format(value / 1000)
+        return "{:.3f}K items/sec".format(value / 1000)

Review comment:
       probably caused by force push from master, i will rebase.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org