You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/16 18:19:54 UTC

[GitHub] [arrow-datafusion] jackwener opened a new pull request #2026: merge adjacent filter rule for optimizer

jackwener opened a new pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026


   # Which issue does this PR close?
   
   Closes #2016 .
   
   ```sql
   explain select c1, c2 from test where c3 = true and c2 = 0.000001;
   ```
   
   Before
   ```sql
   +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                |
   +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #test.c1, #test.c2                                                                                                      |
   |               |   Filter: #test.c3                                                                                                                  |
   |               |     Filter: #test.c2 = Float64(0.000001)                                                                                            |
   |               |       TableScan: test projection=Some([0, 1, 2]), filters=[#test.c3, #test.c2 = Float64(0.000001)]                                  |
   | physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                       |
   |               |   CoalesceBatchesExec: target_batch_size=4096                                                                                       |
   |               |     FilterExec: c3@2                                                                                                                |
   |               |       CoalesceBatchesExec: target_batch_size=4096                                                                                   |
   |               |         FilterExec: c2@1 = 0.000001                                                                                                 |
   |               |           RepartitionExec: partitioning=RoundRobinBatch(8)                                                                          |
   |               |             CsvExec: files=[/home/jakevin/code/arrow-datafusion/datafusion/tests/aggregate_simple.csv], has_header=true, limit=None |
   |               |                                                                                                                                     |
   +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   ```
   
   After
   ```sql
   +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                            |
   +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #test.c1, #test.c2                                                                                                  |
   |               |   Filter: #test.c3 AND #test.c2 = Float64(0.000001)                                                                             |
   |               |     TableScan: test projection=Some([0, 1, 2]), filters=[#test.c3, #test.c2 = Float64(0.000001)]                                |
   | physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                   |
   |               |   CoalesceBatchesExec: target_batch_size=4096                                                                                   |
   |               |     FilterExec: c3@2 AND c2@1 = 0.000001                                                                                        |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(8)                                                                          |
   |               |         CsvExec: files=[/home/jakevin/code/arrow-datafusion/datafusion/tests/aggregate_simple.csv], has_header=true, limit=None |
   |               |                                                                                                                                 |
   +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   
   ```
   
    # Rationale for this change
   
   merge adjacent filter 
   
   # Are there any user-facing changes?
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1073212122


   For anyone following along, follow on PR is https://github.com/apache/arrow-datafusion/pull/2039  https://github.com/apache/arrow-datafusion/issues/2038


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072983873


   The following is my test code used by sql.
   
   ```rust
   #[tokio::test]
   async fn main() -> Result<()> {
       // create local execution context
       let mut ctx = SessionContext::new();
   
       // register csv file with the execution context
       ctx.register_csv("test", "tests/aggregate_simple.csv", CsvReadOptions::new())
           .await?;
   
       // execute the query
       let plan = ctx.create_logical_plan(
           "select c1, c2 from test where c3 = true and c2 = 0.000001",
       )?;
   
       dbg!(plan);
   
       Ok(())
   }
   ```
   Then I got the plan
   ```shell
   Projection: #test.c1, #test.c2
     Filter: #test.c3 = Boolean(true) AND #test.c2 = Float64(0.000001)
       TableScan: test projection=Non
   ```
   
   I also tested use datafusion-cli:
   ```sql
   ❯ create table t as SELECT * FROM (VALUES (1,true), (2,false)) as t;
   0 rows in set. Query took 0.003 seconds.
   ❯ select * from t;
   +---------+---------+
   | column1 | column2 |
   +---------+---------+
   | 1       | true    |
   | 2       | false   |
   +---------+---------+
   2 rows in set. Query took 0.002 seconds.
   ❯ explain select * from t where column1 = 2 and column2 = true;
   +---------------+-------------------------------------------------------------------+
   | plan_type     | plan                                                              |
   +---------------+-------------------------------------------------------------------+
   | logical_plan  | Projection: #t.column1, #t.column2                                |
   |               |   Filter: #t.column1 = Int64(2) AND #t.column2                    |
   |               |     TableScan: t projection=Some([0, 1])                          |
   | physical_plan | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2] |
   |               |   CoalesceBatchesExec: target_batch_size=4096                     |
   |               |     FilterExec: column1@0 = 2 AND column2@1                       |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(12)           |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]             |
   |               |                                                                   |
   +---------------+-------------------------------------------------------------------+
   2 rows in set. Query took 0.004 seconds.
   ```
   
   Two cases will result in adjacent filters in logical plan:
   - Use dataframe: `df.xxx.filter(filter1).filter(filter2)`;
   - Directly build logical plan by `LogicalPlanBuilder`: `LogicalPlanBuilder::from(xx).xxx.filter().filter()...` 
   
   Btw: I also checked pg & cockroach & materialize codebase, they don't have the rule.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener commented on a change in pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener commented on a change in pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#discussion_r829666781



##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },

Review comment:
       Yes, it's great point. But it should be other rule function.
   
   Now there's `eliminate_fliter`. After this PR, I plan to do enhance this rule for conjunction filter and eliminate more redundant expr after this .

##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },

Review comment:
       It's great point. But it should be other rule function.
   
   Now there's `eliminate_fliter`. After this PR, I plan to do enhance this rule for conjunction filter and eliminate more redundant expr after this .




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#discussion_r828719675



##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },
+                    input: Arc::new((*sub_input).clone()),
+                }),
+                _ => plan.clone(),
+            }
+        }
+        _ => plan.clone(),
+    }
+}
+
+impl OptimizerRule for MergeAdjacentFilter {
+    fn optimize(
+        &self,
+        plan: &LogicalPlan,
+        execution_props: &ExecutionProps,
+    ) -> Result<LogicalPlan> {
+        let new_plan = optimize(plan);
+
+        // Apply the optimization to all inputs of the plan
+        let inputs = &new_plan.inputs();

Review comment:
       The `&` is redundant?

##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },
+                    input: Arc::new((*sub_input).clone()),
+                }),
+                _ => plan.clone(),
+            }
+        }
+        _ => plan.clone(),
+    }
+}
+
+impl OptimizerRule for MergeAdjacentFilter {
+    fn optimize(
+        &self,
+        plan: &LogicalPlan,
+        execution_props: &ExecutionProps,
+    ) -> Result<LogicalPlan> {
+        let new_plan = optimize(plan);
+
+        // Apply the optimization to all inputs of the plan
+        let inputs = &new_plan.inputs();
+        let new_inputs = inputs
+            .iter()
+            .map(|&new_plan| self.optimize(new_plan, execution_props))
+            .collect::<Result<Vec<_>>>()?;
+
+        utils::from_plan(&new_plan, &new_plan.expressions(), &new_inputs)
+    }
+
+    fn name(&self) -> &str {
+        "merge_adjacent_filter"
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use datafusion_common::ScalarValue;
+    use datafusion_expr::lit;
+
+    use super::*;
+    use crate::logical_plan::col;
+    use crate::logical_plan::LogicalPlanBuilder;
+    use crate::test::*;
+
+    fn assert_optimized_plan_eq(plan: &LogicalPlan, expected: &str) {
+        let rule = MergeAdjacentFilter::new();
+        let optimized_plan = rule
+            .optimize(plan, &ExecutionProps::new())
+            .expect("failed to optimize plan");
+        let formatted_plan = format!("{:?}", optimized_plan);
+        assert_eq!(formatted_plan, expected);
+        assert_eq!(plan.schema(), optimized_plan.schema());
+    }
+
+    #[test]
+    fn double_filter() {
+        let table_scan = test_table_scan().unwrap();
+        let plan =
+            LogicalPlanBuilder::from(table_scan.clone())
+                .project(vec![col("a")])
+                .unwrap()
+                .filter(col("a").eq(lit(1_i32)))
+                .unwrap()
+                .filter(col("a").eq(datafusion_expr::Expr::Literal(
+                    ScalarValue::Boolean(Some(true)),
+                )))
+                .unwrap()
+                .build()
+                .unwrap();
+
+        let expected = "Filter: #test.a = Boolean(true) AND #test.a = Int32(1)\
+            \n  Projection: #test.a\
+            \n    TableScan: test projection=None";
+        assert_optimized_plan_eq(&plan, expected);
+    }
+
+    #[test]
+    fn thrice_filter() {
+        let table_scan = test_table_scan().unwrap();
+        let plan =
+            LogicalPlanBuilder::from(table_scan.clone())

Review comment:
       ```suggestion
               LogicalPlanBuilder::from(table_scan)
   ```

##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },
+                    input: Arc::new((*sub_input).clone()),
+                }),
+                _ => plan.clone(),
+            }
+        }
+        _ => plan.clone(),
+    }
+}
+
+impl OptimizerRule for MergeAdjacentFilter {
+    fn optimize(
+        &self,
+        plan: &LogicalPlan,
+        execution_props: &ExecutionProps,
+    ) -> Result<LogicalPlan> {
+        let new_plan = optimize(plan);
+
+        // Apply the optimization to all inputs of the plan
+        let inputs = &new_plan.inputs();
+        let new_inputs = inputs
+            .iter()
+            .map(|&new_plan| self.optimize(new_plan, execution_props))
+            .collect::<Result<Vec<_>>>()?;
+
+        utils::from_plan(&new_plan, &new_plan.expressions(), &new_inputs)
+    }
+
+    fn name(&self) -> &str {
+        "merge_adjacent_filter"
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use datafusion_common::ScalarValue;
+    use datafusion_expr::lit;
+
+    use super::*;
+    use crate::logical_plan::col;
+    use crate::logical_plan::LogicalPlanBuilder;
+    use crate::test::*;
+
+    fn assert_optimized_plan_eq(plan: &LogicalPlan, expected: &str) {
+        let rule = MergeAdjacentFilter::new();
+        let optimized_plan = rule
+            .optimize(plan, &ExecutionProps::new())
+            .expect("failed to optimize plan");
+        let formatted_plan = format!("{:?}", optimized_plan);
+        assert_eq!(formatted_plan, expected);
+        assert_eq!(plan.schema(), optimized_plan.schema());
+    }
+
+    #[test]
+    fn double_filter() {
+        let table_scan = test_table_scan().unwrap();
+        let plan =
+            LogicalPlanBuilder::from(table_scan.clone())
+                .project(vec![col("a")])
+                .unwrap()
+                .filter(col("a").eq(lit(1_i32)))
+                .unwrap()
+                .filter(col("a").eq(datafusion_expr::Expr::Literal(
+                    ScalarValue::Boolean(Some(true)),
+                )))
+                .unwrap()
+                .build()
+                .unwrap();
+
+        let expected = "Filter: #test.a = Boolean(true) AND #test.a = Int32(1)\
+            \n  Projection: #test.a\
+            \n    TableScan: test projection=None";
+        assert_optimized_plan_eq(&plan, expected);
+    }
+
+    #[test]
+    fn thrice_filter() {
+        let table_scan = test_table_scan().unwrap();
+        let plan =
+            LogicalPlanBuilder::from(table_scan.clone())
+                .project(vec![col("a")])
+                .unwrap()
+                .filter(col("a").eq(lit(1_i32)))
+                .unwrap()
+                .filter(col("a").eq(lit(2_i32)))
+                .unwrap()
+                .filter(col("a").eq(datafusion_expr::Expr::Literal(
+                    ScalarValue::Boolean(Some(true)),
+                )))
+                .unwrap()
+                .build()
+                .unwrap();
+
+        let expected = "Filter: #test.a = Boolean(true) AND #test.a = Int32(2) AND #test.a = Int32(1)\
+            \n  Projection: #test.a\
+            \n    TableScan: test projection=None";
+        assert_optimized_plan_eq(&plan, expected);
+    }
+
+    #[test]
+    fn nested_double_filter() {
+        let table_scan = test_table_scan().unwrap();
+        let plan =
+            LogicalPlanBuilder::from(table_scan.clone())

Review comment:
       ```suggestion
               LogicalPlanBuilder::from(table_scan)
   ```

##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },
+                    input: Arc::new((*sub_input).clone()),
+                }),
+                _ => plan.clone(),
+            }
+        }
+        _ => plan.clone(),
+    }
+}
+
+impl OptimizerRule for MergeAdjacentFilter {
+    fn optimize(
+        &self,
+        plan: &LogicalPlan,
+        execution_props: &ExecutionProps,
+    ) -> Result<LogicalPlan> {
+        let new_plan = optimize(plan);
+
+        // Apply the optimization to all inputs of the plan
+        let inputs = &new_plan.inputs();
+        let new_inputs = inputs
+            .iter()
+            .map(|&new_plan| self.optimize(new_plan, execution_props))
+            .collect::<Result<Vec<_>>>()?;
+
+        utils::from_plan(&new_plan, &new_plan.expressions(), &new_inputs)
+    }
+
+    fn name(&self) -> &str {
+        "merge_adjacent_filter"
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use datafusion_common::ScalarValue;
+    use datafusion_expr::lit;
+
+    use super::*;
+    use crate::logical_plan::col;
+    use crate::logical_plan::LogicalPlanBuilder;
+    use crate::test::*;
+
+    fn assert_optimized_plan_eq(plan: &LogicalPlan, expected: &str) {
+        let rule = MergeAdjacentFilter::new();
+        let optimized_plan = rule
+            .optimize(plan, &ExecutionProps::new())
+            .expect("failed to optimize plan");
+        let formatted_plan = format!("{:?}", optimized_plan);
+        assert_eq!(formatted_plan, expected);
+        assert_eq!(plan.schema(), optimized_plan.schema());
+    }
+
+    #[test]
+    fn double_filter() {
+        let table_scan = test_table_scan().unwrap();
+        let plan =
+            LogicalPlanBuilder::from(table_scan.clone())

Review comment:
       ```suggestion
               LogicalPlanBuilder::from(table_scan)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 removed a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
liukun4515 removed a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1070196247


   > # Which issue does this PR close?
   > Closes #2016 .
   > 
   > ```sql
   > explain select c1, c2 from test where c3 = true and c2 = 0.000001;
   > ```
   > 
   > Before
   > 
   > ```sql
   > +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   > | plan_type     | plan                                                                                                                                |
   > +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   > | logical_plan  | Projection: #test.c1, #test.c2                                                                                                      |
   > |               |   Filter: #test.c3                                                                                                                  |
   > |               |     Filter: #test.c2 = Float64(0.000001)                                                                                            |
   > |               |       TableScan: test projection=Some([0, 1, 2]), filters=[#test.c3, #test.c2 = Float64(0.000001)]                                  |
   > | physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                       |
   > |               |   CoalesceBatchesExec: target_batch_size=4096                                                                                       |
   > |               |     FilterExec: c3@2                                                                                                                |
   > |               |       CoalesceBatchesExec: target_batch_size=4096                                                                                   |
   > |               |         FilterExec: c2@1 = 0.000001                                                                                                 |
   > |               |           RepartitionExec: partitioning=RoundRobinBatch(8)                                                                          |
   > |               |             CsvExec: files=[/home/jakevin/code/arrow-datafusion/datafusion/tests/aggregate_simple.csv], has_header=true, limit=None |
   > |               |                                                                                                                                     |
   > +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   > ```
   > 
   > After
   > 
   > ```sql
   > +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   > | plan_type     | plan                                                                                                                            |
   > +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   > | logical_plan  | Projection: #test.c1, #test.c2                                                                                                  |
   > |               |   Filter: #test.c3 AND #test.c2 = Float64(0.000001)                                                                             |
   > |               |     TableScan: test projection=Some([0, 1, 2]), filters=[#test.c3, #test.c2 = Float64(0.000001)]                                |
   > | physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                   |
   > |               |   CoalesceBatchesExec: target_batch_size=4096                                                                                   |
   > |               |     FilterExec: c3@2 AND c2@1 = 0.000001                                                                                        |
   > |               |       RepartitionExec: partitioning=RoundRobinBatch(8)                                                                          |
   > |               |         CsvExec: files=[/home/jakevin/code/arrow-datafusion/datafusion/tests/aggregate_simple.csv], has_header=true, limit=None |
   > |               |                                                                                                                                 |
   > +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   > ```
   > 
   > # Rationale for this change
   > merge adjacent filter
   > 
   > # Are there any user-facing changes?
   > None
   
   Could you please provide this schema for this table `test`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener commented on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069901788


   In many time, It doesn't have a big impact on performance. But it can simplify the query plan.
   
   What' more, other plan can benefit from it, like filter reorder, because we didn't traverse the plan node instead of just focus this filter operator. Because the adjacent filter order is exchangeable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
yjshen commented on a change in pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#discussion_r828734369



##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.

Review comment:
       We should update this.

##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },

Review comment:
       We might also need to remove redundant conditions while combining filters. For instance, `filter(a == 1 && b == 1).filter(a == 1 && c == 1)` could be further optimized to `(a == 1 && b == 1 && c == 1)` from `(a == 1 && b == 1) && (a == 1 && c == 1)`

##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;

Review comment:
       CombineFilters might be a more suitable name. Besides, a plan could be changed so not adjacent at first.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072983873


   The following is my test code used by sql.
   
   ```rust
   #[tokio::test]
   async fn main() -> Result<()> {
       // create local execution context
       let mut ctx = SessionContext::new();
   
       // register csv file with the execution context
       ctx.register_csv("test", "tests/aggregate_simple.csv", CsvReadOptions::new())
           .await?;
   
       // execute the query
       let plan = ctx.create_logical_plan(
           "select c1, c2 from test where c3 = true and c2 = 0.000001",
       )?;
   
       dbg!(plan);
   
       Ok(())
   }
   ```
   Then I got the plan
   ```shell
   Projection: #test.c1, #test.c2
     Filter: #test.c3 = Boolean(true) AND #test.c2 = Float64(0.000001)
       TableScan: test projection=Non
   ```
   
   I also tested use datafusion-cli:
   ```sql
   ❯ create table t as SELECT * FROM (VALUES (1,true), (2,false)) as t;
   0 rows in set. Query took 0.003 seconds.
   ❯ select * from t;
   +---------+---------+
   | column1 | column2 |
   +---------+---------+
   | 1       | true    |
   | 2       | false   |
   +---------+---------+
   2 rows in set. Query took 0.002 seconds.
   ❯ explain select * from t where column1 = 2 and column2 = true;
   +---------------+-------------------------------------------------------------------+
   | plan_type     | plan                                                              |
   +---------------+-------------------------------------------------------------------+
   | logical_plan  | Projection: #t.column1, #t.column2                                |
   |               |   Filter: #t.column1 = Int64(2) AND #t.column2                    |
   |               |     TableScan: t projection=Some([0, 1])                          |
   | physical_plan | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2] |
   |               |   CoalesceBatchesExec: target_batch_size=4096                     |
   |               |     FilterExec: column1@0 = 2 AND column2@1                       |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(12)           |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]             |
   |               |                                                                   |
   +---------------+-------------------------------------------------------------------+
   2 rows in set. Query took 0.004 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072983873


   The following is my test code used by sql.
   
   ```rust
   #[tokio::test]
   async fn main() -> Result<()> {
       // create local execution context
       let mut ctx = SessionContext::new();
   
       // register csv file with the execution context
       ctx.register_csv("test", "tests/aggregate_simple.csv", CsvReadOptions::new())
           .await?;
   
       // execute the query
       let plan = ctx.create_logical_plan(
           "select c1, c2 from test where c3 = true and c2 = 0.000001",
       )?;
   
       dbg!(plan);
   
       Ok(())
   }
   ```
   Then I got the plan
   ```shell
   Projection: #test.c1, #test.c2
     Filter: #test.c3 = Boolean(true) AND #test.c2 = Float64(0.000001)
       TableScan: test projection=Non
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] doki23 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
doki23 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072928594


   Sorry for that, but in my superficial opinion, we should make the sql planner produce one filter when creating the logical plan instead of adding an optimized rule. In other words, I think may be there is a bug in the planner?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener edited a comment on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069901788


   In many time, It doesn't have a big impact on performance. But it can simplify the query plan.
   
   What' more, other plan can benefit from it, like filter reorder, because we don't need to traverse the plan node instead of just focus this filter operator. Because the adjacent filter order is exchangeable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener edited a comment on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069890702


   > In fact, I doubt if it really has gains, could you please do some benchmarks for the case? Thanks @jackwener
   
   Let me explain about this rule. It's a common and base optimizer rule. Postgresql, mysql, cockroach all implement this rule.
   
   like pg:
   
   ```sql
   explain select id, firstname from scientist where id > 3 and lastname = 's';
   
   QUERY PLAN
   --
   Seq Scan on scientist (cost=0.00..17.20 rows=1 width=72)
   Filter: ((id > 3) AND ((lastname)::text = 's'::text))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener edited a comment on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069373469


   @alamb @Dandandan @houqp PTAL❤😃.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener commented on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069373469


   @alamb @Dandandan @houqp PTAL❤😃.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] doki23 commented on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
doki23 commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072928594


   Sorry for that, but in my opinion, we should make the sql planner produce only one filter when creating the logical plan instead of creating an optimized rule.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener commented on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069911561


   This postgresql example is more directly. 
   
   ```sql
   explain select id, firstname from 
   (select id, firstname, lastname from scientist where id > 3 ) t
    where lastname = 's';
   
   QUERY PLAN
   --
   Seq Scan on scientist (cost=0.00..17.20 rows=1 width=72)
   Filter: ((id > 3) AND ((lastname)::text = 's'::text))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
yjshen commented on a change in pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#discussion_r828738961



##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;

Review comment:
       CombineFilters might be a more suitable name. A plan could be changed so filters are not adjacent at first.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener commented on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069890702


   > In fact, I doubt if it really has gains, could you please do some benchmarks for the case? Thanks @jackwener
   
   Let me explain about this rule. It's a common and base optimizer rule. Postgresql, ,m ysql, cockroach all implement this rule.
   
   like pg:
   
   ```sql
   explain select id, firstname from scientist where id > 3 and lastname = 's';
   
   QUERY PLAN
   --
   Seq Scan on scientist (cost=0.00..17.20 rows=1 width=72)
   Filter: ((id > 3) AND ((lastname)::text = 's'::text))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072983873


   The following is my test code used by sql.
   
   ```rust
   #[tokio::test]
   async fn main() -> Result<()> {
       // create local execution context
       let mut ctx = SessionContext::new();
   
       // register csv file with the execution context
       ctx.register_csv("test", "tests/aggregate_simple.csv", CsvReadOptions::new())
           .await?;
   
       // execute the query
       let plan = ctx.create_logical_plan(
           "select c1, c2 from test where c3 = true and c2 = 0.000001",
       )?;
   
       dbg!(plan);
   
       Ok(())
   }
   ```
   Then I got the plan
   ```shell
   Projection: #test.c1, #test.c2
     Filter: #test.c3 = Boolean(true) AND #test.c2 = Float64(0.000001)
       TableScan: test projection=Non
   ```
   
   I also tested use datafusion-cli:
   ```sql
   ❯ create table t as SELECT * FROM (VALUES (1,true), (2,false)) as t;
   0 rows in set. Query took 0.003 seconds.
   ❯ select * from t;
   +---------+---------+
   | column1 | column2 |
   +---------+---------+
   | 1       | true    |
   | 2       | false   |
   +---------+---------+
   2 rows in set. Query took 0.002 seconds.
   ❯ explain select * from t where column1 = 2 and column2 = true;
   +---------------+-------------------------------------------------------------------+
   | plan_type     | plan                                                              |
   +---------------+-------------------------------------------------------------------+
   | logical_plan  | Projection: #t.column1, #t.column2                                |
   |               |   Filter: #t.column1 = Int64(2) AND #t.column2                    |
   |               |     TableScan: t projection=Some([0, 1])                          |
   | physical_plan | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2] |
   |               |   CoalesceBatchesExec: target_batch_size=4096                     |
   |               |     FilterExec: column1@0 = 2 AND column2@1                       |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(12)           |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]             |
   |               |                                                                   |
   +---------------+-------------------------------------------------------------------+
   2 rows in set. Query took 0.004 seconds.
   ```
   
   Two cases will result in adjacent filters in logical plan:
   - Use dataframe: `df.xxx.filter(filter1).filter(filter2)`;
   - Directly build logical plan by `LogicalPlanBuilder`: `LogicalPlanBuilder::from(xx).xxx.filter().filter()...` 
   
   For dataframe users, they can use the following to avoid adjacent filters
   ```rust
       let filter1 = col("b").eq(lit(10));
   
       let filter2 = col("a").eq(lit("a"));
       
       let filter = filter1.and(filter2);
   
       let df = df
           .select_columns(&["a", "b"])?
           .filter(filter)?;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] doki23 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
doki23 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072928594


   Sorry for that, but in my superficial opinion, we should make the sql planner produce one filter when creating the logical plan instead of adding an optimized rule. In other words, I think may there is a bug in the planner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069966867


   Make sense to me, thank you @jackwener 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072983873


   The following is my test code used by sql.
   
   ```rust
   #[tokio::test]
   async fn main() -> Result<()> {
       // create local execution context
       let mut ctx = SessionContext::new();
   
       // register csv file with the execution context
       ctx.register_csv("test", "tests/aggregate_simple.csv", CsvReadOptions::new())
           .await?;
   
       // execute the query
       let plan = ctx.create_logical_plan(
           "select c1, c2 from test where c3 = true and c2 = 0.000001",
       )?;
   
       dbg!(plan);
   
       Ok(())
   }
   ```
   Then I got the plan
   ```shell
   Projection: #test.c1, #test.c2
     Filter: #test.c3 = Boolean(true) AND #test.c2 = Float64(0.000001)
       TableScan: test projection=Non
   ```
   
   I also tested use datafusion-cli:
   ```sql
   ❯ create table t as SELECT * FROM (VALUES (1,true), (2,false)) as t;
   0 rows in set. Query took 0.003 seconds.
   ❯ select * from t;
   +---------+---------+
   | column1 | column2 |
   +---------+---------+
   | 1       | true    |
   | 2       | false   |
   +---------+---------+
   2 rows in set. Query took 0.002 seconds.
   ❯ explain select * from t where column1 = 2 and column2 = true;
   +---------------+-------------------------------------------------------------------+
   | plan_type     | plan                                                              |
   +---------------+-------------------------------------------------------------------+
   | logical_plan  | Projection: #t.column1, #t.column2                                |
   |               |   Filter: #t.column1 = Int64(2) AND #t.column2                    |
   |               |     TableScan: t projection=Some([0, 1])                          |
   | physical_plan | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2] |
   |               |   CoalesceBatchesExec: target_batch_size=4096                     |
   |               |     FilterExec: column1@0 = 2 AND column2@1                       |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(12)           |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]             |
   |               |                                                                   |
   +---------------+-------------------------------------------------------------------+
   2 rows in set. Query took 0.004 seconds.
   ```
   
   Two cases will result in adjacent filters in logical plan:
   - Use dataframe: `df.xxx.filter(filter1).filter(filter2)`;
   - Directly build logical plan by `LogicalPlanBuilder`: `LogicalPlanBuilder::from(xx).xxx.filter().filter()...` 
   
   For dataframe users, they can use the following to avoid adjacent filters
   ```rust
       let filter1 = col("b").eq(lit(10));
   
       let filter2 = col("a").eq(lit("a"));
       
       let filter = filter1.and(filter2);
   
       let df = df
           .select_columns(&["a", "b"])?
           .filter(filter)?;
   ```
   Btw: I also checked pg & cockroach & materialize codebase, they don't have the rule.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener closed pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener closed pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener commented on a change in pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener commented on a change in pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#discussion_r829666781



##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },

Review comment:
       Yes, it's great point. But it should be other rule function.
   
   Now there's `eliminate_fliter`. After this PR, I plan to do enhance this rule for conjunction filter and eliminate more redundant expr after this .

##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },

Review comment:
       It's great point. But it should be other rule function.
   
   Now there's `eliminate_fliter`. After this PR, I plan to do enhance this rule for conjunction filter and eliminate more redundant expr after this .




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] doki23 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
doki23 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072928594


   Sorry for that, but in my superficial opinion, we should make the sql planner produce only one filter when creating the logical plan instead of adding an optimized rule.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] doki23 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
doki23 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072928594


   In my superficial opinion, we should make the sql planner produce one filter when creating the logical plan instead of adding an optimized rule. In other words, I think may there is a bug in the planner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#discussion_r830579298



##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },

Review comment:
       FWIW at least some redundant expressions are already removed by the rule in https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/simplify_expressions.rs
   
   ```sql
   ❯ create table foo as select * from (values (1, 2, 3), (4, 5, 6), (7, 8, 9)) as sq;
   ...
   -- Note the rendant `column1 = column2` filter is removed in the actual plan
   ❯ explain select * from foo where (column1 = column2) AND (column1 = column3) AND (column1 = column2);
   +---------------+-----------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                    |
   +---------------+-----------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #foo.column1, #foo.column2, #foo.column3                                    |
   |               |   Filter: #foo.column1 = #foo.column2 AND #foo.column1 = #foo.column3                   |
   |               |     TableScan: foo projection=Some([0, 1, 2])                                           |
   | physical_plan | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2, column3@2 as column3] |
   |               |   CoalesceBatchesExec: target_batch_size=4096                                           |
   |               |     FilterExec: column1@0 = column2@1 AND column1@0 = column3@2       <---- redundant filter is removed
   |               |       RepartitionExec: partitioning=RoundRobinBatch(16)                                 |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]                                   |
   |               |                                                                                         |
   +---------------+-----------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.005 seconds.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1070196247


   > # Which issue does this PR close?
   > Closes #2016 .
   > 
   > ```sql
   > explain select c1, c2 from test where c3 = true and c2 = 0.000001;
   > ```
   > 
   > Before
   > 
   > ```sql
   > +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   > | plan_type     | plan                                                                                                                                |
   > +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   > | logical_plan  | Projection: #test.c1, #test.c2                                                                                                      |
   > |               |   Filter: #test.c3                                                                                                                  |
   > |               |     Filter: #test.c2 = Float64(0.000001)                                                                                            |
   > |               |       TableScan: test projection=Some([0, 1, 2]), filters=[#test.c3, #test.c2 = Float64(0.000001)]                                  |
   > | physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                       |
   > |               |   CoalesceBatchesExec: target_batch_size=4096                                                                                       |
   > |               |     FilterExec: c3@2                                                                                                                |
   > |               |       CoalesceBatchesExec: target_batch_size=4096                                                                                   |
   > |               |         FilterExec: c2@1 = 0.000001                                                                                                 |
   > |               |           RepartitionExec: partitioning=RoundRobinBatch(8)                                                                          |
   > |               |             CsvExec: files=[/home/jakevin/code/arrow-datafusion/datafusion/tests/aggregate_simple.csv], has_header=true, limit=None |
   > |               |                                                                                                                                     |
   > +---------------+-------------------------------------------------------------------------------------------------------------------------------------+
   > ```
   > 
   > After
   > 
   > ```sql
   > +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   > | plan_type     | plan                                                                                                                            |
   > +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   > | logical_plan  | Projection: #test.c1, #test.c2                                                                                                  |
   > |               |   Filter: #test.c3 AND #test.c2 = Float64(0.000001)                                                                             |
   > |               |     TableScan: test projection=Some([0, 1, 2]), filters=[#test.c3, #test.c2 = Float64(0.000001)]                                |
   > | physical_plan | ProjectionExec: expr=[c1@0 as c1, c2@1 as c2]                                                                                   |
   > |               |   CoalesceBatchesExec: target_batch_size=4096                                                                                   |
   > |               |     FilterExec: c3@2 AND c2@1 = 0.000001                                                                                        |
   > |               |       RepartitionExec: partitioning=RoundRobinBatch(8)                                                                          |
   > |               |         CsvExec: files=[/home/jakevin/code/arrow-datafusion/datafusion/tests/aggregate_simple.csv], has_header=true, limit=None |
   > |               |                                                                                                                                 |
   > +---------------+---------------------------------------------------------------------------------------------------------------------------------+
   > ```
   > 
   > # Rationale for this change
   > merge adjacent filter
   > 
   > # Are there any user-facing changes?
   > None
   
   Could you please provide this schema for this table `test`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #2026: merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1069803394


   In fact, I doubt if it really has gains, could you please do some benchmarks for the case? Thanks @jackwener 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] doki23 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
doki23 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072928594


   Sorry for that, but in my superficial opinion, we should make the sql planner produce only one filter when creating the logical plan instead of creating an optimized rule.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1072983873


   The following is my test code used by sql.
   
   ```rust
   #[tokio::test]
   async fn main() -> Result<()> {
       // create local execution context
       let mut ctx = SessionContext::new();
   
       // register csv file with the execution context
       ctx.register_csv("test", "tests/aggregate_simple.csv", CsvReadOptions::new())
           .await?;
   
       // execute the query
       let plan = ctx.create_logical_plan(
           "select c1, c2 from test where c3 = true and c2 = 0.000001",
       )?;
   
       dbg!(plan);
   
       Ok(())
   }
   ```
   Then I got the plan
   ```shell
   Projection: #test.c1, #test.c2
     Filter: #test.c3 = Boolean(true) AND #test.c2 = Float64(0.000001)
       TableScan: test projection=Non
   ```
   
   I also tested use datafusion-cli:
   ```sql
   ❯ create table t as SELECT * FROM (VALUES (1,true), (2,false)) as t;
   0 rows in set. Query took 0.003 seconds.
   ❯ select * from t;
   +---------+---------+
   | column1 | column2 |
   +---------+---------+
   | 1       | true    |
   | 2       | false   |
   +---------+---------+
   2 rows in set. Query took 0.002 seconds.
   ❯ explain select * from t where column1 = 2 and column2 = true;
   +---------------+-------------------------------------------------------------------+
   | plan_type     | plan                                                              |
   +---------------+-------------------------------------------------------------------+
   | logical_plan  | Projection: #t.column1, #t.column2                                |
   |               |   Filter: #t.column1 = Int64(2) AND #t.column2                    |
   |               |     TableScan: t projection=Some([0, 1])                          |
   | physical_plan | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2] |
   |               |   CoalesceBatchesExec: target_batch_size=4096                     |
   |               |     FilterExec: column1@0 = 2 AND column2@1                       |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(12)           |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]             |
   |               |                                                                   |
   +---------------+-------------------------------------------------------------------+
   2 rows in set. Query took 0.004 seconds.
   ```
   
   Two cases will result in adjacent filters in logical plan:
   - Use dataframe: `df.xxx.filter(filter1).filter(filter2)`;
   - Directly build logical plan by `LogicalPlanBuilder`: `LogicalPlanBuilder::from(xx).xxx.filter().filter()...` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jackwener commented on pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
jackwener commented on pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#issuecomment-1073013685


   ❤😃 Thanks @xudong963, @doki23 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #2026: Merge adjacent filter rule for optimizer

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #2026:
URL: https://github.com/apache/arrow-datafusion/pull/2026#discussion_r830579298



##########
File path: datafusion/src/optimizer/merge_adjacent_filter.rs
##########
@@ -0,0 +1,188 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Optimizer rule to replace `where false` on a plan with an empty relation.
+//! This saves time in planning and executing the query.
+//! Note that this rule should be applied after simplify expressions optimizer rule.
+use datafusion_expr::{Expr, Operator};
+use std::sync::Arc;
+
+use crate::error::Result;
+use crate::logical_plan::{plan::Filter, LogicalPlan};
+use crate::optimizer::optimizer::OptimizerRule;
+
+use super::utils;
+use crate::execution::context::ExecutionProps;
+
+/// Optimization rule that merge adjacent filter
+#[derive(Default)]
+pub struct MergeAdjacentFilter;
+
+impl MergeAdjacentFilter {
+    #[allow(missing_docs)]
+    pub fn new() -> Self {
+        Self {}
+    }
+}
+
+fn optimize(plan: &LogicalPlan) -> LogicalPlan {
+    match plan {
+        LogicalPlan::Filter(Filter {
+            predicate: pred,
+            input,
+        }) => {
+            let sub_plan = optimize(&**input);
+            match sub_plan {
+                LogicalPlan::Filter(Filter {
+                    predicate: sub_pred,
+                    input: sub_input,
+                }) => LogicalPlan::Filter(Filter {
+                    predicate: Expr::BinaryExpr {
+                        left: Box::new(pred.clone()),
+                        op: Operator::And,
+                        right: Box::new(sub_pred),
+                    },

Review comment:
       FWIW at least some redundant expressions are already removed by the rule in https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/simplify_expressions.rs
   
   ```sql
   ❯ create table foo as select * from (values (1, 2, 3), (4, 5, 6), (7, 8, 9)) as sq;
   ...
   -- Note the rendant `column1 = column2` filter is removed in the actual plan
   ❯ explain select * from foo where (column1 = column2) AND (column1 = column3) AND (column1 = column2);
   +---------------+-----------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                    |
   +---------------+-----------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #foo.column1, #foo.column2, #foo.column3                                    |
   |               |   Filter: #foo.column1 = #foo.column2 AND #foo.column1 = #foo.column3                   |
   |               |     TableScan: foo projection=Some([0, 1, 2])                                           |
   | physical_plan | ProjectionExec: expr=[column1@0 as column1, column2@1 as column2, column3@2 as column3] |
   |               |   CoalesceBatchesExec: target_batch_size=4096                                           |
   |               |     FilterExec: column1@0 = column2@1 AND column1@0 = column3@2                         |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(16)                                 |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]                                   |
   |               |                                                                                         |
   +---------------+-----------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.005 seconds.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org