You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/20 09:00:28 UTC

[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request, #2282: Enable filter pushdown when using In_list on parquet

Ted-Jiang opened a new pull request, #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282

   # Which issue does this PR close?
   
   Closes #2281 .
   
    # Rationale for this change
   Before:
   ```
   explain select count(*) from test where o_orderkey in(2785313, 2, 3);
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                                                                                               |
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #COUNT(UInt8(1))                                                                                                                                                                                       |
   |               |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]                                                                                                                                                                |
   |               |     Filter: #test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])                                                                                                                                             |
   |               |       TableScan: test projection=Some([0]), partial_filters=[#test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])]                                                                                           |
   | physical_plan | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))]                                                                                                                                                        |
   |               |   HashAggregateExec: mode=Final, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                                    |
   |               |     CoalescePartitionsExec                                                                                                                                                                                         |
   |               |       HashAggregateExec: mode=Partial, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                              |
   |               |         CoalesceBatchesExec: target_batch_size=4096                                                                                                                                                                |
   |               |           FilterExec: o_orderkey@0 IN ([Literal { value: Int64(2785313) }, Literal { value: Int64(2) }, Literal { value: Int64(3) }])                                                                              |
   |               |             RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                                                                                      |
   |               |               ParquetExec: limit=None, partitions=[/Users/yangjiang/test-data/tpch-1g-oneFile/orders/part-00000-0e58b960-37a0-44c9-8561-53f0c32cf038-c000.snappy.parquet], predicate=true, projection=[o_orderkey] |
   |               |                                                                                                                                                                                                                    |
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.007 seconds.
   ```
   
   Now
   ```
   explain select count(*) from test where o_orderkey in(2785313, 2, 3);
   +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                            |
   +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #COUNT(UInt8(1))                                                                                                                                                                                                                                                                                                                                                    |
   |               |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]                                                                                                                                                                                                                                                                                                                             |
   |               |     Filter: #test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])                                                                                                                                                                                                                                                                                                          |
   |               |       TableScan: test projection=Some([0]), partial_filters=[#test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])]                                                                                                                                                                                                                                                        |
   | physical_plan | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))]                                                                                                                                                                                                                                                                                                                     |
   |               |   HashAggregateExec: mode=Final, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                                                                                                                                                                                                 |
   |               |     CoalescePartitionsExec                                                                                                                                                                                                                                                                                                                                                      |
   |               |       HashAggregateExec: mode=Partial, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                                                                                                                                                                                           |
   |               |         CoalesceBatchesExec: target_batch_size=4096                                                                                                                                                                                                                                                                                                                             |
   |               |           FilterExec: o_orderkey@0 IN ([Literal { value: Int64(2785313) }, Literal { value: Int64(2) }, Literal { value: Int64(3) }])                                                                                                                                                                                                                                           |
   |               |             RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                                                                                                                                                                                                                                                   |
   |               |               ParquetExec: limit=None, partitions=[/Users/yangjiang/test-data/tpch-1g-oneFile/orders/part-00000-0e58b960-37a0-44c9-8561-53f0c32cf038-c000.snappy.parquet], predicate=o_orderkey_min@0 <= 2785313 AND 2785313 <= o_orderkey_max@1 OR o_orderkey_min@0 <= 2 AND 2 <= o_orderkey_max@1 OR o_orderkey_min@0 <= 3 AND 3 <= o_orderkey_max@1, projection=[o_orderkey] |
   |               |                                                                                                                                                                                                                                                                                                                                                                                 |
   +---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.012 seconds.
   ```
   when optimize in `pruning.rs`, try change `in_list` to the combination of `or` for reusing code.
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r855886345


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {

Review Comment:
   Sounds reasonable!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r856263904


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,20 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } if !list.is_empty() && list.len() < 20 => {
+            let eq_fun = if *negated { Expr::not_eq } else { Expr::eq };
+            let re_fun = if *negated { Expr::and } else { Expr::or };
+            let change_expr = list
+                .iter()
+                .map(|e| eq_fun(*expr.clone(), e.clone()))
+                .reduce(re_fun)

Review Comment:
   MAP / REDUCE[1] -- love it!
   
   [1] https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
alamb merged PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r855486216


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {

Review Comment:
   I also recommend putting a limit on the size of the INLIST when the rewrite occurs otherwise we'll likely explode in evaluation  -- for example don't allow `IN (1,2,.......1000000000)`  -- maybe a limit if 10 or 20 would be good?



##########
datafusion/core/tests/parquet_pruning.rs:
##########
@@ -403,6 +403,21 @@ async fn prune_f64_complex_expr_subtract() {
     assert_eq!(output.result_rows, 9, "{}", output.description());
 }
 
+#[tokio::test]
+async fn prune_int32_eq_in_list() {
+    // resulrt of sql "SELECT * FROM t where in (1)"

Review Comment:
   ```suggestion
       // result of sql "SELECT * FROM t where in (1)"
   ```



##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -1340,6 +1363,26 @@ mod tests {
         Ok(())
     }
 
+    #[test]
+    fn row_group_predicate_in_list() -> Result<()> {
+        let schema = Schema::new(vec![
+            Field::new("c1", DataType::Int32, false),
+            Field::new("c2", DataType::Int32, false),
+        ]);
+        // test c1 in(1, 2, 3)
+        let expr = Expr::InList {
+            expr: Box::new(col("c1")),
+            list: vec![lit(1), lit(2), lit(3)],
+            negated: false,
+        };
+        let expected_expr = "#c1_min <= Int32(1) AND Int32(1) <= #c1_max OR #c1_min <= Int32(2) AND Int32(2) <= #c1_max OR #c1_min <= Int32(3) AND Int32(3) <= #c1_max";
+        let predicate_expr =
+            build_predicate_expression(&expr, &schema, &mut RequiredStatColumns::new())?;
+        assert_eq!(format!("{:?}", predicate_expr), expected_expr);
+

Review Comment:
   I recommend some other test cases:
   
   1. `c1 IN ()` as there is code checking this case
   2. `c1 IN (1)`
   3. `c1 NOT IN (1,2,3)`



##########
datafusion/core/tests/parquet_pruning.rs:
##########
@@ -403,6 +403,21 @@ async fn prune_f64_complex_expr_subtract() {
     assert_eq!(output.result_rows, 9, "{}", output.description());
 }
 
+#[tokio::test]
+async fn prune_int32_eq_in_list() {
+    // resulrt of sql "SELECT * FROM t where in (1)"
+    let output = ContextWithParquet::new(Scenario::Int32)
+        .await
+        .query("SELECT * FROM t where i in (1)")
+        .await;
+
+    println!("{}", output.description());
+    // This should prune out groups without error
+    assert_eq!(output.predicate_evaluation_errors(), Some(0));
+    assert_eq!(output.row_groups_pruned(), Some(3));
+    assert_eq!(output.result_rows, 1, "{}", output.description());
+}

Review Comment:
   maybe it is worth a negative test too -- like `SELECT * from t where i in (1000)` and expect nothing is pruned



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r854886715


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {

Review Comment:
   πŸ˜‚  i don't understand this one, how should i achieve this? maybe i lost something in learning rustπŸ˜”



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r854969159


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {
+                let mut or_expr = if *negated {
+                    Expr::not_eq(*expr.clone(), list[0].clone())
+                } else {
+                    Expr::eq(*expr.clone(), list[0].clone())
+                };
+
+                for e in list.iter().skip(1) {

Review Comment:
   Ah sorry, the map part will be more something like:
   
   `map(|e| eq_fun(*expr.clone(), e))`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r856265134


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -1340,6 +1363,26 @@ mod tests {
         Ok(())
     }
 
+    #[test]
+    fn row_group_predicate_in_list() -> Result<()> {
+        let schema = Schema::new(vec![
+            Field::new("c1", DataType::Int32, false),
+            Field::new("c2", DataType::Int32, false),
+        ]);
+        // test c1 in(1, 2, 3)
+        let expr = Expr::InList {
+            expr: Box::new(col("c1")),
+            list: vec![lit(1), lit(2), lit(3)],
+            negated: false,
+        };
+        let expected_expr = "#c1_min <= Int32(1) AND Int32(1) <= #c1_max OR #c1_min <= Int32(2) AND Int32(2) <= #c1_max OR #c1_min <= Int32(3) AND Int32(3) <= #c1_max";
+        let predicate_expr =
+            build_predicate_expression(&expr, &schema, &mut RequiredStatColumns::new())?;
+        assert_eq!(format!("{:?}", predicate_expr), expected_expr);
+

Review Comment:
   Testing for the win!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r855885453


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {

Review Comment:
   Thanks @Dandandan ❀️



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r855934549


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -1340,6 +1363,26 @@ mod tests {
         Ok(())
     }
 
+    #[test]
+    fn row_group_predicate_in_list() -> Result<()> {
+        let schema = Schema::new(vec![
+            Field::new("c1", DataType::Int32, false),
+            Field::new("c2", DataType::Int32, false),
+        ]);
+        // test c1 in(1, 2, 3)
+        let expr = Expr::InList {
+            expr: Box::new(col("c1")),
+            list: vec![lit(1), lit(2), lit(3)],
+            negated: false,
+        };
+        let expected_expr = "#c1_min <= Int32(1) AND Int32(1) <= #c1_max OR #c1_min <= Int32(2) AND Int32(2) <= #c1_max OR #c1_min <= Int32(3) AND Int32(3) <= #c1_max";
+        let predicate_expr =
+            build_predicate_expression(&expr, &schema, &mut RequiredStatColumns::new())?;
+        assert_eq!(format!("{:?}", predicate_expr), expected_expr);
+

Review Comment:
   Thanks, detected  when `not in` should change to `c !=1 and c!=2` using `and` op bug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r855949661


##########
datafusion/core/tests/parquet_pruning.rs:
##########
@@ -403,6 +403,21 @@ async fn prune_f64_complex_expr_subtract() {
     assert_eq!(output.result_rows, 9, "{}", output.description());
 }
 
+#[tokio::test]
+async fn prune_int32_eq_in_list() {
+    // resulrt of sql "SELECT * FROM t where in (1)"
+    let output = ContextWithParquet::new(Scenario::Int32)
+        .await
+        .query("SELECT * FROM t where i in (1)")
+        .await;
+
+    println!("{}", output.description());
+    // This should prune out groups without error
+    assert_eq!(output.predicate_evaluation_errors(), Some(0));
+    assert_eq!(output.row_groups_pruned(), Some(3));
+    assert_eq!(output.result_rows, 1, "{}", output.description());
+}

Review Comment:
   πŸ‘



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r854969159


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {
+                let mut or_expr = if *negated {
+                    Expr::not_eq(*expr.clone(), list[0].clone())
+                } else {
+                    Expr::eq(*expr.clone(), list[0].clone())
+                };
+
+                for e in list.iter().skip(1) {

Review Comment:
   Ah sorry, it will be more something like:
   
   `map(|e| eq_fun(*expr.clone, e))`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r853931262


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {

Review Comment:
   This check could be added to the InList match variant instead



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r853928376


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {
+                let mut or_expr = if *negated {
+                    Expr::not_eq(*expr.clone(), list[0].clone())
+                } else {
+                    Expr::eq(*expr.clone(), list[0].clone())
+                };
+
+                for e in list.iter().skip(1) {

Review Comment:
   We could simplify the above code to something like (untested):
   
   ```rust
   
   let eq_fun = if *negated { Expr::not_eq } else { Expr::Eq };
   
   let or_expr = list.into_iter().
   map(eq_fun)
   .reduce(Expr::or);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r853928376


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {
+                let mut or_expr = if *negated {
+                    Expr::not_eq(*expr.clone(), list[0].clone())
+                } else {
+                    Expr::eq(*expr.clone(), list[0].clone())
+                };
+
+                for e in list.iter().skip(1) {

Review Comment:
   We could simplify the above code to something like (untested):
   
   ```rust
   
   let eq_fun = if *negated { Expr::not_eq } else { Expr::eq };
   
   let or_expr = list.into_iter()
   .map(eq_fun)
   .reduce(Expr::or);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r853928376


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {
+                let mut or_expr = if *negated {
+                    Expr::not_eq(*expr.clone(), list[0].clone())
+                } else {
+                    Expr::eq(*expr.clone(), list[0].clone())
+                };
+
+                for e in list.iter().skip(1) {

Review Comment:
   We could simplify the above code to something like (untested):
   
   ```rust
   
   let eq_fun = if *negated { Expr::not_eq } else { Expr::Eq };
   
   let or_expr = list.into_iter()
   .map(eq_fun)
   .reduce(Expr::or);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r854884822


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {
+                let mut or_expr = if *negated {
+                    Expr::not_eq(*expr.clone(), list[0].clone())
+                } else {
+                    Expr::eq(*expr.clone(), list[0].clone())
+                };
+
+                for e in list.iter().skip(1) {

Review Comment:
   Sorry,  use this get
   ```
   error[E0593]: function is expected to take 1 argument, but it takes 2 arguments
      --> datafusion/core/src/physical_optimizer/pruning.rs:716:26
       |
   716 |                     .map(eq_fun)
       |                      --- ^^^^^^ expected function that takes 1 argument
       |                      |
       |                      required by a bound introduced by this call
   
   ```
   I try to use `fold`
   but acc must be wrapped by box.
   
   Is there a fix?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2282: Enable filter pushdown when using In_list on parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on code in PR #2282:
URL: https://github.com/apache/arrow-datafusion/pull/2282#discussion_r854966668


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -688,6 +688,29 @@ fn build_predicate_expression(
                 return Ok(unhandled);
             }
         }
+        Expr::InList {
+            expr,
+            list,
+            negated,
+        } => {
+            if !list.is_empty() {

Review Comment:
   You can add the condition after the pattern
   
   `(inlist pattern) if !list.is_empty() => {...}`
   
   
   
   See the docs here: https://doc.rust-lang.org/rust-by-example/flow_control/match/guard.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org