You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/06 11:46:20 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request #689: Remove qualifiers on pushed down predicates / Fix parquet pruning

alamb opened a new pull request #689:
URL: https://github.com/apache/arrow-datafusion/pull/689


   # Which issue does this PR close?
   
   Closes #https://github.com/apache/arrow-datafusion/issues/656
   
   Based on https://github.com/apache/arrow-datafusion/pull/657 (test for parquet pruning) so review that first
   
    # Rationale for this change
   Predicate pushdown for parquet is currently broken because the filtering logic gets columns that refer to `foo.bar` but the table provider only knows about columns (e.g. only about `bar)`
   
   # What changes are included in this PR?
   1. `unnormalize_expr` function to remove qualifiers from expressions
   2.  tests
   
   
   # Are there any user-facing changes?
   Pruning works again
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #689: Remove qualifiers on pushed down predicates / Fix parquet pruning

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #689:
URL: https://github.com/apache/arrow-datafusion/pull/689


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #689: Remove qualifiers on pushed down predicates / Fix parquet pruning

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #689:
URL: https://github.com/apache/arrow-datafusion/pull/689#discussion_r664828470



##########
File path: datafusion/src/logical_plan/expr.rs
##########
@@ -1150,6 +1150,38 @@ pub fn normalize_cols(
         .collect()
 }
 
+/// Recursively 'unnormalize' (remove all qualifiers) from an
+/// expression tree.
+///
+/// For example, if there were expressions like `foo.bar` this would
+/// rewrite it to just `bar`.
+pub fn unnormalize_col(expr: Expr) -> Expr {
+    struct RemoveQualifier {}
+
+    impl ExprRewriter for RemoveQualifier {
+        fn mutate(&mut self, expr: Expr) -> Result<Expr> {
+            if let Expr::Column(col) = expr {
+                //let Column { relation: _, name } = col;

Review comment:
       ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #689: Remove qualifiers on pushed down predicates / Fix parquet pruning

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #689:
URL: https://github.com/apache/arrow-datafusion/pull/689#issuecomment-874917927


   FYI @houqp  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #689: Remove qualifiers on pushed down predicates / Fix parquet pruning

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #689:
URL: https://github.com/apache/arrow-datafusion/pull/689#discussion_r664718765



##########
File path: datafusion/src/physical_plan/planner.rs
##########
@@ -312,7 +313,13 @@ impl DefaultPhysicalPlanner {
                 filters,
                 limit,
                 ..
-            } => source.scan(projection, batch_size, filters, *limit),
+            } => {
+                // Remove all qualifiers from the scan as the provider
+                // doesn't know (nor should care) how the relation was
+                // referred to in the query
+                let filters = unnormalize_cols(filters.iter().cloned());

Review comment:
       Here is the actual fix for parquet pruning. 

##########
File path: datafusion/src/logical_plan/expr.rs
##########
@@ -1120,7 +1120,7 @@ pub fn columnize_expr(e: Expr, input_schema: &DFSchema) -> Expr {
 
 /// Recursively call [`Column::normalize`] on all Column expressions
 /// in the `expr` expression tree.
-pub fn normalize_col(e: Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {
+pub fn normalize_col(expr: Expr, schemas: &[&DFSchemaRef]) -> Result<Expr> {

Review comment:
       Changed to use the name `expr` to be more consistent with the rest of the codebase

##########
File path: datafusion/src/logical_plan/expr.rs
##########
@@ -1762,4 +1794,76 @@ mod tests {
             }
         }
     }
+
+    #[test]
+    fn normalize_cols() {

Review comment:
       I added some additional unit test coverage here for normalize when I was writing the `unnormalize` versions




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org