Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/15 20:14:45 UTC

[GitHub] [arrow] yordan-pavlov commented on a change in pull request #8917: ARROW-9828: [Rust] [DataFusion] Support filter pushdown optimisation for TableProvider implementations

yordan-pavlov commented on a change in pull request #8917:
URL: https://github.com/apache/arrow/pull/8917#discussion_r543655758



##########
File path: rust/datafusion/src/datasource/parquet.rs
##########
@@ -65,6 +66,7 @@ impl TableProvider for ParquetTable {
         &self,
         projection: &Option<Vec<usize>>,
         batch_size: usize,
+        _filters: &[Expr],

Review comment:
       @returnString it's great to see someone else working on predicate push-down as well.
   I have been working on this for a couple of weeks, targeting an end-to-end implementation for parquet, and have made similar changes to the filter push-down optimizer. Your implementation is better, though, because of the idea of full vs partial filter push-down. In my version I have `predicate: &Option<Expr>`, but `filters: &[Expr]` should work as well.
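
To illustrate why the two parameter shapes are roughly interchangeable, here is a minimal sketch, assuming a toy `Expr` enum (a stand-in, not DataFusion's real `Expr`). The helper names `conjunction` and `split_conjunction` are hypothetical. A slice of filters can be folded into one optional AND-ed predicate, and a predicate can be split back into its parts:

```rust
// Toy stand-in for DataFusion's `Expr`; the real type has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Hypothetical helper: fold `filters: &[Expr]` into a single AND-ed
/// predicate. Returns `None` when there are no filters, mirroring the
/// `&Option<Expr>` shape.
fn conjunction(filters: &[Expr]) -> Option<Expr> {
    filters
        .iter()
        .cloned()
        .reduce(|acc, e| Expr::And(Box::new(acc), Box::new(e)))
}

/// Hypothetical helper: split a predicate into its AND-ed parts, in
/// left-to-right order, appending them to `out`.
fn split_conjunction(predicate: &Expr, out: &mut Vec<Expr>) {
    match predicate {
        Expr::And(left, right) => {
            split_conjunction(left, out);
            split_conjunction(right, out);
        }
        other => out.push(other.clone()),
    }
}
```

One possible advantage of the slice form is that a provider can look at each filter separately, which fits the full vs partial push-down idea: some filters may be handled by the source while others stay in the plan.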
   
   I think it makes sense to separate the generic support for pushing predicates down to the data source from the implementations for specific data sources such as parquet: each change will be fairly big, so splitting the work into smaller changes makes sense.
   
   Regarding a parquet implementation of predicate push-down, I have been working on the idea of building arrays from the min / max statistics of row groups and then reusing the physical expressions already implemented in datafusion. I already have the code that builds the statistics arrays and am now working on the expression evaluation; hopefully I will have enough to start a PR in the next couple of weeks.
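
The statistics-array idea can be sketched roughly as follows. This is a simplified illustration, not the actual work-in-progress code: it assumes a single `i64` column, plain `Vec`s instead of Arrow arrays, and a hypothetical `Predicate` enum instead of DataFusion's physical expressions. `RowGroupStats`, `build_stat_arrays` and `row_group_mask` are made-up names.

```rust
/// Hypothetical min / max statistics of one row group for a single i64 column.
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Hypothetical predicate of the form `column <op> literal`.
enum Predicate {
    Gt(i64),
    Lt(i64),
    Eq(i64),
}

/// Build one array of minimums and one of maximums,
/// with one element per row group.
fn build_stat_arrays(groups: &[RowGroupStats]) -> (Vec<i64>, Vec<i64>) {
    groups.iter().map(|g| (g.min, g.max)).unzip()
}

/// Evaluate the predicate against the statistics arrays.
/// `true` means the row group may contain matching rows and must be read;
/// `false` means it can be skipped.
fn row_group_mask(mins: &[i64], maxs: &[i64], predicate: &Predicate) -> Vec<bool> {
    mins.iter()
        .zip(maxs)
        .map(|(&min, &max)| match predicate {
            // `col > v` can only match if the largest value exceeds v.
            Predicate::Gt(v) => max > *v,
            // `col < v` can only match if the smallest value is below v.
            Predicate::Lt(v) => min < *v,
            // `col = v` can only match if v lies within [min, max].
            Predicate::Eq(v) => min <= *v && *v <= max,
        })
        .collect()
}
```

For example, with three row groups covering 0..=10, 11..=20 and 21..=30, the predicate `col > 15` would keep only the last two groups. The mask is conservative: a `true` entry does not guarantee a match, it only means the group cannot be ruled out from its statistics.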
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org