You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/20 15:55:31 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #370: Return Vec from PredicateBuilder rather than an `Fn`

alamb commented on a change in pull request #370:
URL: https://github.com/apache/arrow-datafusion/pull/370#discussion_r636236186



##########
File path: datafusion/src/physical_optimizer/pruning.rs
##########
@@ -85,24 +90,24 @@ impl PruningPredicateBuilder {
         })
     }
 
-    /// Generate a predicate function used to filter based on
-    /// statistics
+    /// For each set of statistics, evalates the predicate in this
+    /// builder and returns a `bool` with the following meaning for a
+    /// container with those statistics:
+    ///
+    /// `true`: The container MAY contain rows that match the predicate
     ///
-    /// This function takes a slice of statistics as parameter, so
-    /// that DataFusion's physical expressions can be executed once
-    /// against a single RecordBatch, containing statistics arrays, on
-    /// which the physical predicate expression is executed to
-    /// generate a row group filter array.
+    /// `false`: The container definitely does NOT contain rows that match the predicate
     ///
-    /// The generated filter function is then used in the returned
-    /// closure to filter row groups. NOTE this is parquet specific at the moment
+    /// Note this function takes a slice of statistics as a parameter
+    /// to amortize the cost of the evaluation of the predicate
+    /// against a single record batch.
     pub fn build_pruning_predicate(
         &self,
-        row_group_metadata: &[RowGroupMetaData],
-    ) -> Box<dyn Fn(&RowGroupMetaData, usize) -> bool> {
+        statistics: &[RowGroupMetaData],

Review comment:
       The main change in this PR is this function's signature. The rest of the changes are fallout from doing that

##########
File path: datafusion/src/physical_plan/parquet.rs
##########
@@ -454,6 +457,22 @@ fn send_result(
     Ok(())
 }
 
+fn build_row_group_predicate(
+    predicate_builder: &PruningPredicateBuilder,
+    row_group_metadata: &[RowGroupMetaData],
+) -> Box<dyn Fn(&RowGroupMetaData, usize) -> bool> {
+    let predicate_values = predicate_builder.build_pruning_predicate(row_group_metadata);
+
+    let predicate_values = match predicate_values {
+        Ok(values) => values,
+        // stats filter array could not be built
+        // return a closure which will not filter out any row groups
+        _ => return Box::new(|_r, _i| true),

Review comment:
       The existing code also (silently) ignores error so we continue that tradition in this PR




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org