Posted to github@arrow.apache.org by "avantgardnerio (via GitHub)" <gi...@apache.org> on 2023/02/24 21:01:01 UTC

[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #5386: refactor: parquet pruning simplifications

avantgardnerio commented on code in PR #5386:
URL: https://github.com/apache/arrow-datafusion/pull/5386#discussion_r1117622609


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -721,23 +730,20 @@ fn build_predicate_expression(
     let (left, op, right) = match expr {
         Expr::BinaryExpr(BinaryExpr { left, op, right }) => (left, *op, right),
         Expr::IsNull(expr) => {
-            let expr = build_is_null_column_expr(expr, schema, required_columns)
+            return build_is_null_column_expr(expr, schema, required_columns)
                 .unwrap_or(unhandled);
-            return Ok(expr);

Review Comment:
   Oh, so this was _always_ returning Ok?



##########
datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs:
##########
@@ -110,14 +110,16 @@ impl PagePruningPredicate {
     pub fn try_new(expr: &Expr, schema: SchemaRef) -> Result<Self> {
         let predicates = split_conjunction(expr)
             .into_iter()
-            .filter_map(|predicate| match predicate.to_columns() {
-                Ok(columns) if columns.len() == 1 => {
-                    match PruningPredicate::try_new(predicate.clone(), schema.clone()) {
-                        Ok(p) if !p.allways_true() => Some(Ok(p)),
-                        _ => None,
+            .filter_map(|predicate| {
+                match PruningPredicate::try_new(predicate.clone(), schema.clone()) {
+                    Ok(p)
+                        if (!p.allways_true())
+                            && (p.required_columns().n_columns() < 2) =>

Review Comment:
   This is a behavior change for `n_columns() == 0`. Based on:
   
   ```
       pub fn allways_true(&self) -> bool {
           self.predicate_expr
               .as_any()
               .downcast_ref::<Literal>()
               .map(|l| matches!(l.value(), ScalarValue::Boolean(Some(true))))
               .unwrap_or_default()
       }
   ```
   
   Given the `unwrap_or_default()`, I assume `allways_true()` would default to `false` in that case, so I think we'd want to return `None` here?
   
   I ran the test suite with a panic on `n_columns() == 0` and couldn't trigger it, so I guess it LGTM.
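
To make the concern above concrete, here is a minimal sketch of the two filter conditions and of how `unwrap_or_default()` behaves on an `Option<bool>`. The function names `kept_before`, `kept_after`, and `always_true_like` are hypothetical and exist only for illustration; they model the guards shown in the diff and are not DataFusion code.

```rust
/// Old guard (hypothetical model): keep a predicate only if it references
/// exactly one column and is not trivially true.
fn kept_before(n_columns: usize, always_true: bool) -> bool {
    n_columns == 1 && !always_true
}

/// New guard from the diff (hypothetical model): keep a predicate if it
/// references fewer than two columns and is not trivially true.
fn kept_after(n_columns: usize, always_true: bool) -> bool {
    !always_true && n_columns < 2
}

/// Models the tail of `allways_true`: when the downcast to a literal fails
/// the `Option` is `None`, and `unwrap_or_default()` turns that into `false`.
fn always_true_like(downcast_result: Option<bool>) -> bool {
    downcast_result.unwrap_or_default()
}
```

Under these assumptions the two guards agree for one or more columns and differ only at zero columns, where the old guard drops the predicate and the new one keeps it unless it is a literal `true`.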



##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -258,6 +259,14 @@ impl RequiredStatColumns {
         Self::default()
     }
 
+    /// Returns number of unique columns.
+    pub(crate) fn n_columns(&self) -> usize {
+        self.iter()
+            .map(|(c, _s, _f)| c)

Review Comment:
   More descriptive variable names would help readability here.
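
As a rough illustration of the naming suggestion, the sketch below counts unique columns with spelled-out tuple bindings. The types `RequiredStats` and `StatKind`, the field names, and the `HashSet`-based counting are all assumptions made for this example; the diff only shows the first two lines of the real method, so the rest of its body is not known from this thread.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the kind of statistic requested for a column.
#[derive(Debug, Clone, Copy)]
enum StatKind {
    Min,
    Max,
    NullCount,
}

// Hypothetical stand-in for `RequiredStatColumns`: each entry is
// (column name, statistic kind, name of the statistics field).
struct RequiredStats {
    entries: Vec<(String, StatKind, String)>,
}

impl RequiredStats {
    /// Returns the number of unique columns referenced by the entries.
    fn n_columns(&self) -> usize {
        self.entries
            .iter()
            // Descriptive bindings instead of `(c, _s, _f)`.
            .map(|(column, _stat_kind, _stat_field)| column)
            .collect::<HashSet<_>>()
            .len()
    }
}
```

With bindings like these, a reader can tell at a glance that only the column is used and that the statistic kind and field are deliberately ignored.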


