Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/11 12:52:34 UTC

[GitHub] [arrow-datafusion] Jefffrey commented on issue #4091: Parquet predicates contains `and true` expressions

Jefffrey commented on issue #4091:
URL: https://github.com/apache/arrow-datafusion/issues/4091#issuecomment-1311664315

   It looks like the behaviour seen in TPC-H query 12 is caused by the `return Ok(unhandled)` here:
   
   https://github.com/apache/arrow-datafusion/blob/509c82c6d624bb63531f67531195b562a241c854/datafusion/core/src/physical_optimizer/pruning.rs#L787-L795
   
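   The `and true` terms are generated by the `l_commitdate < l_receiptdate` and `l_shipdate < l_commitdate` conditions. Each of these compares one column against another rather than against a literal, so the predicate builder treats it as unhandled.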
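   To make the mechanism concrete, here is a minimal toy model. It assumes a hypothetical, heavily simplified `Expr` enum and a hypothetical `build_pruning_predicate` function standing in for `build_predicate_expression(...)`; none of these names are DataFusion's actual API. Only `col < literal` is rewritten (into `col_min < literal`), every other comparison falls back to `true` like the "unhandled" case, and integer literals stand in for the query's dates.

```rust
/// Toy expression tree. HYPOTHETICAL: this is not DataFusion's `Expr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Bool(bool),
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Small constructors to keep the example readable.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
fn int(v: i64) -> Expr {
    Expr::Int(v)
}
fn lt(l: Expr, r: Expr) -> Expr {
    Expr::Lt(Box::new(l), Box::new(r))
}
fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

/// HYPOTHETICAL stand-in for `build_predicate_expression(...)`.
/// `col < literal` becomes `col_min < literal`: a container can only hold
/// matching rows if its minimum is below the literal. Anything else is
/// "unhandled" and becomes `true`, meaning "cannot prune".
fn build_pruning_predicate(expr: &Expr) -> Expr {
    match expr {
        Expr::Lt(l, r) => match (&**l, &**r) {
            (Expr::Column(c), Expr::Int(v)) => lt(col(&format!("{c}_min")), int(*v)),
            // column < column is not handled in this toy model,
            // so it takes the unhandled fallback and becomes `true`.
            _ => Expr::Bool(true),
        },
        // Each side is translated independently and re-joined with AND,
        // so every unhandled conjunct leaves an `AND true` behind.
        Expr::And(l, r) => and(build_pruning_predicate(l), build_pruning_predicate(r)),
        _ => Expr::Bool(true),
    }
}
```

   Feeding it a filter shaped like query 12's, `l_commitdate < l_receiptdate AND l_shipdate < l_commitdate AND l_receiptdate < 100`, yields `(true AND true) AND l_receiptdate_min < 100`, which is exactly the redundant `and true` pattern.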
   
   Would a potential fix be to add a step, run after `build_predicate_expression(...)` is called, that folds the resulting expression down and removes those redundant conditions?
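   A minimal sketch of that post-hoc step, again over a hypothetical toy `Expr` (not DataFusion's) and a hypothetical `remove_true_conjuncts` function: a bottom-up fold that rewrites `x AND true` and `true AND x` to `x`. If every conjunct is `true`, the result is plain `true`, which still correctly means "cannot prune".

```rust
/// Toy expression tree. HYPOTHETICAL: this is not DataFusion's `Expr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Bool(bool),
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
fn int(v: i64) -> Expr {
    Expr::Int(v)
}
fn lt(l: Expr, r: Expr) -> Expr {
    Expr::Lt(Box::new(l), Box::new(r))
}
fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

/// HYPOTHETICAL post-processing step, run on the builder's output:
/// folds `x AND true` and `true AND x` down to `x`, bottom-up.
/// Only AND nodes are rewritten in this sketch.
fn remove_true_conjuncts(expr: Expr) -> Expr {
    match expr {
        Expr::And(l, r) => {
            // Simplify children first so a nested `true AND true` collapses.
            let l = remove_true_conjuncts(*l);
            let r = remove_true_conjuncts(*r);
            match (l, r) {
                (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
                (l, r) => and(l, r),
            }
        }
        other => other,
    }
}
```

   Applied to `(true AND true) AND l_receiptdate_min < 100`, this returns just `l_receiptdate_min < 100`. The appeal is that the builder stays untouched; the cost is an extra pass over every generated predicate.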
   
   https://github.com/apache/arrow-datafusion/blob/509c82c6d624bb63531f67531195b562a241c854/datafusion/core/src/physical_optimizer/pruning.rs#L129-L139
   
   Or perhaps we could refactor the `build_predicate_expression(...)` function itself so that it does not simply return a boolean TRUE for unhandled cases (which causes `and true` to be appended to expressions), and instead returns something more informative, such as an `Option<Expr>` instead of the current `Expr`, to indicate whether an expression was generated at all? That would try to avoid introducing the `and true` in the first place, if possible.
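   A minimal sketch of the `Option<Expr>` alternative, under the same assumptions as a toy model (a hypothetical `Expr` and a hypothetical `build_pruning_predicate`, not DataFusion's API). Here `None` means "no predicate could be generated", and a caller would treat it as "cannot prune".

```rust
/// Toy expression tree. HYPOTHETICAL: this is not DataFusion's `Expr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Lt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
fn int(v: i64) -> Expr {
    Expr::Int(v)
}
fn lt(l: Expr, r: Expr) -> Expr {
    Expr::Lt(Box::new(l), Box::new(r))
}
fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}

/// HYPOTHETICAL builder that reports "unhandled" as `None`
/// instead of substituting a literal `true`.
fn build_pruning_predicate(expr: &Expr) -> Option<Expr> {
    match expr {
        Expr::Lt(l, r) => match (&**l, &**r) {
            (Expr::Column(c), Expr::Int(v)) => Some(lt(col(&format!("{c}_min")), int(*v))),
            // Unhandled (e.g. column < column): no predicate generated.
            _ => None,
        },
        Expr::And(l, r) => match (build_pruning_predicate(l), build_pruning_predicate(r)) {
            (Some(l), Some(r)) => Some(and(l, r)),
            // Keep only the side that produced a predicate; no `AND true`.
            (Some(side), None) | (None, Some(side)) => Some(side),
            (None, None) => None,
        },
        _ => None,
    }
}
```

   Dropping an unhandled side of an `AND` is safe because the remaining conjunct is a weaker, still-conservative predicate. An `OR` (not modelled here) would need the opposite rule: if either side is `None`, the whole disjunction must be `None`, since the unknown branch could match rows in any container.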


-- 
This is an automated message from the Apache Git Service.