Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/02 19:52:33 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new issue, #4091: Parquet predicates contains `and true` expressions

Dandandan opened a new issue, #4091:
URL: https://github.com/apache/arrow-datafusion/issues/4091

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   When looking at the physical plans, I am seeing predicates pushed down to ParquetExec that are more complex than needed.
   
   E.g. for TPC-H query 12, this is the value of the predicate that is pushed down (notice the `and true and true`):
   ```
   l_shipmode_min@0 <= MAIL AND MAIL <= l_shipmode_max@1 OR l_shipmode_min@0 <= SHIP AND SHIP <= l_shipmode_max@1 
   AND true AND true AND l_receiptdate_max@2 >= 8766 AND l_receiptdate_min@3 < 9131
   ```
   
   
   **Describe the solution you'd like**
   Don't introduce `and true` in those expressions.
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan closed issue #4091: Parquet predicates contains `and true` expressions

Dandandan closed issue #4091: Parquet predicates contains `and true` expressions
URL: https://github.com/apache/arrow-datafusion/issues/4091


[GitHub] [arrow-datafusion] Jefffrey commented on issue #4091: Parquet predicates contains `and true` expressions

Jefffrey commented on issue #4091:
URL: https://github.com/apache/arrow-datafusion/issues/4091#issuecomment-1311664315

   Looks like the particular behaviour for query 12 in TPC-H is caused by the `return Ok(unhandled)` here:
   
   https://github.com/apache/arrow-datafusion/blob/509c82c6d624bb63531f67531195b562a241c854/datafusion/core/src/physical_optimizer/pruning.rs#L787-L795
   
   The `and true` terms are generated by the `l_commitdate < l_receiptdate` and `l_shipdate < l_commitdate` conditions, both of which compare two columns rather than a column with a literal.
   
   Would a potential fix be to add a step after `build_predicate_expression(...)` is called that folds the resulting expression and removes those redundant conditions?
   
   https://github.com/apache/arrow-datafusion/blob/509c82c6d624bb63531f67531195b562a241c854/datafusion/core/src/physical_optimizer/pruning.rs#L129-L139
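
   A minimal sketch of the post-hoc folding idea. It uses a toy `Expr` type and hypothetical helpers (`cond`, `and`, `or`, `fold`), not DataFusion's own `Expr` or simplifier API, and only applies the boolean identities involved: `x AND true` becomes `x`, `x AND false` becomes `false`, `x OR true` becomes `true`, and `x OR false` becomes `x`. The input is assumed to have the same shape as the query 12 predicate in the issue, with `Bool(true)` standing in for the two unhandled conditions.

   ```rust
   use std::fmt;

   /// Toy predicate tree (hypothetical; not DataFusion's `Expr`).
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       /// Boolean literal, e.g. the `true` substituted for unhandled conditions.
       Bool(bool),
       /// An already-rewritten min/max condition, kept as opaque text.
       Cond(String),
       And(Box<Expr>, Box<Expr>),
       Or(Box<Expr>, Box<Expr>),
   }

   impl fmt::Display for Expr {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           match self {
               Expr::Bool(b) => write!(f, "{}", b),
               Expr::Cond(s) => write!(f, "{}", s),
               Expr::And(l, r) => write!(f, "{} AND {}", l, r),
               // OR is parenthesized here so the printed precedence is unambiguous.
               Expr::Or(l, r) => write!(f, "({} OR {})", l, r),
           }
       }
   }

   fn cond(s: &str) -> Expr {
       Expr::Cond(s.to_string())
   }
   fn and(l: Expr, r: Expr) -> Expr {
       Expr::And(Box::new(l), Box::new(r))
   }
   fn or(l: Expr, r: Expr) -> Expr {
       Expr::Or(Box::new(l), Box::new(r))
   }

   /// Bottom-up folding of boolean literals out of AND/OR nodes.
   fn fold(expr: Expr) -> Expr {
       match expr {
           Expr::And(l, r) => match (fold(*l), fold(*r)) {
               // x AND true => x
               (Expr::Bool(true), e) | (e, Expr::Bool(true)) => e,
               // x AND false => false
               (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
               (l, r) => and(l, r),
           },
           Expr::Or(l, r) => match (fold(*l), fold(*r)) {
               // x OR true => true
               (Expr::Bool(true), _) | (_, Expr::Bool(true)) => Expr::Bool(true),
               // x OR false => x
               (Expr::Bool(false), e) | (e, Expr::Bool(false)) => e,
               (l, r) => or(l, r),
           },
           other => other,
       }
   }

   fn main() {
       // Same shape as the query 12 predicate; the two `Bool(true)` nodes
       // stand in for the unhandled column-vs-column conditions.
       let shipmode = or(
           and(cond("l_shipmode_min@0 <= MAIL"), cond("MAIL <= l_shipmode_max@1")),
           and(cond("l_shipmode_min@0 <= SHIP"), cond("SHIP <= l_shipmode_max@1")),
       );
       let predicate = and(
           and(
               and(and(shipmode, Expr::Bool(true)), Expr::Bool(true)),
               cond("l_receiptdate_max@2 >= 8766"),
           ),
           cond("l_receiptdate_min@3 < 9131"),
       );
       println!("before: {}", predicate);
       println!("after:  {}", fold(predicate));
   }
   ```

   Folding after the fact leaves `build_predicate_expression(...)` untouched, at the cost of one extra pass over the expression tree.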
   
   Or perhaps refactor the `build_predicate_expression(...)` function itself so that it does not simply return a boolean TRUE for unhandled cases (which is what causes `and true` to be appended to expressions), and instead returns something more informative, such as an `Option<Expr>` instead of the plain `Expr` it currently produces, to indicate whether an expression was generated? That would avoid introducing the `and true` in the first place, if possible.
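
   A minimal sketch of the `Option`-returning alternative, again on a toy `Expr` type that models only `<` comparisons. `build_pruning_expr` and `pruning_predicate` are hypothetical names, not DataFusion functions, and the `_min`/`_max` column naming is only illustrative. Here `None` means "this expression could not be rewritten". For `AND`, an unhandled side can be dropped, because keeping only the handled side is still a conservative pruning condition. For `OR` it cannot: `x OR <unhandled>` may hold even when `x` does not, so the whole `OR` becomes unhandled. A literal `true` is produced only at the top level, when nothing at all could be handled.

   ```rust
   use std::fmt;

   /// Toy predicate tree (hypothetical; not DataFusion's `Expr`). Only `<` is modelled.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Literal(i64),
       Bool(bool),
       Lt(Box<Expr>, Box<Expr>),
       And(Box<Expr>, Box<Expr>),
       Or(Box<Expr>, Box<Expr>),
   }

   impl fmt::Display for Expr {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           match self {
               Expr::Column(c) => write!(f, "{}", c),
               Expr::Literal(v) => write!(f, "{}", v),
               Expr::Bool(b) => write!(f, "{}", b),
               Expr::Lt(l, r) => write!(f, "{} < {}", l, r),
               Expr::And(l, r) => write!(f, "{} AND {}", l, r),
               Expr::Or(l, r) => write!(f, "({} OR {})", l, r),
           }
       }
   }

   fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
   fn lt(l: Expr, r: Expr) -> Expr { Expr::Lt(Box::new(l), Box::new(r)) }
   fn and(l: Expr, r: Expr) -> Expr { Expr::And(Box::new(l), Box::new(r)) }
   fn or(l: Expr, r: Expr) -> Expr { Expr::Or(Box::new(l), Box::new(r)) }

   /// Rewrites `expr` into a min/max statistics predicate, or returns `None`
   /// when it cannot be handled (instead of returning a literal `true`).
   fn build_pruning_expr(expr: &Expr) -> Option<Expr> {
       match expr {
           Expr::Lt(l, r) => match (&**l, &**r) {
               // `c < v` can hold for some row only if `c_min < v`.
               (Expr::Column(c), Expr::Literal(v)) => {
                   Some(lt(col(&format!("{}_min", c)), Expr::Literal(*v)))
               }
               // `v < c` can hold for some row only if `v < c_max`.
               (Expr::Literal(v), Expr::Column(c)) => {
                   Some(lt(Expr::Literal(*v), col(&format!("{}_max", c))))
               }
               // e.g. column < column: not handled in this sketch.
               _ => None,
           },
           // An unhandled conjunct is simply dropped.
           Expr::And(l, r) => match (build_pruning_expr(l), build_pruning_expr(r)) {
               (Some(l), Some(r)) => Some(and(l, r)),
               (Some(e), None) | (None, Some(e)) => Some(e),
               (None, None) => None,
           },
           // An unhandled disjunct makes the whole OR unhandled.
           Expr::Or(l, r) => Some(or(build_pruning_expr(l)?, build_pruning_expr(r)?)),
           _ => None,
       }
   }

   /// Only here is "nothing could be handled" turned into `true` (prune nothing).
   fn pruning_predicate(expr: &Expr) -> Expr {
       build_pruning_expr(expr).unwrap_or(Expr::Bool(true))
   }

   fn main() {
       // Roughly the date conditions of query 12: two column-vs-column
       // comparisons plus one column-vs-literal comparison.
       let q12 = and(
           and(
               lt(col("l_commitdate"), col("l_receiptdate")),
               lt(col("l_shipdate"), col("l_commitdate")),
           ),
           lt(col("l_receiptdate"), Expr::Literal(9131)),
       );
       println!("{}", pruning_predicate(&q12)); // prints "l_receiptdate_min < 9131"

       // With OR, the unhandled side cannot be dropped.
       let mixed = or(lt(col("a"), Expr::Literal(5)), lt(col("b"), col("c")));
       println!("{}", pruning_predicate(&mixed)); // prints "true"
   }
   ```

   The `OR` rule is the main thing to get right with such a refactor: dropping an unhandled disjunct would make the pruning predicate stricter than the original filter, and could prune row groups that actually contain matching rows.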

