You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/05/01 13:38:31 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #6179: Implement expression simplification for filtering (vs generically)

alamb opened a new issue, #6179:
URL: https://github.com/apache/arrow-datafusion/issues/6179

   ### Is your feature request related to a problem or challenge?
   
   When simplifying expressions in general, care must be taken with nulls. For example it seems like this expression can be simplified to `false`:
   
   ```sql
   A AND !A
   ```
   
   however, if `A` is `null` then the expression actually evaluates to `null` and not `false` 
   
   The existing simplification code handles these cases by checking `is_nullable`:
   
   https://github.com/apache/arrow-datafusion/blob/71efcf5ee8900a1efe12fb812e210e6941060733/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L575-L582
   
   
   However, if we are using `A AND !A` as a filter, we can actually simplify the whole thing to `false` because for filtering:
   
   * `true` --> row is kept
   *`false` --> row is not kept
   * `null` --> row is not kept (same as false)
   
   
   ### Describe the solution you'd like
   
   I would like someone to figure out how to take advantage of the above observation to apply more simplification rules when simplifying predicates
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Implement expression simplification for filtering (vs generically) [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6179:
URL: https://github.com/apache/arrow-datafusion/issues/6179#issuecomment-1875226373

   I was thinking about this last night and it seems to me that [PruningPredicate](https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html#) does exactly this type of simplification for predicates, based on additional information that might be available (such as null counts, and statistics).
   
   It happens using physical expressions (not just logical expressions) so it is too late in the planning process to affect the logical plan 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on issue #6179: Implement expression simplification for filtering (vs generically)

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on issue #6179:
URL: https://github.com/apache/arrow-datafusion/issues/6179#issuecomment-1535224529

   Ah yes, good catch @alamb, totally miss that. I need to rethink this 🤔.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on issue #6179: Implement expression simplification for filtering (vs generically)

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on issue #6179:
URL: https://github.com/apache/arrow-datafusion/issues/6179#issuecomment-1530815784

   # Background
   
   AFAIK, currently filter expressions can occur in three types of LogicalPlan: TableScan, Filter, and Join. In Join LogicalPlan, expressions can occur in either ON clause or optional FILTER clause but the current expr simplifier rule doesn't have any context of where the expression occurs, i.e. we want to optimize `FILTER` expr in Join logical plan but not in `ON` clause. So I propose to add ExprContext that can be passed to the expr simplifier.
   
   # Proposed API Change
   
   I think this requires adding an ExprContext struct which has the following fields:
   ```rust
   pub struct ExprContext {
       in_filter: bool,
   }
   ```
   
   And also change `LogicalPlan::inspect_expressions` to take a function as an argument that has the following signature:
   
   ```rust
   F: FnMut(&Expr, ExprContext) -> Result<(), E>
   ```
   
   And add another public API to LogicalPlan with the following function signature:
   
   ```rust
   pub fn expressions_with_context(self: &LogicalPlan) -> Vec<(Expr, ExprContext)>
   ```
   
   # Alternative
   
   Since OptimizerRule takes LogicalPlan as input, perhaps we don't need to modify/add public LogicalPlan API and keeps the implementation detail encapsulated in `SimplifyExpressions`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] aprimadi commented on issue #6179: Implement expression simplification for filtering (vs generically)

Posted by "aprimadi (via GitHub)" <gi...@apache.org>.
aprimadi commented on issue #6179:
URL: https://github.com/apache/arrow-datafusion/issues/6179#issuecomment-1537199622

   CMIIW. I may miss something here.
   
   It looks like we can just propagate the type information top-down from overall expressions to subexpression. Let's just say for the purpose of this issue we only have two types: `BoolType` and `NullType`. In a filter, the simplifier walks the subtree starting from `BoolType`, only changing it's type to `NullType` if it encounters these two statements:
   
   ```
   (...) IS NULL
   (...) IS NOT NULL
   ```
   
   Once a type changed to `NullType`, it can't be changed back to `BoolType` in subexpressions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6179: Implement expression simplification for filtering (vs generically)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6179:
URL: https://github.com/apache/arrow-datafusion/issues/6179#issuecomment-1532101348

   Thanks @aprimadi 
   
   Do you think your solution would work for filters like:
   
   ```sql
   WHERE (a and !a) IS NULL
   ```
   
   
   Specifically, I think the "are we in a filter" only applies to the overall expression, I am not sure it applies to sub expressions within 🤔 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org