You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/31 16:37:22 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue #1716: Eliminate Unsatisfiable Boolean Expressions

tustvold opened a new issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   I noticed Datafusion is not able to simplify the expression
   
   ```
   A ^ !A
   ```
   
   Whilst it is unlikely for a user to write such an expression, it is not uncommon for boolean expressions to reduce to something unsatisfiable. Where this is the case, substituting in `false` may allow for further simplification.
   
   **Describe the solution you'd like**
   
   It would be awesome if Datafusion was able to detect and eliminate unsatisfiable boolean expressions. I believe this is an NP-hard problem in the general case, but should be tractable for a low number of variables.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026249562


   Aah yes, the SQL interpretation of NULL as "unknown" vs perhaps more intuitive notions of it... :sweat_smile:
   
   `null == null --> null` will never cease to make no sense to me...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026004441


   On a related note it is unable to simplify expressions of the form
   
   ```
   A ^ (!A v B)
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026033069


   
   Related to this, I think the PoC
   Tokomak  optimizer I created is especially suited for these kind of optimizations. https://github.com/Dandandan/datafusion-tokomak
   
   @pjmore extended this optimizer to be able to optimize both expressions and logical plan nodes.
   See here for more.
   https://github.com/apache/arrow-datafusion/pull/1066


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026214157


   Does something similar also hold for things like `A AND false`? Just wondering if the existing logic handles this correctly... 🤔


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] tustvold commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026108337


   I'm not at all familiar with e-graphs, but I would perhaps naively expect something specialised to boolean algebra to significantly out-perform a method based on arbitrary expression rewriting? I guess I'm thinking something like BDDs, or even just recursively applying shannon expansion and relying on const propagation to fix the mess... 
   
   This isn't an area I'm hugely familiar with, but I can't help feeling it should just be the case of picking an off-the-shelf boolean expression minimizer and plugging it in :sweat_smile: 
   
   FWIW a quick google search found this - https://docs.rs/boolean_expression/latest/boolean_expression/index.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026205295


   FWIW `(A ^ !A)` is NOT always `false` because of .... 🥁  NULL
   
   ```sql
   alamb=# select null AND (not null);
    ?column? 
   ----------
    
   (1 row)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026033069


   
   Related to this, I think the PoC
   Tokomak  optimizer I created is especially suited for these kind of optimizations. https://github.com/Dandandan/datafusion-tokomak
   
   @pjmore extended this optimizer to be able to optimize both expressions and logical plan nodes.
   See here for more:
   https://github.com/apache/arrow-datafusion/pull/1485


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026199135


   I do wonder if https://docs.rs/boolean_expression/latest/boolean_expression/index.html handles `Nulls` 🤔  which is an important niche case for database / sql


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026206871


   We can do this rewrite
   
   ```
   A ^ !A = false (iff A is not nullable)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026219583


   > Does something similar also hold for things like A AND false? Just wondering if the existing logic handles this correctly... 🤔
   
   Yes. In fact @NGA-TRAN fixed a bug related to that in https://github.com/apache/arrow-datafusion/pull/1245 -- I think we have it sufficiently tested now. 
   
   See some of the rules here
   https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/simplify_expressions.rs#L568
   
   It actually turn out that `A AND false` is always false (even if `A` is NULL) 
   
   However, `A OR false` could be `NULL` or `false`
   
   🤯
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #1716: Eliminate Unsatisfiable Boolean Expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1716:
URL: https://github.com/apache/arrow-datafusion/issues/1716#issuecomment-1026206871


   So FWIW we can do this rewrite
   
   ```
   A ^ !A = false (iff A is not nullable)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org