Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/17 20:26:14 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

andygrove opened a new issue, #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I have a query that looks like this after optimization:
   
   ```
   Union 
     Projection
       ...
     Projection
       Filter: Boolean(false)
         Aggregate
           Inner Join
             Inner Join
   ```        
   
   **Describe the solution you'd like**
   Remove the always-false `Filter` and everything under it, and replace them with an `EmptyRelation`, so that we avoid executing the join and aggregate.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
mingmwang commented on issue #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864#issuecomment-1281770016

   @andygrove Looks like there is already a rule `EliminateFilter` for this purpose.




[GitHub] [arrow-datafusion] alamb closed issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
alamb closed issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`
URL: https://github.com/apache/arrow-datafusion/issues/3864




[GitHub] [arrow-datafusion] andygrove commented on issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864#issuecomment-1313909874

   FWIW, this is no longer an issue for me in Dask SQL. I am not sure what fixed it, but it may have been the multiple optimizer passes.




[GitHub] [arrow-datafusion] mingmwang commented on issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
mingmwang commented on issue #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864#issuecomment-1281750736

   We can have two general rules: `FoldFilters` and `EmptyRelationPropagation`.
   `FoldFilters` can remove filters that can be evaluated trivially. If a `Filter` condition is always false, replace it with an `EmptyRelation`.
   
   `EmptyRelationPropagation` can propagate the empty relation up the tree. For example, an inner join with an `EmptyRelation` input can itself be replaced with an `EmptyRelation`.
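
To make the two proposed rules concrete, here is a minimal sketch on a toy plan tree. It is written in Python rather than Rust to keep it short, and nothing in it is DataFusion's API: the `Scan`, `Empty`, `Filter` and `InnerJoin` classes and the functions `fold_filters` and `propagate_empty` are hypothetical stand-ins for illustration only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Empty:
    """Hypothetical stand-in for `EmptyRelation`: produces no rows."""


@dataclass(frozen=True)
class Filter:
    predicate: object  # a bool literal, or a string for a non-trivial expression
    input: "Plan"


@dataclass(frozen=True)
class InnerJoin:
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Empty, Filter, InnerJoin]


def fold_filters(plan: Plan) -> Plan:
    """Remove filters whose predicate is a boolean literal."""
    if isinstance(plan, Filter):
        if plan.predicate is False:
            return Empty()  # always false: nothing below can produce rows
        child = fold_filters(plan.input)
        if plan.predicate is True:
            return child  # always true: the filter is a no-op
        return Filter(plan.predicate, child)
    if isinstance(plan, InnerJoin):
        return InnerJoin(fold_filters(plan.left), fold_filters(plan.right))
    return plan


def propagate_empty(plan: Plan) -> Plan:
    """Push `Empty` up the tree, visiting children before their parent."""
    if isinstance(plan, Filter):
        child = propagate_empty(plan.input)
        return Empty() if isinstance(child, Empty) else Filter(plan.predicate, child)
    if isinstance(plan, InnerJoin):
        left, right = propagate_empty(plan.left), propagate_empty(plan.right)
        if isinstance(left, Empty) or isinstance(right, Empty):
            return Empty()  # an inner join with an empty input has no rows
        return InnerJoin(left, right)
    return plan


plan = InnerJoin(Scan("t1"), InnerJoin(Scan("t2"), Filter(False, Scan("t3"))))
folded = fold_filters(plan)
result = propagate_empty(folded)
```

In this toy model, `folded` still contains both joins with an `Empty` leaf, and `result` is a single `Empty` node. Only inner joins are handled; outer joins, unions and aggregates would need their own cases.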
   
   
   
   
   




[GitHub] [arrow-datafusion] alamb commented on issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864#issuecomment-1287760116

   @andygrove can you share a reproducer? We can then add it to the regression tests to make sure the always-false filter is indeed eliminated and doesn't come back.




[GitHub] [arrow-datafusion] alamb commented on issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864#issuecomment-1282194781

   Perhaps the changes in https://github.com/apache/arrow-datafusion/pull/3841 from @Dandandan made it possible to simplify the filter.
   
   Perhaps we could solve this issue by adding another run of `EliminateFilter` at the end of the rule list in datafusion/optimizer/src/optimizer.rs.
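
A rough illustration of why a second run could matter, assuming the filter only becomes a literal `false` after some later simplification rule has run. This is a toy Python model, not the contents of optimizer.rs: the rule functions, their order, and the `"1 = 2"` predicate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Empty:
    """Hypothetical stand-in for `EmptyRelation`."""


@dataclass(frozen=True)
class Filter:
    predicate: object  # a bool literal or an unsimplified expression string
    input: object


def eliminate_filter(plan):
    """Hypothetical rule: drop filters whose predicate is a boolean literal."""
    if isinstance(plan, Filter):
        if plan.predicate is False:
            return Empty()
        if plan.predicate is True:
            return eliminate_filter(plan.input)
        return Filter(plan.predicate, eliminate_filter(plan.input))
    return plan


def simplify_expressions(plan):
    """Hypothetical rule: constant-fold one known contradiction to `False`."""
    if isinstance(plan, Filter):
        predicate = False if plan.predicate == "1 = 2" else plan.predicate
        return Filter(predicate, simplify_expressions(plan.input))
    return plan


def optimize(plan, rules):
    """Apply each rule once, in list order."""
    for rule in rules:
        plan = rule(plan)
    return plan


plan = Filter("1 = 2", Scan("t3"))
without_rerun = optimize(plan, [eliminate_filter, simplify_expressions])
with_rerun = optimize(plan, [eliminate_filter, simplify_expressions, eliminate_filter])
```

With the assumed order, `without_rerun` is left as a `Filter` with a literal `False` predicate, while `with_rerun` collapses to `Empty`. Whether the real rule list has this ordering problem is not something this sketch establishes.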
   
   




[GitHub] [arrow-datafusion] jackwener commented on issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864#issuecomment-1312517544

   So I believe this is a future ticket; it should be fixed once we have bottom-up optimization, as in #3972.




[GitHub] [arrow-datafusion] jackwener commented on issue #3864: Replace `Filter: Boolean(false)` with `EmptyRelation`

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #3864:
URL: https://github.com/apache/arrow-datafusion/issues/3864#issuecomment-1312516923

   DataFusion can only propagate `EmptyRelation` `optimizer_config.max_passes` times, because we depend on multi-pass optimization to repeat the elimination.
   
   In other words, we don't currently have a real `EmptyRelationPropagation`.
   
   We can reproduce it.
   
   After 3 optimization passes, the plan can still be eliminated further.
   
   ```sql
    CREATE TABLE IF NOT EXISTS t1 AS VALUES(1,'HELLO'),(12,'DATAFUSION');
    CREATE TABLE IF NOT EXISTS t2 AS VALUES(1,'HELLO'),(12,'DATAFUSION');
    CREATE TABLE IF NOT EXISTS t3 AS VALUES(1,'HELLO'),(12,'DATAFUSION');
   
   explain verbose  select column1 from t1 join ( select column1 from t2 join (select column1 from t3 where false ) as ta2 on t2.column1 = ta2.column1 ) as ta1 on t1.column1 = ta1.column1;
   ```
   
   The root cause is that optimization is currently incomplete: we only have top-down optimization, not bottom-up optimization.
   
   If we had bottom-up optimization, we could do `EmptyRelationPropagation` in a single optimization pass.
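
The difference between the two traversal orders can be sketched on a toy plan. This is a simplified Python model under stated assumptions, not DataFusion's optimizer: `rewrite_node` is a hypothetical local rule that only inspects a node and its direct children, and `top_down`, `bottom_up` and `passes_until_fixpoint` are illustrative helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Empty:
    """Hypothetical stand-in for `EmptyRelation`."""


@dataclass(frozen=True)
class Filter:
    predicate: object
    input: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def rewrite_node(node):
    """Local rule: looks only at the node and its direct children."""
    if isinstance(node, Filter):
        if node.predicate is False or isinstance(node.input, Empty):
            return Empty()
    if isinstance(node, Join):
        if isinstance(node.left, Empty) or isinstance(node.right, Empty):
            return Empty()
    return node


def with_children(node, f):
    """Rebuild `node` with `f` applied to each child."""
    if isinstance(node, Filter):
        return Filter(node.predicate, f(node.input))
    if isinstance(node, Join):
        return Join(f(node.left), f(node.right))
    return node


def top_down(node):
    """Rewrite the parent first, then recurse into its children."""
    return with_children(rewrite_node(node), top_down)


def bottom_up(node):
    """Recurse into the children first, then rewrite the rebuilt parent."""
    return rewrite_node(with_children(node, bottom_up))


def passes_until_fixpoint(plan, one_pass):
    """Repeat `one_pass` until the plan stops changing; count changing passes."""
    count = 0
    while True:
        new_plan = one_pass(plan)
        if new_plan == plan:
            return plan, count
        plan, count = new_plan, count + 1


# Same shape as the SQL above: t1 JOIN (t2 JOIN (t3 WHERE false))
plan = Join(Scan("t1"), Join(Scan("t2"), Filter(False, Scan("t3"))))
top_down_result = passes_until_fixpoint(plan, top_down)
bottom_up_result = passes_until_fixpoint(plan, bottom_up)
```

In this model the top-down traversal needs three passes to reduce the plan to `Empty`, one per level, so a pass limit below three would leave a join in place. The bottom-up traversal reaches `Empty` in one pass because each parent is rewritten after its children.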
   
   

