Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/18 21:33:58 UTC

[GitHub] [arrow-datafusion] mateuszkj opened a new issue, #2270: Partial filters are not pushed down during optimization for table with alias

mateuszkj opened a new issue, #2270:
URL: https://github.com/apache/arrow-datafusion/issues/2270

   **Describe the bug**
   Filters are not pushed down through `SubqueryAlias` to `TableScan` during logical plan optimization. This can cause unnecessary IO when pruning Parquet files.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   Prepare data and run datafusion-cli with logs:
   ```sh
   echo "1,2" > data.csv
   export RUST_LOG=info,datafusion=debug
   datafusion-cli
   ```
   
   Run query without alias (`partial_filters` is added for `TableScan`):
   ```sql
   ❯ SELECT b FROM foo WHERE a=1;
   [2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Input logical plan:
       Projection: #foo.b
         Filter: #foo.a = Int64(1)
           TableScan: foo projection=None
       
   [2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Optimized logical plan:
       Projection: #foo.b
         Filter: #foo.a = Int64(1)
           TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]
   ```
   
   Run query with alias (`partial_filters` is not added for `TableScan`):
   ```sql
   ❯ SELECT a.b FROM foo a WHERE a.a = 1;
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=None
       
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=Some([0, 1])
   ```
   
   
   **Expected behavior**
   `partial_filters` should be pushed down to `TableScan`:
   
   ```sql
   ❯ SELECT a.b FROM foo a WHERE a.a = 1;
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=None
       
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]
   ```
   
   **Additional context**
   
   Tested on the master branch at commit 5f0b61b0db9849336e2e83b23c8a45508a85fb38. I think the `SubqueryAlias` case is not handled in this file: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/optimizer/filter_push_down.rs#L299=
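
To make the expected behavior concrete, here is a minimal sketch of pushing a filter through an alias node. It uses a toy plan type, not DataFusion's real `LogicalPlan`, and every name in it (`Plan`, `Predicate`, `relation_name`, `push_down`, `optimize`, `demo`) is hypothetical. The assumption it illustrates is that a predicate qualified by the alias (`a.a`) has to be re-qualified with the underlying relation name (`foo.a`) before it can be attached to the scan, while the original `Filter` stays in place.

```rust
// Toy logical plan for illustration only; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String, partial_filters: Vec<Predicate> },
    SubqueryAlias { alias: String, input: Box<Plan> },
    Filter { predicate: Predicate, input: Box<Plan> },
}

// A predicate of the form `qualifier.column = value`.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    qualifier: String,
    column: String,
    value: i64,
}

// Name that qualifies the output columns of `plan`.
fn relation_name(plan: &Plan) -> String {
    match plan {
        Plan::TableScan { table, .. } => table.clone(),
        Plan::SubqueryAlias { alias, .. } => alias.clone(),
        Plan::Filter { input, .. } => relation_name(input),
    }
}

// Copy `pred` down to the scan, re-qualifying it at each alias it crosses.
fn push_down(plan: Plan, pred: &Predicate) -> Plan {
    match plan {
        Plan::TableScan { table, mut partial_filters } => {
            // Only attach predicates that actually refer to this table.
            if pred.qualifier == table {
                partial_filters.push(pred.clone());
            }
            Plan::TableScan { table, partial_filters }
        }
        Plan::SubqueryAlias { alias, input } => {
            let input = if pred.qualifier == alias {
                // Rewrite `alias.col` to `child.col` before descending.
                let rewritten = Predicate {
                    qualifier: relation_name(&input),
                    ..pred.clone()
                };
                push_down(*input, &rewritten)
            } else {
                *input
            };
            Plan::SubqueryAlias { alias, input: Box::new(input) }
        }
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(push_down(*input, pred)),
        },
    }
}

// Walk the plan; for every Filter, keep it and also push its predicate down.
fn optimize(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => {
            let input = push_down(optimize(*input), &predicate);
            Plan::Filter { predicate, input: Box::new(input) }
        }
        Plan::SubqueryAlias { alias, input } => Plan::SubqueryAlias {
            alias,
            input: Box::new(optimize(*input)),
        },
        scan => scan,
    }
}

// Toy equivalent of `SELECT a.b FROM foo a WHERE a.a = 1` (projection omitted).
fn demo() -> Plan {
    let scan = Plan::TableScan { table: "foo".into(), partial_filters: vec![] };
    let aliased = Plan::SubqueryAlias { alias: "a".into(), input: Box::new(scan) };
    let predicate = Predicate { qualifier: "a".into(), column: "a".into(), value: 1 };
    optimize(Plan::Filter { predicate, input: Box::new(aliased) })
}
```

In this sketch, `demo()` yields a plan whose scan carries the filter re-qualified as `foo.a = 1`, which mirrors the `partial_filters=[#foo.a = Int64(1)]` entry in the expected plan above. How the real optimizer rewrites qualifiers may differ.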
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Jefffrey commented on issue #2270: Filters/limit are not pushed down during optimization for table with alias

Posted by "Jefffrey (via GitHub)" <gi...@apache.org>.
Jefffrey commented on issue #2270:
URL: https://github.com/apache/arrow-datafusion/issues/2270#issuecomment-1426927753

   @alamb this can be closed as complete; the limit pushdown was done in https://github.com/apache/arrow-datafusion/pull/4425
   
   The filter pushdown has a test confirming that the behaviour works:
   
   https://github.com/apache/arrow-datafusion/blob/f75d25fec2c1a5581eeb8ce73a890e5792df02c7/datafusion/optimizer/src/push_down_filter.rs#L2250-L2281




[GitHub] [arrow-datafusion] jackwener commented on issue #2270: Filters/limit are not pushed down during optimization for table with alias

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #2270:
URL: https://github.com/apache/arrow-datafusion/issues/2270#issuecomment-1102536522

   This bug will be fixed once #2213 and #2212 are finished, because finishing those issues requires fixing this bug.




[GitHub] [arrow-datafusion] jackwener commented on issue #2270: Filters/limit are not pushed down during optimization for table with alias

Posted by GitBox <gi...@apache.org>.
jackwener commented on issue #2270:
URL: https://github.com/apache/arrow-datafusion/issues/2270#issuecomment-1102533935

   Yes, `push_down` just handles `subqueryAlias -> tableScan`.
   
   I fixed the limit pushdown, but my attempt to fix the projection pushdown failed because I couldn't work around the limitation of `schema`. It's in #2244.




[GitHub] [arrow-datafusion] jackwener closed issue #2270: Filters/limit are not pushed down during optimization for table with alias

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener closed issue #2270: Filters/limit are not pushed down during optimization for table with alias
URL: https://github.com/apache/arrow-datafusion/issues/2270




[GitHub] [arrow-datafusion] alamb commented on issue #2270: Filters/limit are not pushed down during optimization for table with alias

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #2270:
URL: https://github.com/apache/arrow-datafusion/issues/2270#issuecomment-1427001932

   Thanks @Jefffrey 

