Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/02 21:12:19 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #656: Predicate pruning is broken for parquet

alamb opened a new issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656


   **Describe the bug**
   Predicate pruning no longer occurs for queries against Parquet files.
   
   **To Reproduce**
   Run a query against a Parquet file that has multiple row groups, using a predicate that could be used to prune some of them. No pruning occurs.
   
   **Expected behavior**
   The predicate should be able to eliminate some row groups.
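
To illustrate what "eliminating row groups" means here, the following is a minimal sketch of min/max pruning for a predicate of the form `bar > threshold`. It is not DataFusion's pruning code: `RowGroupStats` and `keep_row_groups_gt` are hypothetical names made up for this example, and the statistics are plain integers rather than real Parquet metadata.

```rust
/// Hypothetical min/max statistics for one column in one row group.
/// (Illustrative only; real Parquet statistics are typed and optional.)
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// For a predicate `col > threshold`, return one flag per row group:
/// `true` if the row group may contain matching rows and must be read,
/// `false` if its statistics prove that no row can match.
fn keep_row_groups_gt(stats: &[RowGroupStats], threshold: i64) -> Vec<bool> {
    stats
        .iter()
        // If even the largest value is not above the threshold,
        // no row in this group can satisfy `col > threshold`.
        .map(|s| s.max > threshold)
        .collect()
}

/// Sanity check that statistics are well formed (min <= max).
fn stats_are_valid(stats: &[RowGroupStats]) -> bool {
    stats.iter().all(|s| s.min <= s.max)
}
```

With row groups whose `bar` ranges are 0..=5, 3..=9 and 6..=12, the predicate `bar > 5` lets the first row group be skipped while the other two still have to be scanned.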
   
   **Additional context**
   While updating IOx to use the latest DataFusion in https://github.com/influxdata/influxdb_iox/pull/1799, I discovered another place where https://github.com/apache/arrow-datafusion/pull/55 has caused issues.
   
   Basically, the predicates that get pushed down to the Parquet exec scan are now fully qualified (for example, `#foo.bar > 5`). However, the Parquet schema only has columns named `bar`, so the code cannot match them up.
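
The mismatch, and the effect of stripping qualifiers, can be sketched with a toy expression type. This is only an illustration under stated assumptions: the `Expr` enum, `unqualify`, and `columns_resolve` below are hypothetical stand-ins written for this example, not DataFusion's actual types or functions.

```rust
/// Hypothetical, heavily simplified expression type.
/// DataFusion's real `Expr` is far richer; this is illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column reference with an optional table qualifier, e.g. `foo.bar`.
    Column { relation: Option<String>, name: String },
    /// Integer literal.
    Literal(i64),
    /// Greater-than comparison of two sub-expressions.
    Gt(Box<Expr>, Box<Expr>),
}

/// Recursively drop the table qualifier from every column reference.
fn unqualify(expr: &Expr) -> Expr {
    match expr {
        Expr::Column { name, .. } => Expr::Column {
            relation: None,
            name: name.clone(),
        },
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Gt(l, r) => Expr::Gt(Box::new(unqualify(l)), Box::new(unqualify(r))),
    }
}

/// Return true if every column in `expr` is unqualified and present in
/// `schema`, mimicking a file schema that only knows bare column names.
fn columns_resolve(expr: &Expr, schema: &[&str]) -> bool {
    match expr {
        Expr::Column { relation, name } => {
            relation.is_none() && schema.contains(&name.as_str())
        }
        Expr::Literal(_) => true,
        Expr::Gt(l, r) => columns_resolve(l, schema) && columns_resolve(r, schema),
    }
}
```

In this sketch, `foo.bar > 5` does not resolve against a schema containing only `bar`, whereas the result of `unqualify` on the same expression does.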
   
   The reason this was not caught in #55 is that there is no end-to-end test of parquet that exercises the entire path.
   
   The fix for this issue is fairly straightforward (strip the qualifiers from the expressions), but the end-to-end test is quite involved. I plan to fix this in two PRs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #656: Predicate pruning is broken for parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-873381586


   Thanks for the offer @houqp  -- now that I have the test setup, this will be straightforward to fix :)





[GitHub] [arrow-datafusion] alamb commented on issue #656: Predicate pruning is broken for parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-875786676


   Closed in https://github.com/apache/arrow-datafusion/pull/689





[GitHub] [arrow-datafusion] houqp commented on issue #656: Predicate pruning is broken for parquet

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-873352084


   Good catch :+1: If you are busy with something else, I am happy to take on this fix as well.





[GitHub] [arrow-datafusion] alamb commented on issue #656: Predicate pruning is broken for parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-873258974


   FYI @houqp  and @yordan-pavlov 





[GitHub] [arrow-datafusion] alamb closed issue #656: Predicate pruning is broken for parquet

Posted by GitBox <gi...@apache.org>.
alamb closed issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656


   

