Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/02 21:12:19 UTC
[GitHub] [arrow-datafusion] alamb opened a new issue #656: Predicate pruning is broken for parquet
alamb opened a new issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656
**Describe the bug**
Predicate pruning no longer occurs for queries against parquet files
**To Reproduce**
Run a query against a parquet file that has multiple row groups, using a predicate that could prune some of them. No pruning occurs.
**Expected behavior**
The predicate should be able to eliminate some row groups
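For readers unfamiliar with row group pruning, here is a minimal sketch of the idea, assuming each row group carries min/max statistics for the column in the predicate. The `RowGroupStats`, `Cmp`, `may_match`, and `row_groups_to_scan` names are hypothetical and are not DataFusion's API.

```rust
/// Min/max statistics for one column in one row group (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Comparison used by a predicate of the form `column <op> value`.
#[derive(Debug, Clone, Copy)]
enum Cmp {
    Gt,
    Lt,
}

/// Returns false only when the statistics prove that no row can match.
fn may_match(stats: &RowGroupStats, op: Cmp, value: i64) -> bool {
    match op {
        // `col > value` can only be true if the largest value exceeds `value`.
        Cmp::Gt => stats.max > value,
        // `col < value` can only be true if the smallest value is below `value`.
        Cmp::Lt => stats.min < value,
    }
}

/// Indices of the row groups that still have to be read.
fn row_groups_to_scan(groups: &[RowGroupStats], op: Cmp, value: i64) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match(stats, op, value))
        .map(|(index, _)| index)
        .collect()
}
```

With three row groups covering `[0, 4]`, `[3, 9]`, and `[10, 20]`, a predicate such as `bar > 5` lets the first group be skipped entirely. If the statistics for `bar` cannot be looked up at all, every row group has to be read.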
**Additional context**
While updating IOx to use the latest datafusion in https://github.com/influxdata/influxdb_iox/pull/1799 I discovered another place where https://github.com/apache/arrow-datafusion/pull/55 has caused some issues.
Basically, the predicates that get pushed down to the parquet exec scan are now fully qualified, for example `#foo.bar > 5`. However, the parquet schema only has columns named `bar`, so the code cannot match them up.
The reason this was not caught in #55 is that there is no end-to-end test of parquet that exercises the entire path.
The fix for this issue is fairly straightforward (strip the qualifiers from the expressions), but the end-to-end test is quite involved. I plan to fix this in two PRs.
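A rough sketch of what stripping the qualifiers means, using a toy expression type rather than DataFusion's real `Expr`. The `Expr` enum, `unqualify`, and `resolve` below are hypothetical names made up for illustration, and the actual fix may look quite different.

```rust
/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A column reference, optionally qualified by a relation name.
    Column { relation: Option<String>, name: String },
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

/// Recursively drops the relation qualifier from every column reference.
fn unqualify(expr: Expr) -> Expr {
    match expr {
        Expr::Column { name, .. } => Expr::Column { relation: None, name },
        Expr::Gt(left, right) => {
            Expr::Gt(Box::new(unqualify(*left)), Box::new(unqualify(*right)))
        }
        literal @ Expr::Literal(_) => literal,
    }
}

/// Looks up a column in a schema that only stores unqualified field names,
/// comparing against the column's full (possibly qualified) name.
fn resolve(schema: &[&str], expr: &Expr) -> Option<usize> {
    match expr {
        Expr::Column { relation, name } => {
            let full_name = match relation {
                Some(relation) => format!("{}.{}", relation, name),
                None => name.clone(),
            };
            schema.iter().position(|field| *field == full_name.as_str())
        }
        _ => None,
    }
}
```

In this sketch, resolving the qualified column `foo.bar` against a schema containing only `bar` finds nothing, while the same column passed through `unqualify` resolves to index 0.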
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #656: Predicate pruning is broken for parquet
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-873381586
Thanks for the offer @houqp -- now that I have the test setup, this will be straightforward to fix :)
[GitHub] [arrow-datafusion] alamb commented on issue #656: Predicate pruning is broken for parquet
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-875786676
Closed in https://github.com/apache/arrow-datafusion/pull/689
[GitHub] [arrow-datafusion] houqp commented on issue #656: Predicate pruning is broken for parquet
Posted by GitBox <gi...@apache.org>.
houqp commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-873352084
Good catch :+1: If you are busy with something else, I am happy to take on this fix as well.
[GitHub] [arrow-datafusion] alamb commented on issue #656: Predicate pruning is broken for parquet
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656#issuecomment-873258974
FYI @houqp and @yordan-pavlov
[GitHub] [arrow-datafusion] alamb closed issue #656: Predicate pruning is broken for parquet
Posted by GitBox <gi...@apache.org>.
alamb closed issue #656:
URL: https://github.com/apache/arrow-datafusion/issues/656