Posted to github@arrow.apache.org by "thinkharderdev (via GitHub)" <gi...@apache.org> on 2023/02/23 20:03:34 UTC

[GitHub] [arrow-datafusion] thinkharderdev commented on pull request #5362: Add index interface method

thinkharderdev commented on PR #5362:
URL: https://github.com/apache/arrow-datafusion/pull/5362#issuecomment-1442362033

   > The current planner will test each individual part of the conjunction and the Table will have to scan() all `Gardner`s - even though it would have done a point lookup on `Brent Gardner` (assuming I'm the only one).
   
   This is a bit awkward in the current model. We have a (very rudimentary) notion of cost modeling in the parquet scan's predicate pushdown, which we use to decide the order in which to evaluate the predicates. As you mention, it's not ideal because by that point we have already split the conjunction, but it has one advantage: by that point you are dealing with a single file, so you have more metadata for the cost calculation (column chunk sizes, etc.). Is the idea here to bubble information about global sort indexes up to the logical `TableProvider` so that we avoid splitting those two predicates when pushing down to the scan?
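
To make the two options concrete, here is a rough sketch of both ideas. Everything in it is hypothetical and for illustration only: `Conjunct`, `order_by_estimated_cost`, and `split_for_index` are not DataFusion APIs, and the real parquet cost model may weigh more than chunk size. The first function orders already-split conjuncts by the size of the column chunk they read, which is roughly the per-file view. The second assumes the table exposes an ordered list of index columns and keeps together the conjuncts that match a leading prefix of that list, so they could be handed to the index as one unit.

```rust
// Hypothetical types for illustration; none of these are DataFusion APIs.

/// One conjunct of a filter, e.g. `last_name = 'Gardner'`.
#[derive(Debug, Clone, PartialEq)]
pub struct Conjunct {
    /// Column the conjunct reads.
    pub column: String,
    /// Assumed cost input: size in bytes of that column's chunk in one file.
    pub column_chunk_bytes: u64,
}

/// Per-file view: evaluate conjuncts that read the smallest column chunks first.
pub fn order_by_estimated_cost(mut conjuncts: Vec<Conjunct>) -> Vec<Conjunct> {
    conjuncts.sort_by_key(|c| c.column_chunk_bytes);
    conjuncts
}

/// Table-level view: pull out the conjuncts that match a leading prefix of the
/// index columns so they stay together; everything else is returned separately.
pub fn split_for_index(
    conjuncts: Vec<Conjunct>,
    index_columns: &[&str],
) -> (Vec<Conjunct>, Vec<Conjunct>) {
    let mut remaining = conjuncts;
    let mut covered = Vec::new();
    for col in index_columns {
        match remaining.iter().position(|c| c.column == *col) {
            Some(i) => covered.push(remaining.remove(i)),
            // The index can only be used for a contiguous leading prefix.
            None => break,
        }
    }
    (covered, remaining)
}
```

With an assumed index on `(last_name, first_name)`, a filter on `first_name`, `last_name`, and `age` would keep the two name conjuncts together for the index and leave `age` to be evaluated by the scan.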

