Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/12 00:26:16 UTC

[GitHub] [arrow] a-campbell opened a new issue #8646: Predicate pushdown question

a-campbell opened a new issue #8646:
URL: https://github.com/apache/arrow/issues/8646


   Hi Arrow community,
   
   I'm new to the project and am trying to understand exactly what is happening under the hood when I run a filter-collect query on an Arrow Dataset (backed by Parquet).
   
   Let's say I created a Parquet dataset with no file-level partitions. I just wrote a bunch of separate files to a dataset. Now I want to run a query that returns the rows corresponding to a specific range of datetimes in the dataset's dt column.
   
   My understanding is that the Dataset API will push this query down to the file level, checking the footer of each file for the min/max value of dt and determining whether this block of rows should be read.
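The min/max check described above can be sketched in plain Python. This is only an illustration of the pruning idea, not Arrow's implementation: `BlockStats`, `blocks_to_read`, and the file names are hypothetical, and the statistics are hard-coded rather than read from real Parquet footers.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class BlockStats:
    """Hypothetical stand-in for the dt min/max statistics kept in a footer."""
    path: str
    dt_min: datetime
    dt_max: datetime


def blocks_to_read(stats: List[BlockStats], lo: datetime, hi: datetime) -> List[str]:
    """Return paths whose [dt_min, dt_max] range overlaps the query range [lo, hi]."""
    # A block can be skipped only when its range lies entirely outside the query.
    return [s.path for s in stats if s.dt_max >= lo and s.dt_min <= hi]


# Made-up statistics for three files written without partitioning.
footers = [
    BlockStats("part-0.parquet", datetime(2020, 1, 1), datetime(2020, 3, 31)),
    BlockStats("part-1.parquet", datetime(2020, 4, 1), datetime(2020, 6, 30)),
    BlockStats("part-2.parquet", datetime(2020, 7, 1), datetime(2020, 9, 30)),
]

selected = blocks_to_read(footers, datetime(2020, 3, 15), datetime(2020, 4, 15))
print(selected)  # ['part-0.parquet', 'part-1.parquet']
```

An overlap only means a block might contain matching rows, so the rows that are read still need to be filtered against the predicate afterwards.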
   
   Assuming this is correct, a few questions:
   
   Will every query result in reading all of the file footers? Is there any caching of these min/max values?
   
   Is there a way to profile query performance? A way to view a query plan before it is executed?
   
   I appreciate your time in helping me better understand.
   
   Andrew
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on issue #8646: Predicate pushdown question

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #8646:
URL: https://github.com/apache/arrow/issues/8646#issuecomment-726169236


   Would you mind asking this question on the dev@ or user@ mailing list? Thanks!





[GitHub] [arrow] wesm closed issue #8646: Predicate pushdown question

Posted by GitBox <gi...@apache.org>.
wesm closed issue #8646:
URL: https://github.com/apache/arrow/issues/8646


   

