Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/06/08 21:46:33 UTC

[GitHub] [arrow] westonpace commented on issue #35841: [C++] get_fragments filter argument not filtering on statistics

westonpace commented on issue #35841:
URL: https://github.com/apache/arrow/issues/35841#issuecomment-1583432371

   I've labeled this C++ since the eventual fix will probably need to be there.  You are correct that row group filtering is not currently happening in `get_fragments`.  It may not be the simplest thing to fix.  I suspect that comment is from the legacy parquet dataset which may have operated in this fashion.
   
   Unfortunately, we do not load the parquet metadata for every single fragment when a dataset is created.  In fact, if you specify a list of files and a schema at dataset creation, we won't load any data at all from disk.  So we don't have the statistics at this point.
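To illustrate why the statistics are not available up front, here is a toy model in plain Python. It is not Arrow's implementation: `ToyFragment`, `list_fragments`, and `prune_by_statistics` are hypothetical names, and each row group's statistics are reduced to one `(min, max)` pair for a single integer column. The point is only that listing fragments requires no I/O, while pruning by statistics requires reading every file's footer first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Stats = Tuple[int, int]  # (min, max) of one column in one row group


@dataclass
class ToyFragment:
    """Hypothetical stand-in for a Parquet file fragment."""
    path: str
    footer_on_disk: List[Stats]                    # pretend this lives in the file footer
    row_group_stats: Optional[List[Stats]] = None  # None until the footer is read
    footer_reads: int = 0                          # counts simulated disk reads

    def ensure_metadata(self) -> None:
        # Simulates the I/O that is skipped when the dataset is created.
        if self.row_group_stats is None:
            self.row_group_stats = list(self.footer_on_disk)
            self.footer_reads += 1

    def row_groups_overlapping(self, lo: int, hi: int) -> List[int]:
        # A row group can be skipped only if its [min, max] misses [lo, hi].
        self.ensure_metadata()
        return [i for i, (mn, mx) in enumerate(self.row_group_stats)
                if mx >= lo and mn <= hi]


def list_fragments(fragments: List[ToyFragment]) -> List[ToyFragment]:
    # No metadata is loaded, so nothing can be ruled out: every fragment is returned.
    return list(fragments)


def prune_by_statistics(fragments: List[ToyFragment], lo: int, hi: int):
    # Reads every footer, then keeps only fragments with a surviving row group.
    kept = []
    for frag in fragments:
        ids = frag.row_groups_overlapping(lo, hi)
        if ids:
            kept.append((frag.path, ids))
    return kept


fragments = [
    ToyFragment("a.parquet", [(0, 9), (10, 19)]),
    ToyFragment("b.parquet", [(20, 29), (30, 39)]),
]
unfiltered = list_fragments(fragments)
reads_before = sum(f.footer_reads for f in fragments)
pruned = prune_by_statistics(fragments, 25, 32)
reads_after = sum(f.footer_reads for f in fragments)
```

In this sketch `list_fragments` returns both files without touching a footer, and `prune_by_statistics` has to read both footers before it can drop `a.parquet`. For real datasets, I believe `ParquetFileFragment.split_by_row_group` and `ParquetFileFragment.subset` in pyarrow accept a filter and apply it to the row group statistics after loading the metadata for that fragment, which may work as a per-fragment workaround.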

