Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/03/16 11:55:09 UTC

[GitHub] [arrow-datafusion] tustvold commented on issue #5466: Rework `ParquetExec::metadata_size_hint`

tustvold commented on issue #5466:
URL: https://github.com/apache/arrow-datafusion/issues/5466#issuecomment-1471818016

   > get metadata_size_hint value in infer_schema and infer_stats functions of ParquetFormat.
   
   I think it is worth distinguishing the logic for catalog inference, i.e. `ListingTable`, from that of query processing, i.e. `FileScanConfig`. Most practical applications will need a `TableProvider` backed by some sort of catalog for reasonable performance. That catalog would be an ideal place to store information such as the footer size, schema, and statistics, which can then be used to populate `FileScanConfig` accurately.
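To make "store the footer size in the catalog" concrete, here is a minimal, self-contained Rust sketch. It is not DataFusion code: `CatalogEntry`, `footer_size_from_trailer` and `register_file` are hypothetical names. It relies only on the Parquet file layout, where a file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. A catalog could read that trailer once at ingest time and record the exact footer size for later scans.

```rust
/// Hypothetical catalog record for one Parquet file (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct CatalogEntry {
    path: String,
    file_size: u64,
    /// Encoded metadata length plus the 8-byte trailer (length + magic).
    footer_size: usize,
}

/// Parse the last 8 bytes of a Parquet file: a 4-byte little-endian
/// metadata length followed by the magic bytes "PAR1".
fn footer_size_from_trailer(trailer: &[u8; 8]) -> Result<usize, String> {
    if &trailer[4..] != b"PAR1" {
        return Err("missing PAR1 magic".to_string());
    }
    let len = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    // Total footer = metadata + 4-byte length field + 4-byte magic.
    Ok(len as usize + 8)
}

/// Record a file in the hypothetical catalog at ingest time, so that later
/// scans can use an exact metadata size hint instead of a guess.
fn register_file(path: &str, file_size: u64, trailer: &[u8; 8]) -> Result<CatalogEntry, String> {
    let footer_size = footer_size_from_trailer(trailer)?;
    if footer_size as u64 > file_size {
        return Err("footer larger than file".to_string());
    }
    Ok(CatalogEntry { path: path.to_string(), file_size, footer_size })
}

fn main() {
    // Build the trailer of a file whose metadata is 1000 bytes long.
    let mut trailer = [0u8; 8];
    trailer[..4].copy_from_slice(&1000u32.to_le_bytes());
    trailer[4..].copy_from_slice(b"PAR1");

    let entry = register_file("data/part-0.parquet", 50_000, &trailer).unwrap();
    println!("{} -> footer size {} bytes", entry.path, entry.footer_size);
}
```

A query planner could then copy `footer_size` into its scan configuration for each file, which is the kind of accurate population of `FileScanConfig` the comment describes.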
   
   For `TableProvider`s that don't have access to this information, such as `ListingTable`, I think it is perfectly acceptable to use a single config value for the metadata size hint.
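One way to see why a single config value is a reasonable fallback is a simplified cost model, sketched below in Rust. The names are hypothetical, and the model is an assumption rather than a description of exactly how DataFusion or arrow-rs issue requests. A reader fetches `hint` bytes from the end of the file (or only the 8-byte trailer when there is no hint). If the real footer fits in that range, one request suffices; otherwise a second request is needed. An exact catalog value always costs one request, while a generous config value usually does too, at the price of a few extra bytes.

```rust
use std::ops::Range;

/// Byte range requested from the end of the file on the first fetch.
/// With no hint, only the fixed 8-byte trailer is read first.
fn initial_footer_range(file_size: u64, hint: Option<usize>) -> Range<u64> {
    let want = hint.unwrap_or(8).max(8) as u64;
    file_size.saturating_sub(want)..file_size
}

/// Number of requests needed to obtain the whole footer under this model.
fn footer_requests(file_size: u64, actual_footer: usize, hint: Option<usize>) -> u32 {
    let fetched = initial_footer_range(file_size, hint);
    if actual_footer as u64 <= fetched.end - fetched.start { 1 } else { 2 }
}

/// Hypothetical per-file hint resolution: prefer an exact value from a
/// catalog, otherwise fall back to one table-wide config value.
fn resolve_hint(catalog_footer: Option<usize>, config_hint: Option<usize>) -> Option<usize> {
    catalog_footer.or(config_hint)
}

fn main() {
    let file_size = 50_000;
    let actual_footer = 1008;

    // Catalog-backed provider: exact footer size known.
    let exact = resolve_hint(Some(actual_footer), Some(64 * 1024));
    // ListingTable-style provider: single config value of 64 KiB.
    let generous = resolve_hint(None, Some(64 * 1024));
    // Config value that is too small for this file.
    let too_small = resolve_hint(None, Some(512));
    // No hint at all.
    let none = resolve_hint(None, None);

    for (label, hint) in [("exact", exact), ("64 KiB", generous), ("512 B", too_small), ("none", none)] {
        println!("{label}: {} request(s)", footer_requests(file_size, actual_footer, hint));
    }
}
```

Under this model, overestimating the hint only wastes bytes, while underestimating it costs an extra round trip, which is why one conservative table-wide value can work when no per-file information is available.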

