
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/27 19:18:10 UTC

[GitHub] [arrow-datafusion] dispanser commented on issue #133: Add support for reading partitioned Parquet files

dispanser commented on issue #133:
URL: https://github.com/apache/arrow-datafusion/issues/133#issuecomment-827854576


   Is there any reason to limit this to Parquet files? In Spark, this functionality is shared between CSV, JSON, ORC and Parquet.
   
   Maybe the implementation could target the file listing in `physical_plan::common::build_file_list()`, which seems to be shared between Parquet and CSV.
   
   Considering #204 (adding partition pruning), it may be sensible to implement the partition pruning logic early, in the file listing procedure itself, since that could save file listing operations, which tend to be expensive, particularly on cloud storage (EBS).
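
To make the idea concrete, here is a rough sketch of pruning during listing, using only the Rust standard library. It assumes Hive-style `key=value` partition directories and a simple equality filter. `parse_partition_dir` and `list_files_pruned` are hypothetical names for illustration only; they are not DataFusion APIs and do not reflect how `build_file_list()` actually works.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Parse a Hive-style directory name such as `year=2021` into (key, value).
/// Returns None for directories that are not partition directories.
fn parse_partition_dir(name: &str) -> Option<(&str, &str)> {
    let (key, value) = name.split_once('=')?;
    if key.is_empty() {
        None
    } else {
        Some((key, value))
    }
}

/// Hypothetical helper: list files with extension `ext` under `root`,
/// skipping any partition directory whose value conflicts with `wanted`.
/// A pruned directory is never read, so no listing call is spent on it.
fn list_files_pruned(
    root: &Path,
    ext: &str,
    wanted: &HashMap<String, String>,
) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
                if let Some((key, value)) = parse_partition_dir(name) {
                    if let Some(expected) = wanted.get(key) {
                        if expected.as_str() != value {
                            // Partition value does not match: prune the subtree.
                            continue;
                        }
                    }
                }
                stack.push(path);
            } else if path.extension().and_then(|e| e.to_str()) == Some(ext) {
                files.push(path);
            }
        }
    }
    // Sort so the result does not depend on directory iteration order.
    files.sort();
    Ok(files)
}
```

With a layout like `table/year=2020/...` and `table/year=2021/...`, calling `list_files_pruned` with `wanted = {"year": "2021"}` would never read the `year=2020` directory. Because the extension is a parameter, the same listing could serve CSV as well as Parquet.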
   
   I'd love to work on this, but I'd need a bit of guidance on the preferred approach.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org