Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/12 19:36:14 UTC

[GitHub] [arrow-datafusion] houqp commented on issue #133: Add support for reading partitioned Parquet files

houqp commented on issue #133:
URL: https://github.com/apache/arrow-datafusion/issues/133#issuecomment-840044928


   Hive partitioning is the most commonly used scheme, but there are others. For example, the Python Arrow package (pyarrow) supports both directory partitioning and Hive partitioning: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html?highlight=partition
   
   I agree with @Dandandan that we should first add the concept of a partition column, then tackle how we serialize and deserialize partition values to and from file paths. I can also see us going the pyarrow route, i.e. supporting multiple partitioning schemes.
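
   To make the difference between the two schemes concrete, here is a minimal sketch in plain Python. It is only an illustration, not DataFusion or pyarrow code: the function names `parse_hive_partitions` and `parse_directory_partitions` are hypothetical, paths are assumed to be relative to the table root, and values are left as strings (casting them to a column type would be a separate step).

```python
from typing import Dict, List


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Hive scheme: each directory segment is `key=value` (hypothetical helper)."""
    values: Dict[str, str] = {}
    # Drop the last segment, which is assumed to be the file name.
    for segment in path.strip("/").split("/")[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def parse_directory_partitions(path: str, columns: List[str]) -> Dict[str, str]:
    """Directory scheme: segments hold only values, so the column names
    must be supplied by the caller (hypothetical helper)."""
    segments = path.strip("/").split("/")[:-1]
    if len(segments) < len(columns):
        raise ValueError(f"expected {len(columns)} partition segments in {path!r}")
    # Pair each declared partition column with the segment at the same depth.
    return dict(zip(columns, segments))


hive = parse_hive_partitions("year=2021/month=05/part-0.parquet")
directory = parse_directory_partitions("2021/05/part-0.parquet", ["year", "month"])
```

   Both calls yield the same partition values, which is why a scheme-independent notion of a partition column can come first: only the path encoding differs between schemes.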

