Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/06 13:50:28 UTC

[GitHub] [arrow] bkietz edited a comment on pull request #7907: ARROW-9644: [C++][Dataset] Don't apply ignore_prefixes to partition base_dir

bkietz edited a comment on pull request #7907:
URL: https://github.com/apache/arrow/pull/7907#issuecomment-669936148


   > And the "partition base directory" is automatically set if a user does something like ds.dataset("_shouldnt_be_ignored/dataset/") ?
   
   Currently in Python, `partition_base_dir` is always identical to the base directory of the recursive selector used, so yes.
   
   In principle it would be possible to select `/mnt/data/**` and set `partition_base_dir=/mnt/data/partitioned`, in which case `/mnt/data/other/**` would be included in the dataset but would not have partition information attached. Alternatively, one could set `partition_base_dir=/mnt` and select `/mnt/year=2020/**`, in which case all fragments would include the `"year"_ == 2020` partition expression. I question the utility of these cases, honestly.
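
To make the two cases above concrete, here is a rough sketch in plain Python. It is not Arrow's implementation: `hive_partition_fields` is a hypothetical helper, the file names are invented, and it assumes hive-style `key=value` directory segments. It only illustrates that segments are interpreted relative to `partition_base_dir`, and that files outside that directory get no partition information.

```python
from pathlib import PurePosixPath


def hive_partition_fields(path, partition_base_dir):
    """Hypothetical helper (not an Arrow API): parse hive-style key=value
    directory segments of `path`, relative to `partition_base_dir`."""
    try:
        relative = PurePosixPath(path).relative_to(PurePosixPath(partition_base_dir))
    except ValueError:
        # The file is not under partition_base_dir: no partition information.
        return {}

    fields = {}
    # Only directory segments are considered; the file name itself is skipped.
    for segment in relative.parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            fields[key] = value
    return fields


# Case 1: select /mnt/data/**, partition_base_dir=/mnt/data/partitioned
inside = hive_partition_fields(
    "/mnt/data/partitioned/year=2020/a.parquet", "/mnt/data/partitioned")
outside = hive_partition_fields(
    "/mnt/data/other/b.parquet", "/mnt/data/partitioned")

# Case 2: select /mnt/year=2020/**, partition_base_dir=/mnt
above = hive_partition_fields("/mnt/year=2020/c.parquet", "/mnt")
```

In this sketch, `inside` and `above` both carry a `year` field, while `outside` is empty, which mirrors the behaviour described above. Values are left as strings here; type inference is out of scope for the sketch.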

