Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/15 04:30:57 UTC

[GitHub] [arrow] jhostetler opened a new issue, #33668: Reading flat dataset with `partitioning="hive"` results in partition schema equal to dataset schema

jhostetler opened a new issue, #33668:
URL: https://github.com/apache/arrow/issues/33668

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Example code:
   ```
   % ls flat/
   4dc78608-2739-4b67-942a-6eb39d39fe62.999.parquet
   % python
   >>> import pyarrow.dataset
   >>> hive_ds = pyarrow.dataset.dataset("flat/", partitioning="hive", format="parquet")
   >>> hive_ds.partitioning
   <pyarrow._dataset.HivePartitioning object at 0x1012a7990>
   >>> hive_ds.partitioning.schema
   _index_: int64
   label: list<item: string>
     child 0, item: string
   score: list<item: float>
     child 0, item: float
   >>> hive_ds.schema
   _index_: int64
   label: list<item: string>
     child 0, item: string
   score: list<item: float>
     child 0, item: float
   >>>
   >>> flat_ds = pyarrow.dataset.dataset("flat/", format="parquet")
   >>> flat_ds.schema
   _index_: int64
   label: list<item: string>
     child 0, item: string
   score: list<item: float>
     child 0, item: float
   >>> flat_ds.partitioning
   >>> flat_ds.partitioning is None
   True
   ```
   
   Notice how when using `partitioning="hive"`, the partitioning schema is the same as the dataset schema. Since the data is not, in fact, partitioned, I would expect `hive_ds.partitioning` to be `None`.
   
   Environment: PyArrow 10.0.1 on Linux. The result is the same whether the data is on the local filesystem or retrieved from GCS.
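
For anyone who wants to check this without the original `flat/` directory, here is a minimal sketch of a self-contained reproduction. It is not taken from the report: the function names `hive_keys` and `reproduce`, the file name `part-0.parquet`, and the sample table are all made up for illustration. `hive_keys` derives the partition fields one would expect from the `key=value` directory segments of a path, and `reproduce` compares that expectation with what the dataset reports. What `reproduce` prints depends on the PyArrow version, so no output is shown.

```python
import tempfile
from pathlib import Path


def hive_keys(path):
    """Return the key=value directory segments of a path as a dict.

    Hypothetical helper: it only inspects directory names, not file contents.
    """
    keys = {}
    for segment in Path(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            keys[key] = value
    return keys


def reproduce():
    """Write one flat Parquet file and print what each dataset reports."""
    # Imported here so that hive_keys() stays usable without PyArrow installed.
    import pyarrow as pa
    import pyarrow.dataset as ds
    import pyarrow.parquet as pq

    with tempfile.TemporaryDirectory() as root:
        # A single file directly under the root: no key=value directories.
        relative_path = "part-0.parquet"  # made-up file name
        table = pa.table({"_index_": [0, 1], "label": [["a"], ["b"]]})
        pq.write_table(table, str(Path(root) / relative_path))

        hive_ds = ds.dataset(root, partitioning="hive", format="parquet")
        flat_ds = ds.dataset(root, format="parquet")

        # Partition fields implied by the directory layout (none here).
        expected = sorted(hive_keys(relative_path))

        # Guard against None in case a given version reports no partitioning.
        part = hive_ds.partitioning
        reported = part.schema.names if part is not None else None

        print("expected partition fields:", expected)
        print("reported partition fields:", reported)
        print("dataset schema fields:    ", hive_ds.schema.names)
        print("flat_ds.partitioning:     ", flat_ds.partitioning)
```

Calling `reproduce()` in an environment with PyArrow installed prints the expected and reported partition fields side by side. If the reported fields match the dataset schema fields rather than the empty expected list, the behavior described above is present.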
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on issue #33668: [Python] Reading flat dataset with `partitioning="hive"` results in partition schema equal to dataset schema

Posted by "vibhatha (via GitHub)" <gi...@apache.org>.
vibhatha commented on issue #33668:
URL: https://github.com/apache/arrow/issues/33668#issuecomment-1406682476

   take




[GitHub] [arrow] westonpace commented on issue #33668: Reading flat dataset with `partitioning="hive"` results in partition schema equal to dataset schema

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #33668:
URL: https://github.com/apache/arrow/issues/33668#issuecomment-1399087997

   I can confirm.  I reproduced this with the latest and agree it is a bug.

