Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/30 18:47:21 UTC

[GitHub] [arrow] westonpace commented on pull request #11014: ARROW-13775: [C++] Allow Partitioning objects to be created with a vector of field names

westonpace commented on pull request #11014:
URL: https://github.com/apache/arrow/pull/11014#issuecomment-908595122


   > What would be a use case? (I don't see any test where it is actually used for something) I suppose as alternative for #11008, right?
   
   Yes, though not as an alternative: this change is meant to enable #11008, which I don't believe works today.  `ds.partitioning(field_names=['a'])` returns a partitioning factory, not a partitioning.  It also returns an error when the hive format is specified.  A partitioning factory can only be used for reading datasets, not for writing them:
   
   ```
   >>> import pyarrow
   >>> import pyarrow as pa
   >>> import pyarrow.dataset as ds
   >>> table = pa.Table.from_pydict({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
   >>> ds.partitioning(field_names=['a'])
   <pyarrow._dataset.PartitioningFactory object at 0x7f2410d4c170>
   >>> part = ds.partitioning(field_names=['a'])
   >>> ds.write_dataset(table, '/tmp/new_dataset', format='parquet', partitioning=part)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/pace/anaconda3/envs/arrow-release-5/lib/python3.9/site-packages/pyarrow/dataset.py", line 791, in write_dataset
       partitioning = _ensure_write_partitioning(partitioning)
     File "/home/pace/anaconda3/envs/arrow-release-5/lib/python3.9/site-packages/pyarrow/dataset.py", line 686, in _ensure_write_partitioning
       raise ValueError("partitioning needs to be actual Partitioning object")
   ValueError: partitioning needs to be actual Partitioning object
   ```
   
   I didn't add real-world use cases (or Python bindings) because I figured those would be covered by #11008.

