Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/22 15:37:05 UTC
[GitHub] [arrow] pitrou commented on pull request #13938: ARROW-17388: [C++][Python] Error on WriteTable if duplicate field names
pitrou commented on PR #13938:
URL: https://github.com/apache/arrow/pull/13938#issuecomment-1222530533
So it seems this is a capability that should be preserved. The problem is that the new dataset implementation cannot read the file back, while the legacy implementation can:
```python
>>> pq.read_table('file.parquet', use_legacy_dataset=False)
Traceback (most recent call last):
[...]
ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: int64
a: int64
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string
>>> pq.read_table('file.parquet', use_legacy_dataset=True)
<ipython-input-12-6eeebe64658f>:1: FutureWarning: Passing 'use_legacy_dataset=True' to get the legacy behaviour is deprecated as of pyarrow 8.0.0, and the legacy implementation will be removed in a future version.
pq.read_table('file.parquet', use_legacy_dataset=True)
pyarrow.Table
a: int64
a: int64
----
a: [[4,5,6]]
a: [[1,2,3]]
```
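To illustrate why the two code paths differ, here is a minimal toy sketch in plain Python. It is not Arrow's implementation: `resolve_by_name` and the list-of-tuples schema are hypothetical stand-ins, and the assumption is only that the dataset path looks fields up by name while the legacy path can read columns by position.

```python
# Toy model only: NOT Arrow's implementation. All names here are hypothetical.
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # (field name, type name) pairs


def resolve_by_name(schema: Schema, name: str) -> int:
    """Return the index of the single field called `name`.

    Raises ValueError when the name matches more than one field, which
    mirrors the "Multiple matches" situation in the traceback above.
    """
    matches = [i for i, (field_name, _) in enumerate(schema) if field_name == name]
    if not matches:
        raise KeyError(f"No match for {name!r}")
    if len(matches) > 1:
        raise ValueError(f"Multiple matches for {name!r}: indices {matches}")
    return matches[0]


# Same shape as the file in the transcript: two int64 columns, both named "a".
schema: Schema = [("a", "int64"), ("a", "int64")]
columns = [[4, 5, 6], [1, 2, 3]]

# Name-based lookup is ambiguous as soon as names repeat.
try:
    resolve_by_name(schema, "a")
    ambiguous = False
except ValueError:
    ambiguous = True

# Position-based access never needs to disambiguate names.
by_position = [columns[i] for i in range(len(schema))]
```

In this model, any reader that projects columns by name has to reject duplicate names or pick a tie-breaking rule, whereas a reader that walks columns by index returns both `a` columns unchanged.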