
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/22 15:37:05 UTC

[GitHub] [arrow] pitrou commented on pull request #13938: ARROW-17388: [C++][Python] Error on WriteTable if duplicate field names

pitrou commented on PR #13938:
URL: https://github.com/apache/arrow/pull/13938#issuecomment-1222530533

   So it seems this is a capability that should be preserved. The problem is that the new dataset implementation cannot read the file back:
   ```python
   >>> pq.read_table('file.parquet', use_legacy_dataset=False)
   Traceback (most recent call last):
     [...]
   ArrowInvalid: Multiple matches for FieldRef.Name(a) in a: int64
   a: int64
   __fragment_index: int32
   __batch_index: int32
   __last_in_fragment: bool
   __filename: string
   
   >>> pq.read_table('file.parquet', use_legacy_dataset=True)
   <ipython-input-12-6eeebe64658f>:1: FutureWarning: Passing 'use_legacy_dataset=True' to get the legacy behaviour is deprecated as of pyarrow 8.0.0, and the legacy implementation will be removed in a future version.
     pq.read_table('file.parquet', use_legacy_dataset=True)
   pyarrow.Table
   a: int64
   a: int64
   ----
   a: [[4,5,6]]
   a: [[1,2,3]]
   ```
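
The traceback above suggests that a by-name field reference becomes ambiguous when two columns share a name, while positional access stays well defined. A minimal pure-Python sketch of that distinction follows. It is not Arrow's actual implementation: `resolve_by_name` and `AmbiguousFieldError` are hypothetical names introduced only for illustration, and the schema and data simply mirror the example file.

```python
class AmbiguousFieldError(LookupError):
    """Raised when a name matches more than one field (hypothetical, for illustration)."""


def resolve_by_name(field_names, name):
    """Return the single index whose field name equals `name`."""
    matches = [i for i, n in enumerate(field_names) if n == name]
    if not matches:
        raise KeyError(name)
    if len(matches) > 1:
        # Analogous in spirit to "Multiple matches for FieldRef.Name(a)".
        raise AmbiguousFieldError(
            f"Multiple matches for {name!r} at indices {matches}"
        )
    return matches[0]


# Schema and data mirroring the file in the example above.
field_names = ["a", "a"]
columns = [[4, 5, 6], [1, 2, 3]]

# Positional access is unambiguous even with duplicate names.
first, second = columns[0], columns[1]

# Name-based lookup cannot decide which "a" is meant.
try:
    resolve_by_name(field_names, "a")
    ambiguous = False
except AmbiguousFieldError:
    ambiguous = True
```

Under this reading, a reader that resolves columns by position can return both `a` columns, whereas one that resolves every column by name would have to reject the file.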
   

