Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/08/13 08:47:00 UTC

[jira] [Created] (ARROW-9718) [Python] Make pyarrow.parquet work with the new filesystem interfaces

Joris Van den Bossche created ARROW-9718:
--------------------------------------------

             Summary: [Python] Make pyarrow.parquet work with the new filesystem interfaces
                 Key: ARROW-9718
                 URL: https://issues.apache.org/jira/browse/ARROW-9718
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche


The place internally where the "legacy" {{pyarrow.filesystem}} filesystems are still used is the {{pyarrow.parquet}} module.

They are used in:

- ParquetWriter
- ParquetManifest/ParquetDataset
- write_to_dataset

For {{ParquetWriter}}, we need to update this to work with the new filesystems (since ParquetWriter is not dataset related, and thus won't be deprecated).  
For {{ParquetManifest}}/{{ParquetDataset}}, an update might not be needed, since those might get deprecated themselves (to be discussed), and with the {{use_legacy_dataset=False}} option, {{ParquetDataset}} already uses the new datasets implementation.  
For {{write_to_dataset}}, this might depend on how the writing capabilities of the dataset project evolve.
